# Component monitoring

All KubeVirt system-components expose Prometheus metrics at their `/metrics` REST endpoint.

## Custom Service Discovery

Prometheus supports service discovery based on [Pods](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#pod) and [Endpoints](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#endpoints) out of the box. Both can be used to discover KubeVirt services.

All Pods which expose metrics are labeled with `prometheus.kubevirt.io` and contain a port-definition which is called `metrics`. In the KubeVirt release-manifests, the default `metrics` port is `8443`.

The above labels and port informations are collected by a `Service` called `kubevirt-prometheus-metrics`. Kubernetes automatically creates a corresponding `Endpoint` with an equal name:

```
$ kubectl get endpoints -n kubevirt kubevirt-prometheus-metrics -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    kubevirt.io: ""
    prometheus.kubevirt.io: ""
  name: kubevirt-prometheus-metrics
  namespace: kubevirt
subsets:
- addresses:
  - ip: 10.244.0.5
    nodeName: node01
    targetRef:
      kind: Pod
      name: virt-handler-cjzg6
      namespace: kubevirt
      resourceVersion: "4891"
      uid: c67331f9-bfcf-11e8-bc54-525500d15501
  - ip: 10.244.0.6
  [...]
  ports:
  - name: metrics
    port: 8443
    protocol: TCP
```

By watching this endpoint for added and removed IPs to `subsets.addresses` and appending the `metrics` port from `subsets.ports`, it is possible to always get a complete list of ready-to-be-scraped Prometheus targets.

## Integrating with the prometheus-operator

The [prometheus-operator](https://github.com/coreos/prometheus-operator) can make use of the `kubevirt-prometheus-metrics` service to automatically create the appropriate Prometheus config.

KubeVirt's `virt-operator` checks if the `ServiceMonitor` custom resource exists when creating an install strategy for deployment. KubeVirt will automatically create a `ServiceMonitor` resource in the `monitorNamespace`, as well as an appropriate role and rolebinding in KubeVirt's namespace.

Two settings are exposed in the `KubeVirt` custom resource to direct KubeVirt to create these resources correctly:

* `monitorNamespace`: The namespace that prometheus-operator runs in. Defaults to `openshift-monitoring`.
* `monitorAccount`: The serviceAccount that prometheus-operator runs with. Defaults to `prometheus-k8s`.

If the prometheus-operator for a given deployment uses these defaults, then these values can be omitted.

An example of the KubeVirt resource depicting these default values:

```
apiVersion: kubevirt.io/v1alpha3
kind: KubeVirt
metadata:
  name: kubevirt
spec:
  monitorNamespace: openshift-monitoring
  monitorAccount: prometheus-k8s
```

## Integrating with the OKD cluster-monitoring-operator

After the [cluster-monitoring-operator](https://github.com/openshift/cluster-monitoring-operator) is up and running, KubeVirt will detect the existence of the `ServiceMonitor` resource. Because the definition contains the `openshift.io/cluster-monitoring` label, it will automatically be picked up by the cluster monitor.

## Metrics about Virtual Machines

The endpoints report metrics related to the runtime behaviour of the Virtual Machines. All the relevant metrics are prefixed with `kubevirt_vmi`.

The metrics have labels that allow to connect to the VMI objects they refer to. At minimum, the labels will expose `node`, `name` and `namespace` of the related VMI object.

For example, reported metrics could look like

```
kubevirt_vmi_memory_resident_bytes{domain="default_vm-test-01",name="vm-test-01",namespace="default",node="node01"} 2.5595904e+07
kubevirt_vmi_network_traffic_bytes_total{domain="default_vm-test-01",interface="vnet0",name="vm-test-01",namespace="default",node="node01",type="rx"} 8431
kubevirt_vmi_network_traffic_bytes_total{domain="default_vm-test-01",interface="vnet0",name="vm-test-01",namespace="default",node="node01",type="tx"} 1835
kubevirt_vmi_vcpu_seconds{domain="default_vm-test-01",id="0",name="vm-test-01",namespace="default",node="node01",state="1"} 19
```

Please note the `domain` label in the above example. This label is deprecated and it will be removed in a future release. You should identify the VMI using the `node`, `namespace`, `name` labels instead.

## Important Queries

### Detecting connection issues for the REST client

Use the following query to get a counter for all REST call which indicate connection issues:

```
rest_client_requests_total{code="<error>"}
```

If this counter is continuously increasing, it is an indicator that the corresponding KubeVirt component has general issues to connect to the apiserver


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://kubevirtlegacy.gitbook.io/user-guide/docs/operations/component_monitoring.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
