# Prometheus

## Prerequisites: <a href="#prerequisites" id="prerequisites"></a>

This section explains how to integrate the Prometheus monitoring tool with the CloudFabrix AIOps platform and ingest alerts from it.

Prometheus Alertmanager is the alert management component of Prometheus. It supports alert notifications via email, Slack, webhook, and other channels. The CloudFabrix AIOps platform uses the webhook notification method over HTTP to receive and ingest alerts or events.

See [Alert Sources](/features-guide/alert-watch/alert-sources.md) to create a Webhook URL for Prometheus alert notifications in the CloudFabrix OIA application.

### Prometheus Alert Rules Configuration:

Below is a sample alert rules configuration file that defines threshold rules to trigger alerts for monitored assets. (Note: The alert rules below are for reference only.)

```
groups:
- name: ALERTENGINE
  rules:
  - alert: ALERT_MANAGER_FAILURES
    expr: rate(alertmanager_notifications_failed_total[5m]) > 0
    labels:
      severity: CRITICAL
      category: ALERTING
    annotations:
      title: Alertmanager is failing to send notifications
      description: Alertmanager is seeing errors {{$labels.integration}}

- name: CATASTROPHIC
  rules:
  - alert: HOST_DOWN
    expr: avg_over_time(up{job=~"Hosts|Containers"}[2m]) == 0
    labels:
      severity: CRITICAL
      category: AVAILABILITY
    annotations:
      summary: "{{$labels.instance}}: Host is unreachable. Host could be down. The collectors are not accessible. If the host is up, make sure collectors are running."
      description: "{{$labels.instance}}: Host is unreachable. Host could be down. The collectors are not accessible. If the host is up, make sure collectors are running."

- name: HOST
  rules:
  - alert: HOST_HIGH_MEMORY_USAGE
    expr: (((avg_over_time(node_memory_MemTotal_bytes[5m]) - avg_over_time(node_memory_MemFree_bytes[5m]) - avg_over_time(node_memory_Cached_bytes[5m])) / (avg_over_time(node_memory_MemTotal_bytes[5m])) * 100)) > 80
    labels:
      severity: HIGH
      category: HOST_MEMORY
    annotations:
      summary: "{{$labels.instance}}: Memory usage detected above 80%"
      description: "{{$labels.instance}}: Memory usage is above 80% (Current Used Memory % is: {{ $value }})"

  - alert: HOST_HIGH_DISK_USAGE
    expr: ((avg_over_time(node_filesystem_size_bytes{fstype=~"(ext.|xfs)"}[5m]) - avg_over_time(node_filesystem_free_bytes{fstype=~"(ext.|xfs)"}[5m])) * 100 / avg_over_time(node_filesystem_size_bytes{fstype=~"(ext.|xfs)"}[5m])) > 70
    labels:
      severity: HIGH
      category: HOST_DISK
    annotations:
      summary: "{{$labels.instance}}: Disk {{$labels.device}} usage detected above 70%"
      description: "{{$labels.instance}}: Disk {{$labels.device}} usage is above 70% (Current Disk Used % is: {{ $value }})"

  - alert: HOST_HIGH_CPU_USAGE
    expr: (100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)) > 70
    labels:
      severity: HIGH
      category: HOST_CPU
    annotations:
      summary: "{{$labels.instance}}: CPU usage detected above 70%"
      description: "{{$labels.instance}}: CPU usage is above 70% (Current CPU % is: {{ $value }})"

  - alert: HOST_HIGH_DISK_UTILIZATION
    expr: rate(node_disk_io_time_seconds_total[5m]) * 100 > 90
    labels:
      severity: HIGH
      category: HOST_DISK
    annotations:
      summary: "{{$labels.instance}}: Disk ( {{ $labels.device }} ) utilization is very high."
      description: "{{$labels.instance}}: Disk ( {{ $labels.device }} ) utilization is very high. (Current Utilization is: {{ $value }})"


  - alert: HOST_HIGH_DISK_INODE
    expr: avg_over_time(node_filesystem_files_free{fstype=~"(ext.|xfs)"}[5m]) / avg_over_time(node_filesystem_files{fstype=~"(ext.|xfs)"}[5m]) * 100 <= 20
    labels:
      severity: HIGH
      category: HOST_DISK
    annotations:
      summary: "{{$labels.instance}}: Disk ( {{ $labels.device }} ) High inode usage"
      description: "{{$labels.instance}}: Disk ( {{ $labels.device }} ) High inode usage. (Current free inode % is: {{ $value }})"

```
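
For these rules to take effect, Prometheus has to load the rules file and know where Alertmanager is running. The following is a minimal sketch of the relevant `prometheus.yml` sections, not a complete configuration. The file name `alert_rules.yml`, the evaluation interval, and the target `localhost:9093` are assumptions for illustration and should be replaced with the values used in your environment.

```yaml
# prometheus.yml (excerpt, illustrative only)
global:
  evaluation_interval: 1m        # how often alert rules are evaluated (assumed value)

rule_files:
  - "alert_rules.yml"            # assumed file name for the alert rules shown above

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - "localhost:9093"         # assumed Alertmanager address; 9093 is its default port
```

Before reloading Prometheus, the rules file can be validated with `promtool check rules alert_rules.yml`.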

### Prometheus Alertmanager Configuration for Alert Notifications:

Below is a sample Prometheus Alertmanager configuration (config.yml) that sends alert notifications to the CloudFabrix AIOps platform through the Webhook URL.

```
route:
  repeat_interval: 1m
  receiver: cfx-webhook

receivers:
- name: cfx-webhook
  webhook_configs:
  - url: 'https://<cfx-aiops-webhook-URL>'
    send_resolved: true
    http_config:
#      basic_auth:
#        username: <optional>
#        password: <optional>
      tls_config:
        insecure_skip_verify: true

```
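
If the Webhook URL is protected with basic authentication, the commented-out `basic_auth` lines in the sample can be enabled. The sketch below shows that variant of the receiver with TLS certificate verification left on. Whether your endpoint requires credentials or uses a certificate that Alertmanager can verify is an assumption here, and `<username>` and `<password>` are placeholders.

```yaml
# Alertmanager receiver (excerpt, illustrative only)
receivers:
- name: cfx-webhook
  webhook_configs:
  - url: 'https://<cfx-aiops-webhook-URL>'
    send_resolved: true                 # also notify when an alert clears
    http_config:
      basic_auth:
        username: <username>            # placeholder, only if the endpoint requires it
        password: <password>            # placeholder
      tls_config:
        insecure_skip_verify: false     # verify the server certificate (assumes a trusted certificate)
```

The full configuration file can be validated with `amtool check-config config.yml` before restarting Alertmanager.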
