
Prometheus

Alert notifications from Prometheus Alertmanager

Last updated 3 years ago

Prerequisites:

This section explains how to integrate the Prometheus monitoring tool with the CloudFabrix AIOps platform and ingest its alerts.

Prometheus Alertmanager is the alert management component of Prometheus. It supports alert notifications via email, Slack, webhook, and other channels. The CloudFabrix AIOps platform uses the webhook notification method over HTTP to receive and ingest alerts and events.

To create a Webhook URL for Prometheus alert notifications in the CloudFabrix OIA application, see Alert Sources.

Prometheus Alert Rules Configuration:

Below is a sample alert rules configuration file that defines threshold rules to trigger alerts for monitored assets. (Note: the alert rules below are for reference only.)

groups:
- name: ALERTENGINE
  rules:
  - alert: ALERT_MANAGER_FAILURES
    expr: rate(alertmanager_notifications_failed_total[5m]) > 0
    labels:
      severity: CRITICAL
      category: ALERTING
    annotations:
      title: Alertmanager is failing to send notifications
      description: Alertmanager is seeing errors {{$labels.integration}}

- name: CATASTROPHIC
  rules:
  - alert: HOST_DOWN
    expr: avg_over_time(up{job=~"Hosts|Containers"}[2m]) == 0
    labels:
      severity: CRITICAL
      category: AVAILABILITY
    annotations:
      summary: "{{$labels.instance}}: Host is unreachable. Host could be down. The collectors are not accessible. If the host is up, make sure collectors are running."
      description: "{{$labels.instance}}: Host is unreachable. Host could be down. The collectors are not accessible. If the host is up, make sure collectors are running."

- name: HOST
  rules:
  - alert: HOST_HIGH_MEMORY_USAGE
    expr: (((avg_over_time(node_memory_MemTotal_bytes[5m]) - avg_over_time(node_memory_MemFree_bytes[5m]) - avg_over_time(node_memory_Cached_bytes[5m])) / (avg_over_time(node_memory_MemTotal_bytes[5m])) * 100)) > 80
    labels:
      severity: HIGH
      category: HOST_MEMORY
    annotations:
      summary: "{{$labels.instance}}: Memory usage detected above 80%"
      description: "{{$labels.instance}}: Memory usage is above 80% (Current Used Memory % is: {{ $value }})"

  - alert: HOST_HIGH_DISK_USAGE
    expr: ((avg_over_time(node_filesystem_size_bytes{fstype=~"(ext.|xfs)"}[5m]) - avg_over_time(node_filesystem_free_bytes{fstype=~"(ext.|xfs)"}[5m])) * 100 / avg_over_time(node_filesystem_size_bytes{fstype=~"(ext.|xfs)"}[5m])) > 70
    labels:
      severity: HIGH
      category: HOST_DISK
    annotations:
      summary: "{{$labels.instance}}: Disk {{$labels.device}} usage detected above 70%"
      description: "{{$labels.instance}}: Disk {{$labels.device}} usage is above 70% (Current Disk Used % is: {{ $value }})"

  - alert: HOST_HIGH_CPU_USAGE
    expr: (100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)) > 70
    labels:
      severity: HIGH
      category: HOST_CPU
    annotations:
      summary: "{{$labels.instance}}: CPU usage detected above 70%"
      description: "{{$labels.instance}}: CPU usage is above 70% (Current CPU % is: {{ $value }})"

  - alert: HOST_HIGH_DISK_UTILIZATION
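    # node_disk_io_time_seconds_total is in seconds, so rate() is the busy fraction (0-1); multiply by 100 for percent
    expr: rate(node_disk_io_time_seconds_total[5m]) * 100 > 90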
    labels:
      severity: HIGH
      category: HOST_DISK
    annotations:
      summary: "{{$labels.instance}}: Disk ( {{ $labels.device }} ) utilization is very high."
      description: "{{$labels.instance}}: Disk ( {{ $labels.device }} ) utilization is very high. (Current Utilization is: {{ $value }})"


  - alert: HOST_HIGH_DISK_INODE
    expr: avg_over_time(node_filesystem_files_free{fstype=~"(ext.|xfs)"}[5m]) / avg_over_time(node_filesystem_files{fstype=~"(ext.|xfs)"}[5m]) * 100 <= 20
    labels:
      severity: HIGH
      category: HOST_DISK
    annotations:
      summary: "{{$labels.instance}}: Disk ( {{ $labels.device }} ) inode usage is high"
      description: "{{$labels.instance}}: Disk ( {{ $labels.device }} ) inode usage is high; free inodes are at or below 20%. (Current free inodes % is: {{ $value }})"
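
For Prometheus to evaluate these rules and forward firing alerts, the rules file is typically referenced from the main Prometheus configuration. The fragment below is a minimal sketch of that wiring. The file name alert_rules.yml and the Alertmanager address localhost:9093 are assumptions for illustration; replace them with the values used in your deployment.

```yaml
# prometheus.yml (fragment); file name and target address are assumed placeholders
global:
  evaluation_interval: 1m      # how often alert rules are evaluated

rule_files:
  - alert_rules.yml            # assumed name of the alert rules file shown above

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093   # assumed Alertmanager host:port (9093 is the default port)
```

The rules file can be syntax-checked with `promtool check rules alert_rules.yml` before Prometheus is reloaded.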

Prometheus Alertmanager Configuration for Alert Notifications:

Below is a sample Prometheus Alertmanager configuration (config.yml) that sends alert notifications to the CloudFabrix AIOps platform through the Webhook URL.

route:
  repeat_interval: 1m
  receiver: cfx-webhook

receivers:
- name: cfx-webhook
  webhook_configs:
  - url: 'https://<cfx-aiops-webhook-URL>'
    send_resolved: true
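    http_config:
      # Optional: uncomment to send basic authentication credentials.
      # basic_auth:
      #   username: <optional>
      #   password: <optional>
      tls_config:
        # Skips TLS certificate verification (for example, with self-signed certificates).
        insecure_skip_verify: true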
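
With repeat_interval set to 1m, the sample route re-sends every alert that is still firing each minute. Where that is too frequent, the route can group related alerts and use longer intervals. The variant below is only a sketch: the grouping labels and interval values are illustrative assumptions, not CloudFabrix recommendations, and the receivers section stays as in the sample.

```yaml
# config.yml (route section only); grouping labels and intervals are assumed example values
route:
  receiver: cfx-webhook
  group_by: ['alertname', 'instance']   # alerts sharing these labels are sent as one notification
  group_wait: 30s        # wait before sending the first notification for a new group
  group_interval: 5m     # wait before notifying about new alerts added to an existing group
  repeat_interval: 4h    # wait before re-sending a notification for alerts that are still firing
```

The resulting file can be validated with `amtool check-config config.yml` before Alertmanager is restarted.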

Alert Sources