
Alert Correlation & Suppression

Overview

Alert correlation is the process of grouping related alerts to reduce noise and make alerts and events more actionable. Correlated alerts are grouped and translated into CFX incidents, which are then routed to ITSM systems for handling by NOC/IT analysts. Analysts can then log in to the OIA (Operations Intelligence and Analytics) Incident Room module to triage, diagnose, and perform root cause analysis of an incident.

How it Works/Key Points

  • Ingested alerts and events are normalized to the OIA alert model, so that alerts from most tools and implementations can be handled uniformly.

  • Customers can add custom attributes to the alert model using the enrichment process.

  • Ingested alerts are enriched with context about application, stack, department, ownership, support group, etc. using a process called alert enrichment.

  • Enriched alerts are then evaluated to determine whether suppression or correlation should be performed. Suppression policies are used to suppress alerts that escape maintenance windows.

  • Alerts that remain are then evaluated for correlation, which is determined by correlation policies. These policies are set up in three ways:

  1. System-defined policies: These address well-known behavior such as alert burst and alert flapping situations.

  2. ML-driven correlation recommendations: OIA uses unsupervised ML clustering to detect alert patterns and provides a list of suggested correlations in the form of Symptom Clusters.

  3. Admin-defined correlation policies: Administrators can define new correlation policies or customize existing policies to meet their needs. For instance, correlation policies allow admins to group alerts across a full stack or an application instance. Admins can also group alerts across common infrastructure (such as network or storage) or shared services (e.g. SSO, DNS).
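The processing order described above (normalize, enrich, suppress, then correlate) can be sketched as follows. This is only an illustration of the ordering: `process_alert` and the stage functions passed to it are hypothetical names, not OIA APIs, and the stage bodies are left to the caller.

```python
def process_alert(raw_alert, normalize, enrich, is_suppressed, correlate):
    """Run one alert through the stages in the order described above.

    Every stage is a caller-supplied function (hypothetical, not an OIA API).
    Returns None when the alert is suppressed, else the correlation result.
    """
    alert = normalize(raw_alert)   # map tool-specific fields onto a common model
    alert = enrich(alert)          # add context such as application or stack
    if is_suppressed(alert):       # suppression is evaluated before correlation
        return None
    return correlate(alert)        # only remaining alerts reach correlation
```

Passing the stages in as functions keeps the sketch independent of how any particular stage is actually implemented.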

Correlation Policy - How it Works and Key Controls

Correlation policies are enabled when created, but can be disabled. Correlation policies determine how alerts can be grouped together. Most correlation policies can be created in an assisted manner, using the symptom cluster recommendations provided by OIA's correlation engine.

A correlation policy can result in one or more instances of alert correlation, each represented by an Alert Group.

The following controls are available to specify correlation behavior.

Minimum Severity of Alert Group

The severity of an alert group is always determined by the highest severity among the alerts it comprises. However, if customers want to enforce a certain minimum level of severity on alert groups formed by this correlation policy, they can set it with this control.
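As a minimal sketch of this rule, the group severity is the highest alert severity, raised to the configured minimum when one is set. The severity names and their ranking in `SEVERITY_RANK` are assumptions made for illustration; OIA's actual severity levels may differ.

```python
# Hypothetical severity ranking (higher number = more severe).
SEVERITY_RANK = {"INFO": 0, "WARNING": 1, "MINOR": 2, "MAJOR": 3, "CRITICAL": 4}


def group_severity(alert_severities, minimum_severity=None):
    """Return the severity of an alert group.

    The group takes the highest severity among its alerts, raised to
    `minimum_severity` when the policy sets one.
    """
    highest = max(alert_severities, key=SEVERITY_RANK.__getitem__)
    if (minimum_severity is not None
            and SEVERITY_RANK[minimum_severity] > SEVERITY_RANK[highest]):
        return minimum_severity
    return highest
```

With no minimum configured, a group of WARNING and MINOR alerts is MINOR; with a minimum of MAJOR, the same group is reported as MAJOR.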

Time Boxing

Time boxing is the concept of grouping related alerts that fall within a certain time window, such as 15 minutes, 30 minutes, or 1 hour. The time window starts when the first matching alert is detected and closes when the window expires. Any matching alert that arrives after the window has expired forms a new alert group instance, which leads to a new incident.
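The windowing behavior can be sketched as below. The function name `time_box` is hypothetical, and treating an alert that arrives exactly at the window's end as belonging to a new window is an assumption; the text does not specify the boundary case.

```python
from datetime import datetime, timedelta


def time_box(alert_times, window):
    """Split matching-alert timestamps into time-boxed groups.

    A window opens at the first matching alert and closes `window` later.
    The first alert at or after expiry opens a new window, i.e. a new
    alert group (boundary handling is an assumption).
    """
    groups = []
    window_end = None
    for ts in sorted(alert_times):
        if window_end is None or ts >= window_end:
            groups.append([])          # start a new alert group
            window_end = ts + window   # window is anchored at its first alert
        groups[-1].append(ts)
    return groups
```

Note that the window is anchored at the first alert and does not slide: later alerts inside the window do not extend it.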

Precedence

Precedence values determine which policy wins when an alert matches multiple policies. For example, an alert belonging to symptom cluster "prod" and application "CMS" can match both a policy set up to correlate alerts at the application level (app-name == CMS) and a policy set up at the symptom cluster level (cluster-name == prod). By giving the application-level policy higher precedence, alerts will be grouped at the application level.

Precedence is a numeric value; higher values indicate higher precedence and take priority when more than one policy matches. Precedence values are optional. If none is provided, the system assigns a precedence value automatically based on chronological order, i.e. newly created correlation policies get higher precedence.

A typical approach is to set up broad-scope correlation policies with higher precedence and more specific correlation policies with lower precedence.
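The conflict resolution described above can be sketched as picking, among the policies an alert matches, the one with the highest numeric precedence. The policy structure, the precedence numbers, and `winning_policy` are hypothetical; only the "higher value wins" rule and the CMS/prod example come from the text.

```python
# Two illustrative policies mirroring the example above (values are made up).
policies = [
    {"name": "cluster-level", "precedence": 10,
     "match": lambda alert: alert.get("cluster-name") == "prod"},
    {"name": "app-level", "precedence": 20,
     "match": lambda alert: alert.get("app-name") == "CMS"},
]


def winning_policy(alert, policies):
    """Return the matching policy with the highest precedence, or None."""
    matches = [p for p in policies if p["match"](alert)]
    if not matches:
        return None
    return max(matches, key=lambda p: p["precedence"])
```

An alert with both `app-name == CMS` and `cluster-name == prod` matches both policies and is handled by the application-level one, because its precedence value is higher.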

Property Filters

Property filters narrow down the selection criteria for related alerts by matching property fields against specified values, using conditions such as equals, contains, or in a list of values.

Property filters allow fine-grained control of correlation policies to meet organizational processes, administrative domains, or functional groups.
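A property filter can be thought of as a (field, condition, value) triple evaluated against an alert's attributes. The sketch below assumes that all filters in a policy must match (logical AND) and uses the hypothetical condition names `equals`, `contains`, and `in`; the actual condition set and combination logic in OIA may differ.

```python
def matches_filter(alert, field, condition, value):
    """Evaluate one property filter against an alert (a dict of attributes)."""
    actual = alert.get(field)
    if actual is None:
        return False                      # a missing field never matches
    if condition == "equals":
        return actual == value
    if condition == "contains":
        return value in str(actual)       # substring match
    if condition == "in":
        return actual in value            # value is a list of allowed values
    raise ValueError(f"unsupported condition: {condition}")


def matches_all(alert, filters):
    """True when the alert satisfies every filter (AND is an assumption)."""
    return all(matches_filter(alert, *f) for f in filters)
```

For example, the filters `("app-name", "equals", "CMS")` and `("env", "in", ["Prod", "UAT"])` together select only CMS alerts from the Prod or UAT environments.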

Group By

Related alerts can be grouped by the values of a certain attribute. This works best for attributes that are enumerations, lists of values, or otherwise represent a limited set of identities.

For example, assume the Machine-Type attribute has the following values:

Machine-Type = Application, Server, Network, Storage

If Group By selects Machine-Type as the attribute, the correlation engine automatically groups alerts as follows:

"Machine-Type == Application" into one group. 
"Machine-Type == Server" into one group. 
"Machine-Type == Storage" into another group. 
"Machine-Type == Network" into another group.

Group By can also use multiple attributes for more advanced scenarios.

Continuing the example above, let's add one more attribute and use Group By with two attributes:

Machine-Type = Application, Server, Network, Storage 
Environment = Prod, UAT

With the two Group By attributes above, the following alert groups are formed:

"Machine-Type == Application and Environment == Prod" into one group. 
"Machine-Type == Application and Environment == UAT" into one group. 
"Machine-Type == Server and Environment == Prod" into one group. 
"Machine-Type == Server and Environment == UAT" into one group. 
"Machine-Type == Storage and Environment == Prod" into one group. 
"Machine-Type == Storage and Environment == UAT" into one group. 
"Machine-Type == Network and Environment == Prod" into one group. 
"Machine-Type == Network and Environment == UAT" into one group.
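
The grouping above amounts to keying each alert by the tuple of its Group By attribute values. A minimal sketch, assuming alerts are plain dictionaries and using a hypothetical `group_by` helper:

```python
from collections import defaultdict


def group_by(alerts, attributes):
    """Group alerts by the tuple of values of the selected attributes."""
    groups = defaultdict(list)
    for alert in alerts:
        # One group per distinct combination of attribute values.
        key = tuple(alert.get(attr) for attr in attributes)
        groups[key].append(alert)
    return dict(groups)
```

With `attributes = ["Machine-Type", "Environment"]`, alerts tagged Server/Prod and Server/UAT land in separate groups, so four Machine-Type values and two Environment values yield up to eight groups.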

Last updated 3 years ago