cfxDimensions High Availability
cfxDimensions platform high availability architecture
CloudFabrix's cfxDimensions platform is built on a cloud-native architecture that uses containerized microservices for all platform, infrastructure, and application services. It can be deployed in distributed mode to provide high availability and to scale with the workload requirements of the enterprise environment.
The picture below shows the high-level architecture of the cfxDimensions platform when deployed in distributed mode for scale and high availability.
The cfxDimensions platform comes with the following services when deployed in HA and distributed mode.
- HAProxy: Load balancer service that front-ends the cfxDimensions platform for UI access and all other incoming traffic.
- Apache Tomcat: For cfxDimensions platform's UI access
- Kafka / Zookeeper: Message queue which is used by cfxDimensions platform & application services
- Minio Object Storage: Object storage which is used by cfxDimensions platform & application services
- MariaDB: Database for cfxDimensions platform and application services
- Gluster: Shared storage filesystem which is used by cfxDimensions platform & application services
- Elasticsearch*: Used for metrics and log data. (Note: It is an optional service and is not included by default in a cfxDimensions platform deployment. It can be deployed as an independent service where needed.)
Platform Services: The cfxDimensions platform provides the core foundation services. These include critical services such as identity management, encryption/decryption of critical data, a provisioner for installing, updating, and uninstalling microservices, and a service registry. The platform services listed below are mandatory for the cfxDimensions platform to function.
- Service Registry: All application microservices supported by the cfxDimensions platform register with the centralized Service Registry, which handles service listing, service lookups, and interactions among application microservices or with external clients.
- Identity: Provides user identity management, including tenancy and local users. It supports integration with external identity management solutions such as LDAP/AD/SSO. Application services use it for identity management operations.
- Locker: It provides a secure way to store and access credentials using multi-level encryption and advanced security principles similar to those of popular cloud providers.
- User-preferences: This service stores user-specific preference settings for the UI (e.g. report settings, column selections, UI report layouts).
- Notification manager: It is responsible for sending event notifications to application services about all service lifecycle events.
- Provisioner: This service is responsible for provisioning, de-provisioning, and upgrading cfxDimensions application services. It supports manual or automated placement of application services among the available (one or more) service nodes of the platform.
- Console UI: A supporting service that provides the UI-related WAR files for the Tomcat infrastructure service. It is used only during initial load time. (Note: This service will be deprecated in the future.)
The Console UI service is deployed as a single service instance in an HA deployment because it is used only during the initial load-up after the platform services are first installed. It is not used after the platform services are deployed or at run time.
When the cfxDimensions platform is deployed in distributed mode for scale and HA in production environments, the list below gives the minimum number of server instances required to tolerate one server node failure and maintain access to the AIOps applications.
Note: As of the current release, cfxDimensions infrastructure services are supported only in a 3-server-node deployment. If you need to deploy on more than 3 nodes, please contact CloudFabrix technical support.
HAProxy: HAProxy is a software-based load balancer for TCP and HTTP-based applications. Within the cfxDimensions platform it provides load balancing and high availability for infrastructure, platform, and application services. The primary functions of HAProxy within the cfxDimensions platform are listed below.
- Front-end UI portal access, alerts/events & API access over HTTP(s) with SSL
- MariaDB access to platform and application services when MariaDB is deployed in cluster mode
- Minio object storage access to platform and application services
- Service registry access among application services
The HAProxy service is containerized, configured specifically for compatibility with the cfxDimensions platform, and deployed in Active/Standby mode on 2 of the 3 infrastructure service nodes. HAProxy's front-end and back-end cluster IPs are virtual IPs that are configured and managed through the keepalived service.
Keepalived: Keepalived is included as part of the cfxDimensions platform as a standard Linux service. Keepalived uses the IP Virtual Server (IPVS) kernel module to provide transport layer (Layer 4) load balancing, redirecting requests for network-based services to individual members of a server cluster. Keepalived monitors the status of each server and uses the Virtual Router Redundancy Protocol (VRRP) to implement high availability.
Keepalived service is configured as a Linux service on both Active/Standby nodes of HAProxy service.
Its primary function is to manage HAProxy's front-end and back-end virtual IPs, providing high availability at the network layer. When the active HAProxy node (or service) goes down, keepalived detects the failure and automatically transfers the HAProxy cluster virtual IP to the standby HAProxy node. Application traffic (internal or external) is then re-routed and processed through the standby HAProxy node seamlessly.
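The failover behaviour described above can be pictured with a small toy model. This is only a sketch of the general VRRP idea (the healthy node with the highest priority holds the virtual IP), not the platform's actual keepalived configuration. The function name `elect_master`, the node names, and the priority values are all hypothetical.

```shell
# Toy model of VRRP master election: among the nodes reported "up", the one
# with the highest priority holds the virtual IP. Node names and priorities
# are hypothetical; the real keepalived settings are not shown in this document.
# Each argument has the form name:priority:state (state is "up" or "down").
elect_master() (
  best=""
  best_prio=-1
  for entry in "$@"; do
    name=${entry%%:*}      # text before the first colon
    rest=${entry#*:}       # text after the first colon
    prio=${rest%%:*}
    state=${rest#*:}
    if [ "$state" = "up" ] && [ "$prio" -gt "$best_prio" ]; then
      best=$name
      best_prio=$prio
    fi
  done
  echo "$best"
)

elect_master haproxy-a:101:up haproxy-b:100:up     # active node holds the VIP
elect_master haproxy-a:101:down haproxy-b:100:up   # VIP moves to the standby
```

The function body runs in a subshell, so its working variables do not leak into the calling shell.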
Kafka & Zookeeper: Kafka is used as a message queue to publish (write) and subscribe to (read) streams of events among application services within the cfxDimensions platform. Kafka natively supports distributed deployment, which allows it to scale and provide high availability. The Zookeeper service is used alongside Kafka and is also deployed in distributed mode for high availability and scale.
The Kafka and Zookeeper services are containerized and configured specifically for compatibility with the cfxDimensions platform. When deployed in a 3-node configuration with the default replication settings (replication factor set to 2), they tolerate the failure of 1 node.
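The one-node failure tolerance can be checked with simple arithmetic. The sketch below assumes that the Zookeeper ensemble needs a majority of its members to stay available and that a Kafka partition remains available as long as one replica survives. The second assumption depends on producer and broker settings that this document does not state. Both function names are hypothetical.

```shell
# Majority-quorum services (such as a ZooKeeper ensemble) stay available
# while more than half of the members are up.
quorum_fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

# A partition replicated to RF brokers survives the loss of RF - 1 of them,
# assuming availability only needs one surviving replica (an assumption).
replica_fault_tolerance() {
  echo $(( $1 - 1 ))
}

quorum_fault_tolerance 3     # 3-node ZooKeeper ensemble
replica_fault_tolerance 2    # Kafka replication factor 2
```

Both calls print 1, which is consistent with the one-node tolerance stated for the 3-node configuration.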
Kafka & Zookeeper disk mount points on each cluster node:
- Data mount point: (Kafka & Zookeeper)
- Service logs path: (Kafka & Zookeeper)
Minio Object Storage: MinIO is a high-performance object storage service that is API-compatible with the Amazon S3 cloud storage service and can be deployed in distributed mode for scale and high availability. It is primarily used to store and query configuration, ML experiment data, pipelines, alert bundles, inventory, and analytical data files. The Minio service is containerized and configured specifically for compatibility with the cfxDimensions platform. When deployed in a 3-node cluster, it is configured with 12 disk mount points in total (4 per node), with the Minio rrs storage class/policy set to EC:4 (i.e. 8 data disks and 4 parity disks).
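The disk arithmetic behind the EC:4 setting can be sketched as follows. This is an illustration only: the function name `ec_layout` is hypothetical, and the usable-capacity figure assumes all 12 disks are the same size and form a single erasure set.

```shell
# ec_layout TOTAL_DISKS PARITY_DISKS
# Prints the data/parity split and the usable share of raw capacity
# (integer percent, rounded down), assuming equally sized disks.
ec_layout() (
  total=$1
  parity=$2
  data=$(( total - parity ))
  echo "data=$data parity=$parity usable_pct=$(( data * 100 / total ))"
)

ec_layout 12 4   # prints: data=8 parity=4 usable_pct=66
```

Losing one node removes 4 of the 12 disks, which equals the parity count in this layout.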
Minio object storage disk mount points on each cluster node:
- Data mount point: (Minio object storage)
- Minio container logs
docker ps | grep -i minio
docker logs <minio-container-id>
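The two commands above can be combined so that the container ID does not have to be copied by hand. The helper below is a hypothetical convenience, not part of the platform. It reads `docker ps` output on standard input, so it can be tried without Docker.

```shell
# container_ids PATTERN
# Reads `docker ps` output on stdin and prints the container ID (first column)
# of every row that matches PATTERN, case-insensitively.
container_ids() {
  grep -i -- "$1" | awk '{ print $1 }'
}

# On a cluster node with Docker available (not run here):
#   docker ps | container_ids minio | while read -r id; do
#     docker logs --tail 50 "$id"
#   done
```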
Gluster shared filesystem: GlusterFS is a scalable, distributed network filesystem. It provides a shared filesystem that cfxDimensions platform, application, and infrastructure services use to store and share service logs, certificates, and configuration. The Gluster service is containerized and configured specifically for compatibility with the cfxDimensions platform.
Gluster is deployed in a 3-node configuration, like the other infrastructure services, with 2 data replication nodes and 1 arbiter node. Each data replication node holds a data brick volume that is replicated between the Gluster cluster nodes. Only the two data brick nodes store data. The third node acts as an arbiter, which prevents split-brain and provides the same consistency guarantees as a normal replica 3 volume without consuming the disk space of a full third copy.
The configured Gluster volume name is 'macaw' and it is mounted on all of the cfxDimensions platform VMs as /opt/macaw (VMs: Platform, Infrastructure (DB/Data), Application services & cLambda)
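A quick way to confirm that the shared volume is mounted on a VM is to look for /opt/macaw in the mount table. The helper below is a hypothetical sketch that reads rows in /proc/mounts format from standard input. The `fuse.glusterfs` type in the comment is an assumption that holds only if the volume is mounted with the native Gluster client.

```shell
# mount_fstype MOUNTPOINT
# Reads /proc/mounts-style rows on stdin (device, mount point, fs type, ...)
# and prints the filesystem type of MOUNTPOINT. Exits non-zero if the mount
# point is not present.
mount_fstype() {
  awk -v mp="$1" '$2 == mp { print $3; found = 1 } END { exit !found }'
}

# On a platform VM (not run here):
#   mount_fstype /opt/macaw < /proc/mounts
# A native Gluster client mount would be expected to report fuse.glusterfs.
```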
GlusterFS shared filesystem mount point on each cluster node:
- Data mount point: (Gluster data brick)
- Gluster container logs
docker ps | grep -i gluster
docker logs <gluster-container-id>
MariaDB Database: MariaDB is a relational database service that stores user configuration, platform and application configuration, and alert and incident data for the cfxDimensions platform and its application services. MariaDB supports high availability natively and can be deployed in a Master/Slave configuration or, using the Galera clustering feature, in a Master/Master configuration. Within the cfxDimensions platform, MariaDB is deployed in a Master/Master (Galera cluster) configuration. The MariaDB service is containerized and configured specifically for compatibility with the cfxDimensions platform and its application services.
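The state of a Galera cluster can be read from the `wsrep_cluster_status` and `wsrep_cluster_size` status variables. The helper below is a hypothetical sketch that classifies tab-separated status rows read from standard input. The function name, the three labels it prints, and the expected size of 3 are assumptions for illustration, not platform tooling.

```shell
# galera_health EXPECTED_SIZE
# Reads "name<TAB>value" status rows on stdin and prints one of:
#   healthy    cluster is Primary and has at least EXPECTED_SIZE members
#   degraded   cluster is Primary but has fewer members than expected
#   no-quorum  this node is not part of a Primary component
galera_health() (
  expected=$1
  status=""
  size=0
  while IFS="$(printf '\t')" read -r name value; do
    case "$name" in
      wsrep_cluster_status) status=$value ;;
      wsrep_cluster_size)   size=$value ;;
    esac
  done
  if [ "$status" != "Primary" ]; then
    echo "no-quorum"
  elif [ "$size" -lt "$expected" ]; then
    echo "degraded"
  else
    echo "healthy"
  fi
)

# On a database node (not run here; the client may be `mysql` or `mariadb`):
#   mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'wsrep_cluster_%'" | galera_health 3
```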
MariaDB database mount point on each cluster node:
- Data mount point:
- DB service logs path: