Availability options with vROPs 8.

While deploying vROPs 8, you will find two different availability configurations, and they are the scope of this post:

  • High Availability
  • Continuous Availability

High Availability:

  • This option tolerates the failure of only one node, for example the master node. To be very clear, it is not a disaster recovery solution; it only protects the analytics cluster against the loss of a node.
  • It requires a cluster of at least two nodes, where the first node acts as the master node and the second node as the replica node.
  • By node type, the replica node is just a data node, but it holds a complete copy of the master node, which is why it is called the replica node.
  • If one node fails, manual intervention is required to repair the cluster and bring it out of degraded mode.
  • Enabling HA lowers vRealize Operations Manager capacity and processing by half, because HA creates a redundant copy of data throughout the cluster, as well as the replica backup of the master node. Consider your potential use of HA while planning the number and size of your vRealize Operations Manager cluster nodes.
  • It's important to avoid having both VMs on one host, so the best option is to use anti-affinity rules that keep the nodes on separate hosts in the vSphere cluster.
  • If you remove a node that has one or more vCenter adapters configured to collect data from an HA-enabled cluster, the vCenter adapters associated with that node stop collecting. Change the adapter configuration to pin them to another node before removing the node.
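
To make the capacity point above concrete, here is a minimal sizing sketch in Python. It only encodes the "capacity is halved" rule from this post; the function name and the per-node object count are hypothetical placeholders, not VMware sizing figures, so take the real numbers from the official sizing guidelines.

```python
def effective_capacity(node_count: int, objects_per_node: int,
                       redundancy_enabled: bool) -> int:
    """Approximate how many objects the analytics cluster can handle.

    objects_per_node is a hypothetical input for illustration only.
    """
    if node_count < 1 or objects_per_node < 0:
        raise ValueError("node_count must be >= 1 and objects_per_node >= 0")
    if redundancy_enabled and node_count < 2:
        # The master needs at least one more node to hold the replica.
        raise ValueError("HA needs at least two nodes")

    raw_capacity = node_count * objects_per_node
    # With HA enabled, half of the capacity goes to the redundant copy.
    return raw_capacity // 2 if redundancy_enabled else raw_capacity


if __name__ == "__main__":
    # Two nodes, placeholder figure of 10,000 objects per node.
    print(effective_capacity(2, 10_000, redundancy_enabled=False))
    print(effective_capacity(2, 10_000, redundancy_enabled=True))
```

In other words, if you plan to enable HA, size the cluster for roughly twice the object count you actually need to monitor.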

Continuous Availability:

To understand the Continuous Availability (CA) that vROPs supports, we need to be clear on the following terms:

  • Fault Domain: In this case, it is a logical group made up of a vSphere cluster and the master or replica node of the vROPs cluster.
  • Analytics cluster: The term used for the vROPs AI-based data collection and processing engine.

CA separates the vROPs cluster into two fault domains, stretching across vSphere clusters. Together, these two fault domains form an analytics cluster that is protected against the complete failure of one site. The most critical component of this topology is the Witness node, which neither collects nor stores data. With the help of the Witness node, one fault domain is taken offline in a split-brain situation to avoid data inconsistency issues. The offline nodes can be brought back using the “Bring Online” button in the admin UI, but before doing that, ensure that network connectivity between the nodes across the two fault domains is restored and stable.
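
The split-brain behaviour described above can be pictured with a small toy model in Python. This is only an illustration of the idea of witness-based arbitration, not the actual vROPs algorithm: the `Links` type, the `FD1`/`FD2` labels, and the tie-break rule (prefer fault domain 1 when both domains can still reach the witness) are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Links:
    """Network reachability between the fault domains and the witness."""
    fd1_to_fd2: bool
    fd1_to_witness: bool
    fd2_to_witness: bool


def domains_to_keep_online(links: Links) -> Set[str]:
    """Decide which fault domains stay online (toy arbitration rule)."""
    if links.fd1_to_fd2:
        # The fault domains can talk to each other: no split brain.
        return {"FD1", "FD2"}
    # Split brain: only one side may stay online to keep data consistent.
    if links.fd1_to_witness:
        return {"FD1"}  # assumed tie-break: prefer FD1 if it sees the witness
    if links.fd2_to_witness:
        return {"FD2"}
    # Neither side can reach the witness, so neither can be trusted alone.
    return set()


if __name__ == "__main__":
    # The inter-site link is down, and only FD2 still reaches the witness.
    print(domains_to_keep_online(Links(False, False, True)))
```

Whatever the real decision logic is, the outcome is the one described above: one fault domain ends up offline until connectivity is restored and it is brought back online manually.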

  • To enable CA, you must have at least one data node deployed, in addition to the master node. If you have more than one data node, there must be an even number of data nodes including the master node. For example, the cluster must have 2, 4, 6, 8, 10, 12, 14 or 16 nodes based on the appropriate sizing requirements. The data stored in the master node in fault domain 1 is stored and replicated in the replica node in fault domain 2.
  • When CA is enabled, it lowers the vRealize Operations Manager capacity and processing by half, because CA creates a redundant copy of data throughout the cluster, and the replica backup of the master node.
  • It's important to avoid having these VMs on one host, so the best option is to use anti-affinity rules that keep the nodes on separate hosts in the vSphere cluster.
  • If you cannot split the data nodes into different vSphere clusters, do not enable CA. A cluster failure can cause the loss of more than half of the data nodes, which is not supported, and all of vRealize Operations Manager might become unavailable.
  • The administration interface displays the resource cache count, which is created for active objects only, but the inventory displays all objects. When you remove a node from a CA-enabled cluster allowing the vCenter adapters to collect data and rebalance each node, the inventory displays a different quantity of objects from that shown in the administration interface.
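
The node-count rule for CA can be checked with a few lines of Python before you deploy. This is a sketch under stated assumptions: the 2 to 16 range comes from the list in this post, the witness node is assumed not to be counted, and the node names as well as the alternating placement (master in fault domain 1, replica in fault domain 2) are purely illustrative.

```python
from typing import List, Tuple

MAX_CA_NODES = 16  # upper bound taken from the 2, 4, ..., 16 list above


def validate_ca_node_count(total_nodes: int) -> None:
    """Raise ValueError if the analytics node count cannot support CA.

    total_nodes counts the master, replica, and data nodes; the witness
    node is assumed to be excluded from this count.
    """
    if total_nodes < 2:
        raise ValueError("CA needs the master plus at least one data node")
    if total_nodes % 2 != 0:
        raise ValueError("CA needs an even number of nodes")
    if total_nodes > MAX_CA_NODES:
        raise ValueError(f"CA supports at most {MAX_CA_NODES} nodes")


def split_into_fault_domains(nodes: List[str]) -> Tuple[List[str], List[str]]:
    """Alternate nodes between two fault domains.

    Expects the order [master, replica, data...], so the master lands in
    fault domain 1 and the replica in fault domain 2.
    """
    validate_ca_node_count(len(nodes))
    return nodes[0::2], nodes[1::2]


if __name__ == "__main__":
    fd1, fd2 = split_into_fault_domains(
        ["master", "replica", "data-1", "data-2"])
    print("Fault domain 1:", fd1)
    print("Fault domain 2:", fd2)
```

Each fault domain ends up with the same number of nodes, which is why the total has to be even.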

Conclusion:
– If your requirement is to provide HA for a small to medium environment, it's fine to use the HA option.
– If your requirement is to provide protection against the complete failure of one site, use the CA option.

Thank you for reading, and we don’t mind if you have a look at the rest of the posts!

Source: VMware Documentation

Samesh Dhankhar