Self-Healing Environment with vROps

We are all well aware of the debate between VMware administrators and OS/application owners over resource utilization and right-sizing.

The objective of this post is to help both parties right-size a VM after the build phase by automatically adding resources to it, when required, based on actual utilization.

If, like me, you are concerned that a resource request from the OS may be only a short spike, you can configure an environment-specific “Wait Cycle” so that the behavior is tailored to your environment.

This objective is achieved with vROps, and the idea came from the Sunny Dua and Simon Eady webinar series: Link (the demo there shows the use case on an older version, but this post relates to vROps 7.0).

With the latest versions of vROps, the use case has become simpler and more accurate.

Note: The lab environment used for this testing consists of vSphere 6.5 monitored by vROps 7.0.

Benefits:

  • The period a VM spends at high utilization is reduced automatically, based on the configured Wait Cycle.
  • No guesswork is needed about how many resources to add to a highly utilized VM, as vROps recommends the amount of vCPU, memory, or both to add.
  • This can be offered as a premium service with the help of a “Function Group”, which is used to select the candidates for this service.

Prerequisites: 

  • Hot-Add for vCPU and memory must be enabled in the VM settings.
  • Hot-Add for vCPU and memory must be supported by the guest OS.
  • The vROps service account on vCenter must have edit privileges on these VMs.
  • The VMware Tools version on the VMs must be 10.3 or above.
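The prerequisites above can be turned into a quick pre-flight check before a VM is added to the service. The following is only a sketch: the `VmFacts` record and its field names are hypothetical and would have to be filled from your own inventory source; it does not call vCenter or vROps.

```python
from dataclasses import dataclass


@dataclass
class VmFacts:
    """Hypothetical snapshot of the VM properties the prerequisites refer to."""
    name: str
    cpu_hot_add_enabled: bool
    memory_hot_add_enabled: bool
    guest_supports_hot_add: bool
    service_account_can_edit: bool
    tools_version: str  # dotted version string, e.g. "10.3.5"


MIN_TOOLS_VERSION = (10, 3)  # minimum VMware Tools version from the list above


def parse_version(text: str) -> tuple:
    """Turn '10.3.5' into (10, 3, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def unmet_prerequisites(vm: VmFacts) -> list:
    """Return a readable list of the prerequisites this VM does not meet."""
    problems = []
    if not vm.cpu_hot_add_enabled:
        problems.append("CPU Hot-Add is disabled")
    if not vm.memory_hot_add_enabled:
        problems.append("Memory Hot-Add is disabled")
    if not vm.guest_supports_hot_add:
        problems.append("Guest OS does not support Hot-Add")
    if not vm.service_account_can_edit:
        problems.append("Service account lacks edit privileges")
    if parse_version(vm.tools_version) < MIN_TOOLS_VERSION:
        problems.append(f"VMware Tools {vm.tools_version} is older than 10.3")
    return problems


ready_vm = VmFacts("app01-Hot", True, True, True, True, "10.3.5")
stale_vm = VmFacts("app02-Hot", True, False, True, True, "9.4.0")
print(unmet_prerequisites(ready_vm))
print(unmet_prerequisites(stale_vm))
```

Versions are compared as tuples of integers rather than as strings, so that "9.4.0" is correctly treated as older than "10.3".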

Use case for Testing:

vCPU: Add more vCPU to the VM if CPU usage is 90% or more for at least 10 minutes.

Memory: Add more memory to the VM if memory usage is 50% or more for at least 10 minutes.

Configuration:

  1. Create Function Group:

Start by creating a Function group for the VMs, which makes it possible to assign a dedicated policy for this purpose.

Environment -> Custom Groups -> Function -> Click + to create a new group.

1

In the screenshot above, I have used the condition that a VM is included in the group if its name contains “-Hot”.

Click Preview and check whether the intended members have been added to the group.
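To reason about which VMs such a naming rule would pick up before clicking Preview, the condition can be mimicked offline. This is a minimal sketch that assumes a case-sensitive substring match; the function name and the sample inventory are made up for illustration, and the actual matching behavior should be confirmed with the Preview button in vROps.

```python
def in_hot_add_group(vm_name: str, marker: str = "-Hot") -> bool:
    """Mimic a 'name contains' rule (assumed case-sensitive here)."""
    return marker in vm_name


# Hypothetical inventory used only to illustrate the rule.
inventory = ["web01-Hot", "db01", "app02-Hot-test", "hotfix-vm"]
members = [name for name in inventory if in_hot_add_group(name)]
print(members)
```

Note that a VM such as "hotfix-vm" is not selected under this assumption, because the marker includes the leading hyphen and a capital H.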

  2. Auto Heal – vCPU:

  • Start by creating a “Symptom Definition” for CPU Usage (%).

2

The definition marks CPU usage of 90% or above as Critical; this threshold can be modified as required.

If we explore the advanced options, we can configure:

Wait Cycle: 1 cycle is 5 minutes

Cancel Cycle: 1 cycle is 5 minutes

As shown below, the total Wait Cycle for the resource request is 10 minutes (2 cycles) and the Cancel Cycle is 5 minutes (1 cycle), which together define when the symptom is triggered and when it is cancelled.

3
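The effect of the Wait Cycle and Cancel Cycle can be illustrated with a small simulation. This is a simplified model written for this post, not the vROps implementation: it assumes one sample per 5-minute cycle, that the symptom becomes active once the threshold has been met for the configured number of consecutive cycles, and that it clears once the value has been below the threshold for the cancel-cycle count. The class and function names are hypothetical.

```python
from dataclasses import dataclass

CYCLE_MINUTES = 5  # one cycle is 5 minutes, as described above


def minutes_to_cycles(minutes: int) -> int:
    """Round a duration up to whole cycles."""
    return -(-minutes // CYCLE_MINUTES)


@dataclass
class Symptom:
    """Simplified symptom: counts consecutive cycles above/below a threshold."""
    threshold: float
    wait_cycles: int
    cancel_cycles: int
    active: bool = False
    _above: int = 0
    _below: int = 0

    def observe(self, value: float) -> bool:
        """Feed one sample per cycle and return whether the symptom is active."""
        if value >= self.threshold:
            self._above += 1
            self._below = 0
            if self._above >= self.wait_cycles:
                self.active = True
        else:
            self._below += 1
            self._above = 0
            if self._below >= self.cancel_cycles:
                self.active = False
        return self.active


# CPU rule from the test use case: 90% or more for at least 10 minutes.
cpu = Symptom(threshold=90,
              wait_cycles=minutes_to_cycles(10),
              cancel_cycles=minutes_to_cycles(5))

samples = [95, 60, 92, 97, 99, 70]  # made-up CPU usage values, one per cycle
states = [cpu.observe(s) for s in samples]
print(states)
```

In this model the single 95% spike in the first cycle does not activate the symptom, which is exactly the protection against short spikes that the Wait Cycle is meant to provide; only the sustained load in cycles three and four does.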

  • Create an “Alert Definition” for CPU Usage (%).

4

After adding the symptom, create an action to automate the recommendation.

  • From Add Recommendation, click on +

5

Save the recommendation and add the action in the recommendation column. Save the alert definition as well.

  3. Auto Heal – Memory:
  • Start by creating a “Symptom Definition” for Memory Usage (%).

6

The definition marks memory usage of 50% or above as Critical; this threshold can be modified as required.

If we explore the advanced options, we can configure:

Wait Cycle: 1 cycle is 5 minutes

Cancel Cycle: 1 cycle is 5 minutes

As shown below, the total Wait Cycle for the resource request is 10 minutes (2 cycles) and the Cancel Cycle is 5 minutes (1 cycle), which together define when the symptom is triggered and when it is cancelled.

7

  • Create an “Alert Definition” for Memory Usage (%).

8

After adding the symptom, create an action to automate the recommendation.

  • From Add Recommendation, click on +

9

Save the Recommendation and add the action in the recommendation column.

Save the alert definition as well.

  4. Create Custom Policy:
  • Create a custom policy from Administration -> Policies -> Policy Library -> Click +

10

Jump to “Alert/Symptom Definitions” and set the alerts created in Steps 2 and 3 to Local.

11

  • Assign the policy to the custom group created in Step 1.

12

Once the configuration is ready, we can test it on a test VM, which must be a member of the custom group.

Using a stress tool such as “Heavy load”, high CPU and memory utilization was generated on the VM.

  5. Final Output:

After the “Wait Cycle” duration had elapsed, the action was performed on the VM successfully, which can be confirmed under Administration -> History -> Recent Tasks.

13

Thank you for reading and please share.

Regards

Samesh Dhankhar
