VMware 2V0-21.20 – Distributed Resource Scheduler (DRS) Part 1

May 28, 2023

1. DRS Enhancements in vSphere 7

Now, prior to looking at some of these enhancements, I just want to take a moment to look back at DRS prior to vSphere 7. Prior to vSphere 7, we had this cluster-wide deviation model. Essentially what that means is that it was our goal to keep the hosts in a cluster evenly utilized. So here you can see we have four hosts, and these hosts have been placed within the cluster. And again, the goal here is to keep these hosts evenly utilized. So if there are certain virtual machines that are responsible for a higher load, we want to move those virtual machines to a different host to try to keep these hosts evenly utilized. Here you can see I’ve got percentages of utilization on each of these hosts. And let’s add some virtual machines as well. Just think of these orange VMs as the high resource utilizers. The green VMs are not using quite so much, and the blue VMs are somewhere in the middle.

So the orange VMs are the ones with the highest utilization. You can see on host one, we’ve got two high-usage VMs. And on host four, we’ve got four medium-usage VMs. Therefore, those hosts are the most heavily utilized for things like CPU and memory. DRS is going to address this issue by performing vMotions, moving virtual machines around and trying to keep the deviation between hosts to a minimum, basically trying to keep the hosts equally utilized. And so in this model, DRS is going to run every five minutes to see if the hosts are evenly utilized, and it’ll carry out vMotion operations in the event that they are not.
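As a rough illustration of that cluster-wide deviation idea, here is a minimal Python sketch. It assumes the "deviation" is just the standard deviation of per-host utilization percentages, and the utilization figures and the 10-point threshold are made up for the example. The real pre-vSphere 7 algorithm used its own load metric, so treat this as a model of the concept only.

```python
from statistics import pstdev


def host_load_deviation(utilization_pct):
    """Population standard deviation of per-host utilization (in percent)."""
    return pstdev(utilization_pct)


def needs_rebalance(utilization_pct, threshold_pct):
    """True when the spread between hosts exceeds the tolerated deviation."""
    return host_load_deviation(utilization_pct) > threshold_pct


# Hypothetical utilization figures for a four-host cluster.
before = [80.0, 40.0, 45.0, 75.0]   # hosts one and four are the busy ones
after = [60.0, 60.0, 60.0, 60.0]    # perfectly even after some vMotions

print(round(host_load_deviation(before), 2), needs_rebalance(before, 10.0))
print(round(host_load_deviation(after), 2), needs_rebalance(after, 10.0))
```

The point of the sketch is that the decision is driven entirely by how evenly the hosts are loaded; nothing in it looks at how any individual VM is doing.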

And again, there are a lot more details on this, like you can choose the migration threshold and how sensitive it is, but just going with the defaults, this is basically how DRS works prior to vSphere 7. Now in vSphere 7, the focus changes. It’s no longer a cluster and ESXi host focus. DRS is focused on the workloads of individual virtual machines. And so DRS is going to calculate a VM DRS score, and this score will be used to determine if this VM is working well or not. The score is based on a number of factors: CPU percent ready values and CPU cache behavior, so how heavily is the CPU utilized on this virtual machine? Memory swap activity. Also the headroom of the particular host that it’s running on.

So if this virtual machine all of a sudden starts using more resources, does the host that it’s currently running on have sufficient resources to handle that expanding workload? And what is the vMotion cost? There’s always a cost associated with vMotion. There are CPU resources and there’s network traffic generated as a result of a vMotion occurring. What’s the cost of that? We have to factor that into the calculation to determine whether or not it’s worth it to actually migrate virtual machines. So each of these VMs has a score here, and a higher score is better. The two VMs in green here, those are scoring pretty well: 90 and 82. It’s a score out of 100, and the higher the score, the better. These other VMs have a lower score, so those VMs are not performing as well as they could on some other ESXi host. And that’s really what it’s all about. It’s about the execution efficiency of an individual virtual machine.

That’s what the VM DRS score represents. It’s not a health score for the virtual machine. It’s about how efficiently that virtual machine is executing and whether or not it could potentially execute more effectively on a different host. And if it can, then DRS will again use vMotion to migrate that virtual machine to a different host. So it’s really now examined from a VM-centric perspective. Rather than just looking at the utilization of the hosts within the cluster, it’s looking at the performance and execution efficiency of individual VMs. And if we can get a performance improvement for a VM by moving it to a different host, then that move will occur. Okay, so let’s take a quick look at some of the documentation that’s available here.

So here I am at storagereview.com, and they’ve got a nice little screenshot here showing the cluster DRS score within a DRS cluster in vSphere 7. Basically, the cluster DRS score is an aggregation of the VM scores within that cluster. I’ve got many virtual machines running within this cluster. You can see a total of nine. The overall cluster DRS score is an aggregate of those values. So the goal here, if you’re looking at the cluster view, and that’s what we’re looking at here, is to get a good overall insight into what’s happening from a DRS perspective. How is it performing overall, and are we getting a high score at an aggregate level? Another resource is the vSphere 7 blog. And if we scroll down a little bit here, we can see that same cluster DRS score, but we can also see the View all VMs screen.

And if we scroll down just a little bit further, we can see a nice UI walkthrough. So here you can see they’ve gone to the cluster, they’ve clicked on Summary, and there under vSphere DRS, we can see that cluster DRS score. If you click on the View all VMs link, you can see the DRS score for each of these virtual machines broken down by percentage. And then you can customize some of the columns here to look at things like entitled CPU and memory. But really the key metric that we want to look at here in this particular image, and let’s scroll up a little bit here, is the DRS score, right? So DB Two has a 99% DRS score. That’s working just about as well as a virtual machine can possibly work on this cluster. The execution efficiency is great.

Now, if we’re seeing things like high CPU ready values, that means that virtual machines are unable to get CPU resources from the physical hypervisor when they need them. Or if we’re seeing memory swapping, that means that physical memory on the ESXi host is constrained. And in those situations, we’ll see a significantly lower DRS score. So as we wrap up this lesson, I just want to point out that the overall goal of DRS in vSphere 7 is essentially the same. Really, the only thing that’s changing significantly is the way that DRS handles the underlying monitoring mechanism. It’s looking at the efficiency on a virtual machine by virtual machine basis and determining whether or not a vMotion will actually improve performance for that individual virtual machine.
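To make the VM-centric idea a bit more concrete, here is a small Python sketch of the two pieces described in this lesson: a cluster score built from the per-VM scores, and a move that only happens when the expected gain outweighs the vMotion cost. The function names, the plain average and the way cost is expressed in score points are all assumptions for illustration. This is not how VMware actually computes the score.

```python
from statistics import mean


def cluster_drs_score(vm_scores):
    """Aggregate per-VM scores (0-100) into a single cluster figure.

    Assumption: a plain average. The real aggregation may weight VMs
    differently.
    """
    return mean(vm_scores.values())


def best_move(current_score, candidate_scores, vmotion_cost):
    """Return the host worth migrating to, or None to stay put.

    candidate_scores maps a host name to the score the VM is estimated to
    reach on that host. vmotion_cost is the (hypothetical) penalty, in score
    points, for the CPU and network overhead of the migration.
    """
    if not candidate_scores:
        return None
    host, score = max(candidate_scores.items(), key=lambda kv: kv[1])
    gain = score - current_score
    return host if gain > vmotion_cost else None


# Hypothetical scores: two VMs doing well, two that could do better.
scores = {"vm-a": 90, "vm-b": 82, "vm-c": 55, "vm-d": 61}
print(cluster_drs_score(scores))

# vm-c would gain 23 points on esxi-02, more than the cost of moving it.
print(best_move(55, {"esxi-02": 78, "esxi-03": 70}, vmotion_cost=10))

# vm-a would gain only 3 points, so it stays where it is.
print(best_move(90, {"esxi-02": 93}, vmotion_cost=10))
```

Notice that a VM with a low score is not moved just because it is low; it is moved only if some other host would actually run it better by more than the migration costs.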

2. Demo – Configure DRS for vSphere 7

In this video, I’ll demonstrate how to create a DRS cluster in the vSphere Client. So I’m starting out at the home screen of the vSphere Client. I’m going to go ahead and go to Hosts and Clusters. In my inventory, I currently have two ESXi hosts. So I’m going to right-click my training data center and I’m going to create a new cluster. And I’m just going to call my cluster Rick Demo. At the moment, I have three features that I could potentially enable. Number one, I have vSphere High Availability, which is going to give me failover. So I’m grouping together multiple ESXi hosts, and if one of those hosts fails, the virtual machines that are running on it will reboot on some other host in the cluster. At the moment, I am not going to enable High Availability.

And then I have vSAN, which allows me to use the local storage of these hosts to create a shared datastore. I’m also not going to enable vSAN at this point, but I am going to enable DRS. So I’ll go ahead and enable that and I’ll click OK here. And what it’s going to do when you create a new cluster is try to drop you right into this Cluster Quick Start. I’ll have a video on that coming up shortly. So we’re going to skip the Quick Start here and we’re just going to go right up to vSphere DRS and start to manually edit our settings there. I’m just going to go ahead and click on Edit here, and we’re going to modify some of the configuration settings of this DRS cluster. The first setting that I need to choose is the automation level of the DRS cluster.

So let’s start with manual, because manual is a great way to initially establish a DRS cluster, especially if you’re not overly familiar with DRS. With manual, the automation level specifies that virtual machines in this DRS cluster will never be automatically moved around. That’s what DRS does: it moves virtual machines from host to host using vMotion. And so if I configure this manual automation level, what that means is that DRS is never going to automatically migrate VMs for me. What it will do is generate recommendations. It’ll watch the resource utilization across my cluster, and if it sees an opportunity to improve performance, it’ll give me a recommendation. So that’s manual. In partially automated mode, what happens is if I have a virtual machine that I try to power on and it’s inside of that cluster, DRS will automatically choose the ideal host to run that virtual machine on.

This is called initial placement. So with partially automated DRS, virtual machines will never be automatically migrated. It’s not going to move running virtual machines around. But if I power on a VM, it’ll pick the ideal host for that VM to power on, based on the memory and CPU usage and some other factors. And then finally, fully automated mode. This is where we’re really turning over control to DRS. We’re saying DRS can automatically migrate virtual machines from host to host using vMotion. And so we’re going to let DRS automatically do all of this stuff. It’s not going to generate recommendations for me. When it sees an opportunity to improve performance, it’ll just move virtual machines around at that point. And so fully automated is really kind of the ideal way to run DRS.
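The three automation levels boil down to two yes/no questions: does DRS place a VM on its own at power-on, and does DRS migrate running VMs on its own? Here is a minimal Python sketch of that mapping. The enum and function names are made up for the example and are not taken from any VMware API.

```python
from enum import Enum


class AutomationLevel(Enum):
    MANUAL = "manual"
    PARTIALLY_AUTOMATED = "partially automated"
    FULLY_AUTOMATED = "fully automated"


def auto_initial_placement(level):
    """True when DRS picks the host at power-on without asking."""
    return level in (
        AutomationLevel.PARTIALLY_AUTOMATED,
        AutomationLevel.FULLY_AUTOMATED,
    )


def auto_migration(level):
    """True when DRS vMotions running VMs itself instead of recommending."""
    return level is AutomationLevel.FULLY_AUTOMATED


for level in AutomationLevel:
    print(
        f"{level.value:20} "
        f"placement={auto_initial_placement(level)} "
        f"migration={auto_migration(level)}"
    )
```

Whenever one of these answers is "no", DRS still does the analysis; it just hands me a recommendation instead of acting on it.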

Now I have to make sure that my vMotion configurations are in good shape. I have to make sure that my VMs can easily vMotion from host to host without any vMotion compatibility issues. So in order for fully automated DRS to work properly, I need to make sure that vMotion is working properly throughout my cluster. And then I can choose the migration threshold, anywhere from conservative to aggressive. In most scenarios, you want to leave this at the default, right in the middle here between conservative and aggressive. That should be your setting unless you have a reason to change it.

With conservative, DRS will only apply recommendations to satisfy cluster constraints. For example, if I try to put a host in maintenance mode, all of the running virtual machines need to get moved off of that host. That’s a migration that DRS will automatically perform. If I have virtual machines that have an anti-affinity rule, virtual machines that are not allowed to run on the same host, I want to keep them apart. Those recommendations will be applied in conservative. And if I go up to level two here, DRS will then start to make some moves for performance improvement, but only when workloads are extremely imbalanced. So if I have a real resource problem on one host, it’ll start to migrate VMs off of that host. The middle of the road here, right in the middle, is level three. At level three, DRS is going to look across hosts for moderate imbalances of workloads.

So if a fairly significant performance improvement can be made, then DRS will carry out a vMotion. And this is normally the ideal. I don’t want a whole lot of vMotions occurring unless they’re actually going to help with performance, because every vMotion that you carry out has an impact. It takes up network bandwidth and it utilizes CPU resources to make that vMotion happen. If I adjust this slider up to four or five, DRS is going to become much more aggressive. It’s going to look for migration options that produce a relatively marginal improvement in performance. And so this is going to mean more vMotions. It’s going to move virtual machines around more frequently. So unless I have a really strong need to do that, I’ll typically just leave this slider at three and go with the default configuration for migration threshold.
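One way to picture the migration threshold slider is as a filter on recommendations. The Python sketch below assumes every recommendation carries a priority from 1 (mandatory, such as maintenance mode or an anti-affinity rule) to 5 (marginal gain), and that a threshold of N applies everything with priority N or lower. The class, the field names and the sample data are hypothetical, and the numbering is a simplification of what the slider does.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    vm: str
    target_host: str
    priority: int  # 1 = mandatory (rules, maintenance mode) ... 5 = marginal gain


def applied(recommendations, threshold=3):
    """Recommendations DRS would act on at the given slider position (1-5)."""
    if not 1 <= threshold <= 5:
        raise ValueError("migration threshold must be between 1 and 5")
    return [r for r in recommendations if r.priority <= threshold]


# Hypothetical recommendations generated in one DRS pass.
pending = [
    Recommendation("web-01", "esxi-02", priority=1),  # host entering maintenance mode
    Recommendation("db-01", "esxi-03", priority=2),   # extreme imbalance
    Recommendation("app-01", "esxi-04", priority=3),  # moderate imbalance
    Recommendation("app-02", "esxi-02", priority=5),  # marginal improvement
]

for level in (1, 3, 5):
    print(level, [r.vm for r in applied(pending, level)])
```

At the conservative end only the mandatory move goes through, at the default of three the first three do, and at five everything does, which is where the extra vMotions come from.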

So those are the different automation levels and the migration threshold adjustments that we can make with DRS. And then finally, there are two last settings that I want to look at here on the automation screen. First, do I want to enable Predictive DRS? What Predictive DRS will do is allow vCenter to retrieve information from vRealize Operations to potentially migrate VMs before resource contention occurs. So vRealize Operations is sitting there analyzing the performance of my ESXi hosts, looking at all these metrics, and if it sees a situation developing or a pattern emerging where a resource problem is potentially going to happen in the future, Predictive DRS can be invoked to migrate virtual machines before that resource problem happens.

So that’s Predictive DRS. I’m going to leave that disabled here. And then last but not least, Virtual Machine Automation. For Virtual Machine Automation, I’m basically deciding: do I want the ability to go to individual VMs and change their automation level? Right now I’m setting up fully automated DRS on the entire cluster. But there may be certain virtual machines that would work better under manual automation. Like maybe vCenter, for example. I probably don’t want vCenter being migrated around. I want to know what host it’s running on so I can find it if I need to. So I’ll leave that box checked to give myself the option to override that automation level on an individual virtual machine basis.
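The per-VM override logic can be sketched in a few lines of Python. This assumes a simple lookup: if Virtual Machine Automation is enabled and a VM has its own setting, that setting wins, otherwise the cluster default applies. The function name, the VM names and the string values are illustrative only.

```python
def effective_level(vm_name, cluster_level, overrides, overrides_enabled=True):
    """Automation level that applies to one VM.

    overrides maps a VM name to its individual automation level. It is only
    honored when the Virtual Machine Automation box is checked.
    """
    if overrides_enabled and vm_name in overrides:
        return overrides[vm_name]
    return cluster_level


# Hypothetical setup: fully automated cluster, vCenter pinned to manual.
overrides = {"vcenter-01": "manual"}

print(effective_level("vcenter-01", "fully automated", overrides))
print(effective_level("web-01", "fully automated", overrides))
print(effective_level("vcenter-01", "fully automated", overrides, overrides_enabled=False))
```

With the box unchecked, the override is ignored and vCenter would be migrated like any other VM in the cluster.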

3. Demo – DRS VM Distribution and CPU Over-Commitment

In our last video, we looked at some of the automation levels and migration threshold settings when we created a DRS cluster. In this video, we’re going to take a look at some of the additional options that we can configure. The first option is VM distribution. With VM distribution, what I’m trying to accomplish is to basically distribute the virtual machines equally across the hosts in my cluster. What we’re thinking about in this scenario is that I may have certain VMs that are really low resource consumers and certain VMs that are really high resource consumers. So maybe I’ve got a bunch of web servers that are low resource consumers, but then I’ve got things like Exchange or SharePoint servers that are high resource consumers. What I don’t want to have is a scenario where I have a lot of low resource consumers running on a single ESXi host.

Like maybe I have 20 web servers running on one host and five VMs running on each of the other hosts in my cluster. Well, if that one host with the 20 VMs fails, then I have 20 virtual machines that go down simultaneously. And so to limit the blast radius of any individual ESXi host failure, what I may want to do is enforce a more even distribution of virtual machines across the hosts in the cluster, so that the failure of a single host doesn’t take down an inordinate number of virtual machines. The next setting is the maximum CPU over-commitment, and this setting controls the CPU over-commitment within the cluster. What I’m essentially doing is setting a limit on how many virtual CPUs I can have per physical CPU on my hosts.

So, for example, if I were to enable this feature and say the ratio had to be one to one, that means for every virtual CPU I create on a virtual machine, I have to have one corresponding physical CPU within my cluster. If I were to make the ratio something like four to one, that means for every four virtual CPUs I create, I have to have at least one physical CPU in my cluster. I’m probably not going to go with something like a one-to-one ratio here because, frankly, that really defeats a lot of the benefits of virtualization. The goal here is to say, hey, I’ve got many virtual machines sharing a set of physical hardware, I get these efficiencies of scale, and I get to share hardware across many VMs. Well, if I’m doing a one-to-one ratio, I can’t do that. I’ve got to have one physical CPU for every single virtual CPU that I create.
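The ratio arithmetic is easy to check with a short Python sketch. It assumes the limit is simply physical CPUs multiplied by the ratio, and that a new VM fits only if the running total stays at or under that limit. The function names and the 16-CPU host are hypothetical, and whether "physical CPU" means cores or threads is left out of the model.

```python
def max_vcpus(physical_cpus, ratio):
    """Most virtual CPUs allowed for a given vCPU:pCPU over-commitment ratio."""
    return physical_cpus * ratio


def fits_within_ratio(current_vcpus, extra_vcpus, physical_cpus, ratio):
    """True when adding extra_vcpus keeps the total at or under the limit."""
    return current_vcpus + extra_vcpus <= max_vcpus(physical_cpus, ratio)


# Hypothetical host with 16 physical CPUs.
print(max_vcpus(16, 1))   # one to one: no consolidation at all
print(max_vcpus(16, 4))   # four to one

# At four to one with 60 vCPUs already in use, a 4-vCPU VM still fits...
print(fits_within_ratio(60, 4, physical_cpus=16, ratio=4))
# ...but with 62 in use it would push the total past the limit.
print(fits_within_ratio(62, 4, physical_cpus=16, ratio=4))
```

The same check is what makes a one-to-one ratio so restrictive: on this host it caps the total at 16 virtual CPUs no matter how idle the VMs are.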

So I’m not really consolidating the way I would be if I made this ratio something a little bit more aggressive, like five to one or ten to one or 20 to one. Now that we understand how to enable this feature, there are some important things to understand about it. Let’s say that we make this over-commitment ratio 50 to one. Now, as we power on virtual machines, those virtual machines will only be allowed to power on if they do not violate this CPU over-commitment ratio. So that’s just something to be aware of when you enable this. The other thing that I want to mention is that CPU over-commitment is not enforced during a High Availability failover. So let’s say that an ESXi host fails and 30 or 40 virtual machines go down.

Those virtual machines are going to be able to boot up on other ESXi hosts even if they violate this CPU over-commitment ratio. So the good news is that if a host failure occurs, those ratios are momentarily ignored to allow these virtual machines to boot up on the remaining ESXi hosts. Now, one final note. What if I want to put a host into maintenance mode? Let’s say I have four hosts in my cluster, and maybe I’m near the maximum of this over-commitment ratio on all four of those hosts. Well, if I try to put one of those hosts into maintenance mode, DRS is going to attempt to migrate those VMs to the remaining hosts.

And if this ratio is going to be violated as a result of moving those VMs, it won’t be able to place that host into maintenance mode. So CPU over-commitment is a nice little feature that allows me to set a maximum on the CPU over-commitment that I’m willing to accept. But it does come with some constraints that we want to be aware of before we enable it on our cluster. Now, so far, everything that we’ve talked about was available in vSphere 6.7. None of these features are new in vSphere 7. What is new in vSphere 7 is Scalable Shares. We had a lesson on this topic in the Resource Management section of this course, where we went through exactly what Scalable Shares are.

But just as a quick reminder, if I enable this feature, what’s going to happen is that when I create resource pools within this cluster, I’m not going to have to worry about equally distributing virtual machines across those resource pools. Scalable Shares essentially normalizes the share values on a per-VM basis. And if you need a refresher on this, go back to the Resource Management section and take a look at the Scalable Shares lesson.
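Here is a toy Python model of why that matters. It assumes two sibling resource pools under full contention, VMs with equal shares inside each pool, static pool shares of 8000 for High and 4000 for Normal, and a made-up scaling rule where a pool’s weight grows with the number of VMs in it. The names and the scaling rule are illustrative assumptions and not VMware’s actual formula.

```python
POOL_SHARES = {"high": 8000, "normal": 4000}   # static pool share values
LEVEL_WEIGHT = {"high": 2, "normal": 1}        # hypothetical per-VM weight by level


def per_vm_fraction(pools, scalable):
    """Fraction of cluster resources each VM gets under full contention.

    pools maps a pool name to (share level, VM count); every pool is assumed
    to hold at least one VM.
    """
    weights = {}
    for name, (level, vm_count) in pools.items():
        if scalable:
            # Pool weight scales with how many VMs it holds.
            weights[name] = LEVEL_WEIGHT[level] * vm_count
        else:
            # Pool weight is fixed no matter how many VMs share it.
            weights[name] = POOL_SHARES[level]
    total = sum(weights.values())
    return {name: weights[name] / total / pools[name][1] for name in pools}


# Hypothetical cluster: a crowded High pool and a nearly empty Normal pool.
pools = {"prod": ("high", 8), "test": ("normal", 2)}

print(per_vm_fraction(pools, scalable=False))  # each test VM beats each prod VM
print(per_vm_fraction(pools, scalable=True))   # each prod VM gets twice a test VM
```

Without the scaling, the eight production VMs dilute their own pool so much that each test VM ends up with the larger slice, which is exactly the uneven-distribution problem I would otherwise have to manage by hand.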
