7. Demo: DRS 7 Affinity Rules, Groups, and VM Overrides
In this video, I’ll show you how to configure affinity rules, DRS groups, and VM overrides on a DRS cluster. So here you can see I’m in my vSphere Client, and I have a DRS cluster. On the DRS cluster, I’m going to click Configure. The first thing I want to show you is VM Overrides. For VM Overrides, I can click Add here and say, okay, maybe for this app server I want to create an override. So I’ll choose that virtual machine, and now I’ll say that for this virtual machine, I want to change the DRS automation level. My cluster is set to manual, but maybe I want to change the automation level on this particular VM to fully automated so that it can be migrated around my cluster without any intervention from me.
Or maybe my cluster is automated and I want to change certain VMs to manual. This is a more likely use case. Think about your vCenter Server for a moment. I’m accessing this vSphere Client through vCenter, and the vSphere Client is not going to be available if vCenter goes down. And what if I have 50 hosts? Maybe vCenter is at the blue screen of death and I just need to reboot it. How do I find it? How do I figure out which host to log into to reboot vCenter? It’s going to be a real pain if I don’t know which host it’s running on. So I may want to go to my vCenter virtual machine and set the DRS automation level on that particular VM to manual.
That way, I’ll always know which host vCenter is running on. And then at least if that host fails, High Availability can still reboot vCenter on some other ESXi host, but DRS is never just going to automatically move my vCenter virtual machine around. That’s a good use case for a VM override: a scenario in which I don’t want DRS to automatically migrate certain VMs. Another good option here is VM/Host rules. So, for example, I’m going to rename a couple of these VMs just for the purpose of illustrating this concept. I’m going to call one virtual machine DC One; that’s one of my domain controllers. And let’s pretend that this other virtual machine is another domain controller. So at the moment, if I look at their summaries, these domain controllers are running on the same ESXi host.
That’s not good, because if that ESXi host fails, it’s going to take down both of my domain controllers. So that’s not what I want here. Under VM/Host Rules, I can click Add and create a new anti-affinity rule: these are virtual machines I don’t want to keep together. I want to keep them apart. I want to make sure that Domain Controller One and Domain Controller Two are always running on separate ESXi hosts. That’s what we call an anti-affinity rule, a rule that DRS will respect to keep virtual machines on different ESXi hosts. Now let’s think about another use case. I’m going to change the name of this virtual machine to App Server, and maybe my app server and my web server communicate with each other a whole lot. Maybe they’re constantly working together.
Well, in that case, I may want to go to VM/Host Rules and create an affinity rule. I’ll just call it App and Web, and for these two VMs, I want them to run on the same host. That way they can communicate through the same virtual switch, and their traffic doesn’t need to traverse a physical network. That’s a good case for an affinity rule: keep these two VMs on the same host. Now let’s take a look at some more advanced options with these rules. I’m going to rename my VMs one more time. I’m going to call this virtual machine Dev One, and then I’ve got my domain controller and my web server. So I’ve got three virtual machines, and Dev One is owned by my development team.
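Both rule types can also be scripted with govc, the CLI from the open-source govmomi project, rather than clicked through the vSphere Client. This is just a sketch: the cluster and VM names (Lab-Cluster, DC-1, DC-2, AppServer, WebServer) are placeholders for a lab, not names from the demo, and it assumes GOVC_URL, GOVC_USERNAME, and GOVC_PASSWORD already point at vCenter.

```shell
# Anti-affinity rule: keep the two domain controllers on separate ESXi hosts.
govc cluster.rule.create -cluster Lab-Cluster -name dc-anti-affinity \
  -enable -anti-affinity DC-1 DC-2

# Affinity rule: keep the app server and web server on the same host so
# their traffic stays on one virtual switch.
govc cluster.rule.create -cluster Lab-Cluster -name app-web-affinity \
  -enable -affinity AppServer WebServer

# List the cluster's rules to confirm both were created.
govc cluster.rule.ls -cluster Lab-Cluster
```

Because these commands operate against a live vCenter, they are shown here as a configuration fragment rather than something you can run standalone.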
So Dev One is my development team’s only VM, and I like to try to keep all of my development VMs on this particular host. It’s an older host, one that I’ve dedicated to development. So let’s see how we can make that happen. I’m going to create some VM/Host groups. My first group is going to be my Dev Hosts: all the ESXi hosts that my dev team should be using, and I’m going to give them this host. That’s my first group. Then I’m going to create another host group called Prod Hosts for my production virtual machines. I’m going to add an ESXi host to that group, and I’ll hit OK.
So now I’ve got two groups: my Dev Hosts and my Prod Hosts. And I would like all of my development VMs to run on my development hosts and all my prod VMs to run on my prod hosts. So the next step is to create a group called Dev VMs, containing all the VMs that belong to my development team. I’ll hit OK, and there I’ve got a virtual machine group. Then I’ll create another virtual machine group called Prod VMs, containing all of my production virtual machines. So now I’ve logically grouped my virtual machines. I’ve created four groups: one with all my dev hosts, one with all my production hosts, one with all my dev VMs, and one with all my production VMs.
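The same four groups can be created from the shell with govc; this is a sketch, and the cluster, host, and VM names below are placeholders for a lab rather than anything from the demo.

```shell
# Host groups: the -host flag marks these as ESXi host groups.
govc cluster.group.create -cluster Lab-Cluster -name dev-hosts  -host esx-dev-01
govc cluster.group.create -cluster Lab-Cluster -name prod-hosts -host esx-prod-01

# VM groups: the -vm flag marks these as virtual machine groups.
govc cluster.group.create -cluster Lab-Cluster -name dev-vms  -vm Dev-1
govc cluster.group.create -cluster Lab-Cluster -name prod-vms -vm DC-1 WebServer

# Confirm all four groups now exist on the cluster.
govc cluster.group.ls -cluster Lab-Cluster
```

As with the UI, the groups on their own don’t change placement; they only become meaningful once rules reference them.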
And now that I’ve created those groups, I can go to VM/Host Rules and create rules around them. I’m going to create a first rule called Dev, and it’s going to be a Virtual Machines to Hosts rule, basically stating that my development virtual machines must run on my development hosts. So that’s my first rule: my development VMs must run on my development hosts. And I can create another rule for my production VMs. I’ll call it Prod: my production VMs must run on my production hosts. And I’ll go ahead and hit OK there.
So now I’ve created rules that are going to ensure that the right VMs run on the right hosts. These are called mandatory affinity rules. They are required; they cannot be violated. What my rule is essentially saying is that the dev VM has to run on this host, and even if this host fails, the rule will be enforced. If that host fails and I have High Availability configured, High Availability won’t do anything. It’s going to say: Dev One can only run on this host, so I’m not going to reboot it on any other host. So required affinity rules, where we specify that certain VMs must run on certain hosts, cannot be violated by High Availability. Alternatively, I can configure a preferential affinity rule that says certain VMs should run on certain hosts.
With that type of rule, DRS will live by the rule: it will keep the dev VMs on the dev hosts and the production VMs on the production hosts. But if a host fails and I have a preferential affinity rule specifying that certain VMs should run on certain hosts, High Availability can violate that rule to get those virtual machines back up and running. So that’s the difference between a required affinity rule, which says certain VMs must run on certain hosts, and a preferential affinity rule, which can be violated by High Availability if there is a failure. And I can also go the other way: I can say certain VMs must not run on certain hosts, or certain VMs should not run on certain hosts. I can create those sorts of rules in the opposite direction as well.
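The VM-to-host rules, including the mandatory-versus-preferential distinction, map onto govc flags as well. A sketch, assuming the placeholder groups dev-vms, dev-hosts, prod-vms, and prod-hosts already exist on a cluster called Lab-Cluster:

```shell
# Required (mandatory) rule: dev VMs MUST run on dev hosts. The -mandatory
# flag makes this a rule that even High Availability will not violate.
govc cluster.rule.create -cluster Lab-Cluster -name dev -enable -mandatory \
  -vm-host -vm-group dev-vms -host-affine-group dev-hosts

# Preferential rule: prod VMs SHOULD run on prod hosts. Without -mandatory,
# HA may override this after a host failure to restart the VMs elsewhere.
govc cluster.rule.create -cluster Lab-Cluster -name prod -enable \
  -vm-host -vm-group prod-vms -host-affine-group prod-hosts
```

The "keep away" direction described above uses -host-anti-affine-group in place of -host-affine-group, again with or without -mandatory.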
So those are some of the rules we can create inside our DRS cluster to control which VMs run together, which VMs should stay apart, and where VMs can run. If you’re creating a DRS cluster for the first time, you want to think carefully about which VMs are redundant with each other. Do I have multiple database servers that are part of a cluster? If so, I need an anti-affinity rule so that they don’t run on the same host. Do I have multiple domain controllers? If so, I need an anti-affinity rule to make sure they don’t run on the same host. Do I have certain virtual machines where I need to know which host they’re running on? If so, I can create a preferential affinity rule binding them to a certain host unless there’s a failure, or I could even use a VM override to change certain virtual machines’ automation level to manual.
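That last option, the per-VM override, is also a one-liner with govc. A sketch with placeholder names (a vCenter appliance VM called vcsa on a cluster called Lab-Cluster):

```shell
# Pin the vCenter VM's DRS automation level to manual with a per-VM
# override, so DRS never migrates it without approval.
govc cluster.override.change -cluster Lab-Cluster -vm vcsa -drs-mode manual

# Inspect the current per-VM overrides on the cluster.
govc cluster.override.info -cluster Lab-Cluster
```

HA can still restart the VM elsewhere after a host failure; the override only stops automated DRS migrations, which is exactly the vCenter scenario described above.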
So these are all critical things I need to think about prior to basically turning over the keys to DRS. For example, let’s say I had a self-driving car. There are certain rules I’d want to set up in advance before I get in and let it drive me around: don’t smash into other cars, don’t drive off a cliff. It’s the same thing with DRS. I’m basically putting my cluster on autopilot, so I want to think about everything that could go wrong when I let DRS take over. Is it going to migrate VMs that should be separate and put them together? Is it going to move VMs that really need to stay on one host? Maybe there are VMs whose licensing requirements force them to stay on the same host. I need to think about those things prior to just cranking on DRS and letting it go.
8. Demo: Avoiding Downtime with DRS and Maintenance Mode in vSphere 7
In this particular demo, I’m using the free hands-on labs that are available at hol.vmware.com. Here we have a cluster that includes five ESXi hosts, and I’m going to click on the Configure option for this cluster. You can see that DRS is enabled in partially automated mode. Now, if I take one of the hosts in that cluster and try to place it into maintenance mode, the host cannot enter maintenance mode as long as there are running virtual machines on it. So here’s ESX-01a, and you can see that on this host there are a couple of virtual machines currently running. If I need to perform some sort of maintenance on this host, for example, let’s say I need to install some new physical memory in it.
The first step is to right-click the host and put it into maintenance mode. It’s giving me an option here, asking if I want to move powered-off and suspended virtual machines to other hosts. And that seems like kind of a weird question: there are six VMs on this host that are currently not running. They’re powered off. So why would I bother migrating VMs that aren’t even running to a different host? Well, let’s say that I’m planning on decommissioning this host and it’s never coming back. The first step of that is to put the host in maintenance mode and get all the VMs off of it. If I don’t bring this host back, any of the VMs that are registered to it will basically be unregistered at that point, and I’ll have to manually go in and register them individually.
Not a huge deal, but I can save myself some trouble by migrating those VMs to other hosts in the cluster. So I’m going to go ahead and do that and hit OK. Now it’s warning me that there are one or more powered-on virtual machines running on this ESXi host, so the host cannot enter maintenance mode until those VMs are migrated to another host. Let’s take a look at our cluster: under Monitor, I’m going to go to DRS and look at the recommendations. Because this cluster is in partially automated mode, it didn’t automatically move these virtual machines off of that host. It just produced a recommendation telling me this host is entering maintenance mode, these VMs need to be migrated to other hosts, and here are the hosts it picked to move them to. So I’ll go ahead and apply these recommendations and let DRS do its thing, let it start migrating these VMs to other ESXi hosts. And that’s what it’s doing: it’s using vMotion to move these running virtual machines to other hosts, and when it’s done, the host enters maintenance mode. Now let’s take this host back out of maintenance mode and talk about some of the implications this carries when I’m dealing with updates. I’m going to click on the Updates tab for this cluster, attach a couple of patch baselines to the cluster, and check the cluster to see if the hosts are up to date with all of the patches in those baselines.
So I’m scanning the entire cluster, and I can see that the cluster is not compliant with my non-critical host patches. I have these five ESXi hosts, all of which have non-compliant software. They need some sort of update, and when these updates are performed, the hosts may need to enter maintenance mode. And I’ve got running virtual machines on some of these hosts. So if I click on this non-critical host patch baseline and choose Remediate, it is going to push these patches out to the five ESXi hosts, and in doing so, it may need to place the hosts in maintenance mode. Now that I’ve got DRS configured, DRS should migrate virtual machines off of these hosts as they enter maintenance mode. That being said, I am going to make one change to my DRS cluster here.
I’m going to change DRS automation to fully automated. That way, Update Manager doesn’t have to stop and wait for me to apply recommendations. It can just go about the process of staging the patches and remediating the hosts one by one, and as each host needs to reboot, DRS will place it in maintenance mode, evacuating all of the running VMs off of it. So now I can patch an entire cluster of ESXi hosts with zero downtime, because DRS and Update Manager work together. You now have a good understanding of how Update Manager works in concert with DRS to avoid downtime when performing maintenance on your ESXi hosts.
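The two key steps in this demo, switching DRS to fully automated and cycling a host through maintenance mode, can be scripted with govc as well. A sketch with placeholder names (Lab-Cluster, esx-01a), assuming the GOVC_* environment variables point at vCenter:

```shell
# Switch DRS to fully automated so maintenance mode can evacuate running
# VMs without waiting for recommendations to be applied manually.
govc cluster.change -drs-enabled -drs-mode fullyAutomated Lab-Cluster

# Enter maintenance mode; -evacuate also moves powered-off and suspended
# VMs, which matters if the host is being decommissioned for good.
govc host.maintenance.enter -evacuate esx-01a

# ...perform the hardware or patch work, then bring the host back:
govc host.maintenance.exit esx-01a
```

The remediation itself still goes through Update Manager; the fragment above only covers the DRS and maintenance-mode side of the workflow.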
9. vSphere Cluster Quickstart
In this video, I’ll demonstrate a feature that was released in vSphere 6.7 Update 1 and is still supported in vSphere 7, called Cluster Quickstart. Cluster Quickstart was built to let you easily create a cluster that includes things like vSAN, with consistently configured VMkernel ports and the necessary networks created automatically. If you want a standardized, simplified cluster configuration that makes it really easy to get a cluster up and running, Cluster Quickstart is a great way to do that. Now, before we jump in, I want to go to Hosts and Clusters and mention that it’s really important that the hosts you’re going to use to create this cluster have a consistent hardware profile and a consistent configuration.
So here you can see I have four hosts; we’re going to add the first three to my cluster. On each of these three hosts I have three physical network adapters, and on all three hosts, vmnic1 is currently unused. So I’ve got similar physical network adapters, and I’ve got shared datastores that are available across all three of these hosts. Each of these hosts also has local storage devices: each host has two 20 GB flash drives, and these are going to be useful for vSAN. I can see that my second host also has two 20 GB flash devices, and my third host does as well. I’m going to use those for the vSAN part of this cluster.
Now, I’ve also got a fourth host, and I’m not going to use it as part of my cluster; I’ve got my vCenter Server Appliance running on that fourth host. I’ve gone through a few different iterations of this process, and the last time I tried it, I included the host that was running the vCenter appliance, and that host ended up going to the purple screen of death. So I’m going to take vCenter completely out of this equation, and I would recommend that you do the same, because when you run this Cluster Quickstart, it’s going to place all three of these ESXi hosts in maintenance mode simultaneously. I’m just getting vCenter out of the way so that when I create my cluster, I won’t run into that problem again. So let’s go ahead and get this process started. I’m going to create a new cluster right now. I’m not going to enable any services yet, and I’m just going to call it New Cluster and hit OK.
And now a new cluster has been created, and notice it immediately brings up this Quickstart screen. Let’s start with the cluster basics: which services do I want to enable? I’m going to enable DRS, High Availability, and vSAN on this cluster, and click OK. And that’s really it for step one. Now we’ll move on and add some hosts to this cluster. If I had hosts that were not yet registered to vCenter, I could specify their IP addresses, usernames, and passwords here, and they would get registered to vCenter as part of this process. I’m not going to do that, though; I’ve already registered all of my hosts with vCenter. So I’m just going to grab these three hosts, add them to my cluster, and click Next, then Ready to Complete.
I’ll go ahead and click Finish. Now this part is going to take a few moments, and you can monitor the progress down here under Recent Tasks, where it’s going to go through a series of steps, including placing these three hosts in maintenance mode. The services that I specified, like DRS and High Availability, are going to be enabled and configured across these hosts. Right now, it’s running through a validation check: it’s verifying that these hosts are properly configured, that they’re time-synchronized, that they all have compatible software versions, that their configurations are consistent with one another, and that they have hardware that is certified for use with vSAN and vSphere.
And if there are any problems with the configuration, they’ll pop up here, and I can see: hey, my vSAN hardware compatibility list is not up to date. I’m not really worried about that here in my lab environment. So I’ve got three hosts that ended up not configured. If I resolve these issues, for example, if I get my vSAN hardware compatibility list up to date, then this will no longer be red, and I can click Revalidate to try again. So I have an issue to resolve here. If I navigate away from this cluster, I can always come back later and resume the Quickstart where I left off. So I’m going to go ahead and fix my hardware compatibility list problem and then resume. Okay, now I’ve resolved my hardware compatibility list issue for vSAN. Going back to my cluster, I’ll go to Configure one more time, back to Quickstart, and under Quickstart I’m going to revalidate these hosts. Now I can see the vSAN HCL is up to date. And if I scroll down, the Configure option is now open to me; it wasn’t available before because I had a red alarm here. So at the moment, the three hosts in my cluster have not been configured.
I’m going to go ahead and click the Configure button. The first thing this is going to do is create a vSphere Distributed Switch, and this is all in accordance with a VMware Validated Design. It’s going to create the distributed switch along with port groups for the vMotion network and the vSAN network, since we’re going to need VMkernel ports for vMotion and VMkernel ports for vSAN. It’s saying: here are some default port groups that will be automatically created, and those will just be built for me on my vSphere Distributed Switch. Now, remember how I showed you that all of my hosts had the same vmnics, and that vmnic1 was not in use on any of them? So that’s good.
I kept my configuration consistent across all of my hosts, so now I can pick a physical adapter to use on all of them. I’ll click Next, and that physical adapter on each of these hosts will be dedicated to the vSphere Distributed Switch. Here we can see that my vMotion traffic is going to go over the distributed switch on the port group that just got created. I can specify a VLAN here if I choose to, but I’m not going to bother. And here we can see the IP addresses being assigned to the VMkernel ports. I’m going to leave this on DHCP, but I could go through and configure static IP addresses for those vMotion VMkernel ports. It’s the same with vSAN: which port group and which VLAN are going to be used, and I can again allow DHCP to obtain IP addresses automatically or go in and configure them statically.
I’m just going to let DHCP do it for me. Then we’ve got all of the HA and DRS options that you’d see in a normal HA and DRS cluster: am I monitoring my hosts for failures? Am I enabling admission control? Am I running DRS in manual, partially automated, or fully automated mode? And there are vSAN options here as well. This is just a regular single-site cluster, and I’m not going to enable features like encryption or deduplication. I’m also not going to enable lockdown mode or EVC. I’m keeping this cluster configuration very simple and straightforward because I just want to demonstrate the Cluster Quickstart. Now, as part of the vSAN configuration, I need to claim some disks. For the cache tier, which disks do I want to claim on each host? You can see here that there are some disks available on each host.
I have two flash devices on each host. I’m going to claim one flash device for the cache tier and one flash device for the capacity tier on each of these ESXi hosts so that I have a functional vSAN configuration. So basically, each host is going to have one disk group made up of a cache device and a capacity device, and this is an all-flash configuration. I’ll go ahead and hit Next, and once this cluster is successfully created, it’s going to have a vSAN datastore made up of all of those devices. I’m not going to bother with the proxy configuration here; I’ll just hit Next. This is as far as we’re going to go with this demonstration. I just wanted to give you an idea of what to expect with the Cluster Quickstart.
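For comparison, the first Quickstart steps, creating the cluster, enabling services, and adding hosts, can also be scripted with govc. A sketch only: the datacenter name, host name, and credentials below are placeholders, and the vSAN and network configuration that Quickstart automates is beyond what this fragment covers.

```shell
# Create the cluster in a datacenter called DC0, then enable DRS and HA
# (roughly the "cluster basics" step of the Quickstart wizard).
govc cluster.create -dc DC0 NewCluster
govc cluster.change -drs-enabled -ha-enabled NewCluster

# Register an unmanaged ESXi host into the cluster (equivalent to the
# "add hosts" step for hosts not yet known to vCenter).
govc cluster.add -cluster NewCluster -hostname esx-01a.lab.local \
  -username root -password 'VMware1!' -noverify
```

Quickstart’s value is everything after this point: the validation checks, the distributed switch, the VMkernel ports, and the vSAN disk claiming, which you would otherwise have to script or click through individually.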
And I’ll give you a couple of words of warning. There have been a considerable number of issues with Cluster Quickstart, especially relating to an error that says it “failed to extract requested data,” and I’m actually experiencing that here in my home lab environment. Most of the documentation I’m reading says that’s related to some kind of trust issue that you can fix with a script. I’ve been working on that but have been unable to resolve it in my home lab. The other thing I’ve noticed is that VMware has documented that if you’re upgrading from versions of vSphere prior to 6.5, this may be a problem, and I’ve been running the same home lab since version 6.