1. The vSphere CPU Scheduler
So, first off, we have a world. A world is simply a thread of execution inside an ESXi host. A good example of a world would be a virtual CPU for a virtual machine. Each virtual CPU that I configure a virtual machine with is going to be a thread of execution on an ESXi host, and it's going to be represented in esxtop as a world. So a virtual machine actually includes multiple worlds, especially if it has multiple processors.
The number of worlds per VM is really going to depend on the number of processors. But there's also a world for displaying the mouse, keyboard, and screen, and there's another world for the virtual machine monitor. So these are basically just processes that run for each virtual machine. And there are also worlds that are unrelated to virtual machines; some are for functions like vMotion, for example. So these worlds are basically just processes that my ESXi host is running. Then we have symmetric multiprocessing. When you see SMP, what you want to think of is the feature that allows the ESXi host to support virtual machines with multiple CPUs. That's really all it is: symmetric multiprocessing allows a virtual machine to leverage multiple physical processors at once.
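As a rough sketch of the accounting above (this is a simplification for illustration, not VMware's actual world bookkeeping), each VM contributes one world per vCPU plus a mouse/keyboard/screen world and a virtual machine monitor world:

```python
# Simplified model of worlds per VM, per the rule of thumb above:
# one world per vCPU, plus the MKS world and the VMM world.
def worlds_for_vm(vcpus: int) -> int:
    MKS_WORLD = 1  # mouse, keyboard, screen
    VMM_WORLD = 1  # virtual machine monitor
    return vcpus + MKS_WORLD + VMM_WORLD

print(worlds_for_vm(4))  # a 4-vCPU VM -> 6 worlds in this simplified model
```

So the more vCPUs a VM has, the more worlds the host has to schedule for it.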
Here's something I really don't recommend configuring unless you have a licensing requirement to do so: CPU affinity. CPU affinity basically bonds a virtual machine to one specific physical processor. That means the virtual machine can't be moved with vMotion, and it can't fail over with high availability. CPU affinity is basically forcing it to use one particular processor. So try not to enable that feature unless you have a licensing requirement to do so. And then finally, esxtop is something that we'll be taking a look at in future lessons, so I'm not going to get too deep into it now. But esxtop is a command-line performance monitoring tool that I can use to analyze the performance of an ESXi host. And many of the measurements and performance indicators that I'm about to talk about here can be easily viewed in esxtop. So let's start with CPU sharing. Here we see a diagram of an ESXi host. The ESXi host has two physical processors, two CPU sockets, and each of them is a quad-core CPU.
So I have a total of eight CPU cores. Now I go ahead and create a virtual machine, and I configure it with two virtual CPUs. Each of those virtual CPUs is going to be assigned to one of these processor cores. So now I can see my VM is using two logical processors, two CPU cores on the CPU socket. And if I boot up another VM on that same host, and this VM has four virtual CPUs, now that VM could potentially be sharing the same processor cores as the first virtual machine. So ESXi allows multiple virtual machines to share the same processor. It does so through a mechanism called time slots. What it basically does is allow these VMs to take turns using that processor. So if I put too many VMs on the same host, they're not going to work quite as well.
I don't want too many VMs sharing the same set of processors and pushing them too hard. So you should always be striving to hit that ratio just right. You want to make the most of the processors you have. You want them to be 60% to 70% utilized, but you don't want them to be 90% utilized and spiking up to 100%, because then your virtual machine performance is going to suffer. So that's our first key concept in this video: virtual machines can share the same physical processor cores. So now that we've learned that a virtual machine can share physical processors with another virtual machine, here we see two VMs, each of them configured with two virtual CPUs, and they're sharing the same processor cores. Although it doesn't make much sense in this scenario: why are these two VMs sharing these two processor cores when there are other cores available with nothing using them? So the ESXi host will actually do CPU load balancing below the surface.
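The sizing guideline above can be expressed as a quick sanity check. The thresholds here are the ones from the lecture (60-70% sweet spot, 90%+ overloaded), not an official VMware recommendation:

```python
# Quick verdict on host CPU utilization, using the lecture's rule of thumb.
def utilization_verdict(used_mhz: float, capacity_mhz: float) -> str:
    pct = 100.0 * used_mhz / capacity_mhz
    if pct < 60:
        return "underutilized"   # room to consolidate more VMs
    if pct <= 70:
        return "sweet spot"      # making the most of the hardware
    if pct < 90:
        return "watch closely"   # little headroom left
    return "overloaded"          # VM performance will suffer

print(utilization_verdict(13000, 20000))  # 65% -> "sweet spot"
```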
We don’t have to configure this. We don’t have to manage this. It just happens. So in this case, the ESXi host may recognise that it could be more efficient and take that second VM and migrate it to these other processor cores. Again, you don’t even see this happening. It’s all done beneath the surface. And the ESXi host will automatically migrate VMs from one core to another to improve performance. If you’ve configured CPU affinity, this is not possible. CPU affinity pegs your VM to one particular processor. Now let’s take a moment to talk about hyperthreading. So here you see two processors. On the left, we have a non-hyperthreaded processor. On the right, we have a hyperthreaded processor. So let’s think about what the difference is between these two processors. Now, what I want to start out by saying is that neither processor is more powerful than the other. So both processors basically have the same horsepower. They’re both equally fast.
They're both equal. The non-hyperthreaded processor, however, is less efficient. With a non-hyperthreaded processor, VMs are essentially taking turns. The VM on the left sends an instruction. It gets completed; then, and only then, can the VM on the right send an instruction and have it get completed. So there's actually some idle time, right? And if we watch this animation, you can see where the idle time is. Here, VM One sends a command to this processor. It's completed. Now the processor is doing nothing, waiting for that second command to come in. So there's that time in between CPU instructions when the processor is idle. Let's look at it one more time. The first command gets completed. Now the processor is idle. The second command arrives. Now the processor is idle again. So what hyperthreading does is basically reduce that idle time. It allows multiple virtual machines to send commands simultaneously, like we see in this animation. And now, the moment the processor finishes with the first instruction, it can immediately move on to the second instruction because it's already there, ready to go, and the idle time gets reduced.
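The idle-time argument above can be captured in a toy model. This is an assumption-laden sketch (one time unit per instruction, one time unit of arrival gap), not a real pipeline simulation; the point is only that queuing a second thread's instructions eliminates the gaps:

```python
# Toy model: without hyperthreading the core sits idle between
# instructions; with two hardware threads the next instruction is
# already queued, so the gaps disappear. Same execution speed either
# way; only the idle time differs.
def total_time(instructions: int, hyperthreaded: bool) -> int:
    exec_time = instructions * 1                       # 1 unit each
    idle_gaps = 0 if hyperthreaded else (instructions - 1) * 1
    return exec_time + idle_gaps

print(total_time(8, hyperthreaded=False))  # 15 units: 8 busy + 7 idle
print(total_time(8, hyperthreaded=True))   # 8 units: no idle gaps
```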
So hyperthreading doesn't necessarily make your processor faster. What it does is reduce the idle time of your processor and make it more efficient, and that's really what makes a hyperthreaded processor faster in practice. Now, what do you need to know about this in your environment? Well, you need to have hyperthreading enabled in the BIOS. That's really the big thing. If it's enabled in the BIOS, then ESXi can take advantage of it and leverage it. So just make sure that if you have a hyperthreaded processor, hyperthreading is enabled in the BIOS. Okay, so now we're going to walk through some examples showing how the ESXi host shares CPUs. And we're going to start with the old way, because we have to understand the way it used to work if we're going to understand the way it works now. So we're going to do a little history lesson here. This is called strict co-scheduling. In this diagram, we have an ESXi host with four processor cores. Here at the bottom of the screen, we see cores 1, 2, 3, and 4. I have a single-socket quad-core CPU. So any VMs that run on this host are going to share these four processor cores. Here I have two virtual machines running on this host.
VM One, which has four virtual CPUs, and VM Two, which has two virtual CPUs. When a virtual machine starts executing code, it requires processors. What's going to happen is that VM One is going to grab those four processor cores and say, "I'm using them." And so for that particular time slot, the purple virtual machine on the right is out of luck. If it wants to use processors, it has to wait until the next time slot. And my orange virtual machine needs four processors in order to work, so it has to wait until the purple VM is done with its time slot. And so what you end up with is essentially two VMs taking turns: now it's my turn to use the processor; now it's my turn, you have to wait. They're taking turns. As a result, we have some unused time slots. We see these slots that are marked as unused. Those are CPU resources that could potentially be used.
But because the orange VM needs four processors in order to function, it can't take advantage of those two spare time slots. So what we can end up with in this scenario is something called a high CPU ready value. What do I mean by "CPU ready"? CPU ready means a virtual machine is ready to execute something. So let's say that the purple VM has something that it wants to do. It's ready to execute some sort of task, and it needs CPU resources to do it. But in this first time slot, all of those CPUs are currently utilized. This VM has entered the ready state. It says, "I'm ready. I need physical processors." And the VM is going to remain in the ready state until those physical processors are actually available. Then it can actually carry out what it needs to. So the longer a VM remains in its ready state, the worse off we are. We want low CPU ready values, because a high value means a virtual machine has been sitting there ready to do work without actually getting to do it.
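The turn-taking above can be sketched as a tiny simulation. This is an illustrative model of strict co-scheduling, not the real scheduler: a VM only runs in a slot if all of its vCPUs fit on free cores at once, and it accrues ready time whenever it has to wait:

```python
# Strict co-scheduling sketch: a VM runs only when ALL of its vCPUs can
# be placed on free cores simultaneously; otherwise it accumulates
# "ready" time. Priority rotates each slot so the VMs alternate.
def strict_schedule(vms, cores, slots):
    ready_time = {name: 0 for name, _ in vms}
    for _ in range(slots):
        free = cores
        for name, vcpus in vms:
            if vcpus <= free:
                free -= vcpus          # VM runs this slot on all its vCPUs
            else:
                ready_time[name] += 1  # VM is ready but must wait
        vms = vms[1:] + vms[:1]        # rotate priority: take turns
    return ready_time

# A 4-vCPU VM and a 2-vCPU VM on a 4-core host strictly alternate,
# each spending half its time in the ready state:
print(strict_schedule([("orange", 4), ("purple", 2)], cores=4, slots=10))
```

Note that when the purple VM runs, two cores sit unused, yet the orange VM still cannot use them, which is exactly the waste strict co-scheduling suffers from.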
Okay, so now that we've talked about the way things used to work with strict co-scheduling, let's talk about the way things are now with relaxed co-scheduling. Again, I have a single-socket quad-core CPU with four processor cores, and again, I have the same two virtual machines. My orange virtual machine has four vCPUs, while my purple virtual machine has two, and a similar situation occurs. The orange VM is using all of these cores in the first time slot. The purple VM has to wait. The purple VM may be ready, but it can't access any physical processors just yet. It has to wait until the next time slot. So far, it's really no different. Here's where the big change comes in. Now my orange VM can say, "Okay, the purple VM is using two of the cores in this time slot. I can take the other two and use them." Even though the orange VM doesn't have access to four physical cores, it can continue to make progress by executing CPU instructions on the two that are free.
And this pattern will continue. And so what we really have here is the big benefit of relaxed co-scheduling: the fact that we're not wasting the time slots. It's much more efficient. But this introduces a different risk, a different problem called skew. So we can now run into a problem called processor skew, or CPU skew. When I configure a VM like that orange VM with four vCPUs, the operating system of that VM doesn't know what's going on in the background. The operating system doesn't know that those virtual CPUs are actually running on a physical CPU that's being shared across multiple systems. And so what my operating system expects is that when some sort of CPU instruction is kicked off, all four of those processor cores will perform similarly. It expects those instructions to be executed simultaneously. But if the ESXi host is not giving time slots to vCPU Three and vCPU Four, those vCPUs are going to start to fall behind in trying to execute their instructions.
And if the difference becomes big enough, if one set of vCPUs is making progress and another set is not, this increases what we call skew. And the ESXi host has an acceptable threshold for this. When that threshold is reached, the ESXi host will start holding back vCPU One and vCPU Two. It'll start limiting them. This is called co-stop. If you look in esxtop, you'll see this written as co-stop. So if vCPU One and vCPU Two are basically being throttled back, that's expressed as co-stop. So there are still problems. There are still issues if you try to load too much work onto these physical CPUs. Even though we've evolved since strict co-scheduling, there are still performance considerations that we need to think about here. Now, let's go back to our strict co-scheduling slide for just a moment, because I want to talk about one other concept I've mentioned. You don't want to put too many VMs on the same host. You don't want to push too much work onto one individual host, because you're going to run into CPU problems. You're going to run into co-stop.
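The skew and co-stop interaction above can be sketched like this. This is a deliberately simplified model: the skew threshold and the "progress counter" are made up for illustration and do not reflect ESXi's internal metrics:

```python
# Skew/co-stop sketch: track per-vCPU progress; once the gap between the
# most- and least-advanced vCPUs reaches a threshold, the scheduler
# "co-stops" the leaders until the laggards catch up.
SKEW_THRESHOLD = 3  # illustrative value, not ESXi's real threshold

def step(progress, runnable):
    """Advance the vCPUs that got a time slot, then apply co-stop."""
    for vcpu in runnable:
        progress[vcpu] += 1
    skew = max(progress.values()) - min(progress.values())
    costopped = []
    if skew >= SKEW_THRESHOLD:
        lead = max(progress.values())
        costopped = [v for v, p in progress.items() if p == lead]
    return costopped

progress = {"vCPU1": 0, "vCPU2": 0, "vCPU3": 0, "vCPU4": 0}
for _ in range(3):  # only vCPU1/vCPU2 keep getting time slots
    held = step(progress, ["vCPU1", "vCPU2"])
print(held)  # ['vCPU1', 'vCPU2'] -> the leaders are now being held back
```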
You're going to run into performance problems. There's another way that you can potentially relieve some of this contention. Let's say in this scenario that my orange VM is configured with four virtual CPUs, but it's currently using only about 25% of the CPU resources that have been allocated to it. And maybe VM Two, my purple VM, is using something closer to 70%. Well, I can improve this situation right now, because these VMs are improperly configured. They're not the right size. VM One, my orange VM, does not need four virtual CPUs. It's only using 25% of them. If I were to bring this VM down to two virtual CPUs, if I were to right-size it, the same workload would now consume about 50% of its allocation. There's nothing wrong with that. There's nothing wrong with 50% CPU utilization. So when I bring it down to two virtual CPUs, I'll see the CPU usage for this VM rise to about 50%. That's going to have no negative performance impact on that VM. The VM is going to work just as fast as it always did. As a matter of fact, it's actually going to improve the performance of my orange VM.
And this is one of those weird concepts that people have a hard time wrapping their heads around. I can actually remove CPUs from the orange VM, and as I bring it down from four to two virtual CPUs, the performance will actually improve. And here's why: now, when VM One needs to execute, it's going to try and grab two processor cores. And when my purple VM tries to execute anything, it's going to grab two cores. These VMs don't need to take turns anymore. VM One and VM Two each need two processors, and there are two processors open for each of them. They don't have to take turns. They can use all of their processors simultaneously, all the time. So by right-sizing, I've actually reduced my CPU contention. And by reducing the vCPU count of the orange VM, I've actually made it perform better, and I've made the second VM perform better as well. I've reduced the likelihood of CPU ready, as well as the likelihood of CPU skew, because the orange VM no longer has a vCPU Three and a vCPU Four. It didn't need them. vCPU One and vCPU Two are having no problem getting the time slots they want. They're executing simultaneously. So I'm not going to see co-stop either.
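The right-sizing arithmetic from the example above is just the same absolute demand expressed against a smaller allocation:

```python
# Right-sizing math: the workload's absolute CPU demand doesn't change,
# only the allocation it's measured against.
def utilization_after_resize(current_pct, current_vcpus, new_vcpus):
    demand = current_pct * current_vcpus  # demand in "vCPU-percent"
    return demand / new_vcpus

# 25% of 4 vCPUs becomes 50% of 2 vCPUs -- same work, half the footprint:
print(utilization_after_resize(25, 4, 2))  # 50.0
```

The VM does the same amount of work, but it now asks the scheduler for half as many simultaneous cores, which is what eliminates the turn-taking.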
So it's really beneficial to right-size your virtual machines if you're having CPU performance issues. That's usually the first step: ensuring that my VMs are properly sized and configured with the appropriate amount of CPU. Okay, so now that we understand the basic concept of how the vSphere CPU scheduler works, let's talk about CPU contention. We can use shares, and we'll see more about that later in this course. But I can use shares to determine which virtual machine takes priority. If one virtual machine has twice as many shares as another, that virtual machine is going to get twice the CPU entitlement of the other VM. And that's really how this is all broken down. The host carves up those CPU resources into time slots, and then the share structure determines the entitlements. How much CPU time is each virtual machine actually entitled to? That's determined by shares. CPU ready time occurs when a VM is ready to execute against the CPU but must queue that command until a processor is available. My VM is ready, but the processor is not, so the VM has to wait. So there are some great documents that you can utilise to learn more about CPU performance in vSphere. The first document that I'd like to show you is the vSphere Resource Management document, and this is the version for vSphere 7.
But if you've worked with past versions of vSphere, you'll notice that this document is very similar to those former versions as well. And what I want to cover here are the fundamentals of CPU virtualization and CPU resource administration. So this is a relatively high-level look at how an ESXi host divides up resources amongst virtual machines and how CPU virtualization works in a nutshell. Then it goes into much greater detail about how to manage CPU resources. So if you're looking to configure things like processor configuration, hyperthreading, or CPU affinity, those things are all referenced in detail here. And if we look a little bit deeper into the table of contents, you can see there's all sorts of stuff on memory, storage, and resource pools. It gets into DRS clusters as well. And there is a section that breaks down NUMA. NUMA is the physical architecture of processors and their locality with memory. And if this is a concept that you don't really understand, this document is a pretty nice breakdown of what NUMA architecture looks like and how virtual machine processes are placed on the physical CPU cores of your host. Now, here's the other document that I want to reference, and this one's an oldie but a goodie. In vSphere 5.1, the CPU scheduler received significant improvements.
And some of what you see in this document is going to look very familiar. It's going to walk you through the difference between strict and relaxed co-scheduling, which we just went through, plus CPU load balancing and migrating VMs from one processor to another. It will cover topics such as NUMA migration and how NUMA affects all of this. So it's a really great document. It walks you through the different CPU states, like Wait and Ready, and all of those good things to know, and the differences between strict and relaxed co-scheduling. So if you're looking for more details on any of those sorts of topics, I know it's an old document, but it's a good one. The CPU scheduler whitepaper for vSphere 5.1 is a fantastic read for understanding what's going on beneath the surface with your CPUs.
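Going back to the shares mechanism mentioned a moment ago, the proportional math is simple: under contention, a VM's entitlement is its shares divided by the total shares of all competing VMs. A minimal sketch:

```python
# Proportional-share entitlement: each VM's slice of contended CPU time
# is its shares over the sum of all shares.
def entitlements(shares):
    total = sum(shares.values())
    return {vm: s / total for vm, s in shares.items()}

# A VM with twice the shares gets twice the entitlement:
result = entitlements({"vm1": 2000, "vm2": 1000})
print(result)  # vm1 gets 2/3 of contended CPU time, vm2 gets 1/3
```

Note that shares only matter when there is contention; an uncontended host gives every VM whatever it asks for regardless of shares.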
2. Memory Virtualization
In this video, I’ll explain how virtual machines are able to utilise the memory resources of the hypervisor. And so let’s take a look at this diagram, which will explain a little bit about how this works. And the blue blocks that you see here represent my virtual machines. So each virtual machine is going to have its own operating system. In this case, I’ll just assume it’s Windows. So my virtual machines have the Windows operating system installed on them. And I’m going to have applications running within both of these virtual machines. And when I create these VMs, I’m going to allocate a certain amount of memory to them. So each VM will be given four gigabytes, ten gigabytes, or whatever amount of memory it requires. And that means that that virtual machine is able to use a maximum of that amount of memory. But we’re not really guaranteeing a certain amount of physical resources for any of these VMs.
So, sort of like with our processors, this works much the same way. Just because I give a virtual machine a two-gig memory allocation doesn't mean that I'm actually guaranteeing that VM two gigs of physical memory at all times. The virtual machines are going to share the physical memory of the host. I may also have an oversubscription. So, for example, in this little scenario here, maybe I've granted this VM four gigs of memory, and this VM has been granted eight gigabytes of memory. And that's simply what I've allocated, right? The host itself might have a total of, let's say, ten gigabytes of memory. This is possible because the host can efficiently share its memory across multiple virtual machines, and I can oversubscribe. This is how my hypervisor efficiently shares resources across multiple VMs. Let's say in this scenario that I launch some applications on my first virtual machine. So within Windows, I've launched this application. And Windows has its own memory table, where it tracks free and used memory pages. So, as far as Windows is concerned, if I give this VM four gigs of memory, Windows believes it has four gigs of physical memory. Windows can't tell the difference. It's inside a virtual machine. It has no idea that it's been virtualized. So as applications launch within Windows, it will allocate pages of memory within that 4 GB, and it will track which pages are free and which pages are used.
And on the hypervisor itself, the hypervisor is then mapping those memory pages that are being allocated within the guest OS to actual, real physical pages of memory. And that's kind of the beauty of the way that the hypervisor works, because until an application is actually launched, the memory is not used. So this VM may be configured with four gigabytes of memory, but it's not actually using four gigabytes. I know I'm making this kind of messy here, but the VM is allocated four gigs of memory. It doesn't actually take four gigabytes of memory off the physical host until it needs it. Memory is thinly provisioned. That means it's actually only granted to the VM when the VM actually requires it. So that's one of the ways that my hypervisor efficiently uses memory. And then maybe I've got another virtual machine that's also launching applications, and Windows is marking those pages in its own memory table, allocating virtual memory to those applications, and those memory pages are mapped back to physical pages of memory on the physical host by the hypervisor. And when a virtual machine actually closes an application, the host is going to have to reclaim those memory pages. So let's assume that my virtual machine here closes App One. When it closes App One, the operating system is immediately aware that App One has been closed, and so the operating system marks those pages as free. Now, we have to remember these Windows instances, these guest operating systems, don't know they're running within a virtual machine. So the guest operating system is never going to inform the host, "Hey, I don't need these memory pages anymore." That doesn't happen.
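The two-level mapping and thin provisioning described above can be sketched as a toy model. Nothing here mirrors ESXi's real data structures; it only shows the key behavior: a guest page consumes a host page the first time it is touched, not when the VM is configured:

```python
# Toy model of on-demand backing: the hypervisor maps a (vm, guest page)
# pair to a real host page only on first touch (thin provisioning).
class Host:
    def __init__(self, pages):
        self.free_pages = list(range(pages))
        self.mappings = {}  # (vm, guest_page) -> host page

    def touch(self, vm, guest_page):
        key = (vm, guest_page)
        if key not in self.mappings:           # first touch: back it now
            self.mappings[key] = self.free_pages.pop()
        return self.mappings[key]

host = Host(pages=8)
host.touch("vm1", 0)   # vm1 launches an app and touches a page
host.touch("vm1", 1)
host.touch("vm2", 0)   # vm2's guest page 0 maps to a DIFFERENT host page
print(len(host.mappings), len(host.free_pages))  # 3 pages backed, 5 free
```

Both VMs may be "allocated" far more than eight pages in total, but the host only spends physical pages as they are actually used, which is what makes oversubscription workable.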
So we have to have some kind of mechanism running on the physical host and on the hypervisor to occasionally look inside the guest operating system to see what memory is being used and what memory is not being used anymore. And if those memory pages are no longer in use, take them back and allow other virtual machines to use them. Now there’s one final thing I want to mention here while we’re talking about memory. If I want to, I can use something called a memory reservation. So, for example, if my VM on the left has been granted four gigs of memory, I’ve allocated four gigs of memory. If I want to, I can create a memory reservation. And what a memory reservation will do is carve out four gigabytes of physical memory and guarantee that memory to that virtual machine at all times. No other VM can leverage those four gigabytes of memory. It’s been specifically reserved for that particular virtual machine. And in most cases, reservations are not very desirable. And the reason I say that is because it’s great for the virtual machine that you’ve reserved that memory for.
But for all of the other virtual machines running on that host, that’s four gigabytes of memory that they cannot use. I always sort of equate a memory reservation to a kid licking a toy. Let’s say there’s just one really fancy toy in a room full of kids, and one kid grabs it, licks it, and plays with it. And nobody wants to touch that toy anymore because it’s been licked, right? Even if he’s not using that toy, nobody wants to touch it. That’s kind of like a reservation. My VM may only be using two gigabytes of the memory that it has reserved, and there might be another two gigabytes of memory sitting idle. But it doesn’t matter. That memory is reserved, and no other virtual machines can use it. So we try to steer clear of reservations whenever possible. So to quickly review: when a virtual machine launches an application, the guest operating system has its own table tracking what we call guest physical memory. Right? What the operating system of the VM perceives is physical memory. And it goes ahead and maps virtual memory within the operating system to those applications, which is then mapped back to the actual host physical memory by the hypervisor.
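The "licked toy" effect above, in numbers: a full reservation removes that memory from the pool every other VM can draw on, even if the reserving VM never touches all of it.

```python
# A reservation is subtracted from the shared pool up front, regardless
# of how much of it the reserving VM actually uses.
def unreserved_memory(host_gb, reservations_gb):
    return host_gb - sum(reservations_gb)

# A 10 GB host with one 4 GB reservation leaves only 6 GB for everyone
# else, even if the reserved VM is only actively using 2 GB:
print(unreserved_memory(10, [4]))  # 6
```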
3. Demo – VM Performance Charts in vSphere 7
I'm logged in to the vSphere Client, and I'm going to go to VMs and Templates. And under VMs and Templates, I've got a few virtual machines running. I'm going to focus on the demo VM. And when we click on a VM, we can go over here to the Monitor tab. And under the Monitor tab, we can see any issues with this virtual machine or any alarms that are currently triggered. But I'm more interested in looking at the performance charts at this point. And the first area that we're going to take a look at is the performance overview. And in my opinion, this is an extremely valuable source of information, because it gives you a really high-level overview of what the performance of this virtual machine looks like. So this gives me a fast way to look at the four food groups.
And what I refer to as the "four food groups" are CPU, memory, storage, and network. Those are the four food groups that my VM consumes, the major resources that could potentially be limited and could potentially impact the performance of this VM. So I can very quickly see here, hey, what is the CPU usage in megahertz of this virtual machine, and what do the CPU ready values look like? And if I'm experiencing higher than normal ready values, that's definitely a problem for me. CPU ready means that the virtual machine is ready to do something. It's ready to execute something on a physical processor, but it doesn't actually have access to one yet, so it's waiting on the ESXi host for those CPU resources. So yeah, if I'm seeing CPU ready values that are higher than they should be, that's a great indicator to me that the CPU resources of the host itself are constrained. And under memory, I can see some very useful metrics. I can see consumed and granted memory, but I can also see ballooning and other metrics.
So you can see here the percentage of host physical memory that's been consumed. I can see the memory that's being actively read and written by the guest. So not only what memory has been allocated to this VM, but what memory is it truly actively utilizing? And I can see here the physical memory that's shared across multiple VMs. So that's an efficiency that we get with transparent page sharing. And there are other metrics as well, such as physical memory consumed, host memory consumed, and so on and so forth. So there are a lot of nice little metrics here for memory. We can also export these little charts here as well. And so these basic little charts can actually give us a whole lot of information as to what the overall performance of this VM is. This last little metric here is ballooning activity. So is memory being actively reclaimed from this virtual machine so that it can be reallocated to other virtual machines? And I can see that, yes, there has been some ballooning activity occurring on this VM. And then, if we scroll down a little bit, we can see some other really useful metrics like disk latency.
How long does it take to complete storage operations? If this virtual machine (let's assume it's a Windows VM) is reading and writing to and from its virtual disks, what does the latency look like for those operations? What is the disk usage for this virtual machine? And then at the very end here, I can see the network bandwidth usage. So the overview chart gives me a great high-level view of what the overall performance characteristics of this VM are, what kind of resources it is consuming, and what kind of performance it is getting out of those resources. And so I kind of look at that as, hey, now I've gotten a nice high-level overview of what's going on. Maybe I've seen some red flags here related to the CPU or some red flags related to memory. And now I've decided, okay, I need to dig deeper. I need to learn more about what's going on in this VM. Then I can go to my advanced charts and dig through more detailed and more granular information. So, under Advanced, let's say I believe my issue is with the CPU.
Well, I can click on the chart options here, and I can focus specifically on CPU metrics. So maybe I want to look at CPU readiness. Readiness is the percentage of time that the virtual machine was ready but could not be scheduled to run on the physical CPU. And so now I can either focus in on that one particular metric, or I could modify the chart to show a few different metrics, like, for example, CPU usage. And what I'm going to look at in this particular chart is a real-time chart. Real-time charts are measured every 20 seconds; that's the sampling interval. It's going to show me values in real time, and it's going to show me what these values have looked like over the last hour. I can see my different CPUs here. I can see the CPU usage, and I can see the CPU readiness for the entire VM. And then I can break that down to individual processors as well. And I really can't overstress the importance of that one particular metric, readiness. This is a great way for me to understand, hey, is the host constraining this virtual machine? Is the host not able to give this VM CPU resources when it really needs those resources? And when you see CPU readiness, you may think that sounds like a good thing: the CPU is ready. It's not a good thing.
CPU readiness indicates the percentage of time that the virtual machine is ready to use the CPU but can't get access to a physical processor. So this is a great way for me to determine if the physical processor is able to satisfy the CPU requirements of that particular VM. And so, yeah, that's a really important metric when we're taking a look at the CPU performance of a VM. And then we can also modify other characteristics of our performance chart here. So right now we're looking at a real-time performance chart, but we can also change the time span here. So let's say that I want to look at the last day. Well, now the counters are going to change a little bit, but yeah, I can look at, for example, CPU ready values for the last day. And now the chart is going to look significantly different. It's going to be broken down over an entire day. And I can modify that chart to look at the last month, the last week, or the last year. But this is a relatively new virtual machine, so let's go over to my vCenter Server Appliance, and I can look at a chart for the last week there and see what's been going on from a CPU usage perspective for my vCenter Server Appliance. And yeah, so that's a great way for me to kind of dig deep. Maybe I think I'm having memory issues. Well, I can go and look at memory counters. I can see what kind of ballooning activity has been occurring on this particular VM. Ballooning is not necessarily a bad thing. So let's take a look at ballooned memory for the last week.
And you can see here that it's been zero all week. Well, that's probably a good thing. When the ESXi host starts to run low on memory, it looks to reclaim idle memory from existing VMs. That's what ballooning is. So ballooning isn't necessarily a bad thing, but if you see it going on constantly, kind of all the time, then you've got a real problem on your hands. Let's go back to memory. Let's look at memory for the last week for this VM, and let's look at ballooned memory, consumed memory, and swap-in and swap-out rates. Well, there's not a lot of data for this VM, so let's just look at the last hour in the real-time chart. I can see the rate of change, and I can see how much memory is being used. Swapping is really the worst scenario from a memory perspective. That means that the ESXi host is basically completely out of memory, and memory operations are being carried out on disk instead, or maybe on SSD. But yeah, we're not going to break down all of these metrics and exactly what they mean. I just want to show you that if you need to dig into one of these performance areas, this is a great way you can do it, by utilising the performance charts in the vSphere Client.
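One practical note on the real-time charts used above: vCenter reports the raw CPU ready counter as a summation in milliseconds per sample, and real-time samples are 20 seconds long. Converting that raw number to the readiness percentage we have been discussing uses the standard formula, ready% = summation_ms / sample_interval_ms × 100:

```python
# Convert a CPU ready "summation" value (milliseconds of ready time
# within one sample) to a readiness percentage. Real-time charts sample
# every 20 seconds.
def cpu_ready_percent(summation_ms, interval_s=20):
    return summation_ms / (interval_s * 1000) * 100

print(cpu_ready_percent(1000))  # 1000 ms of ready time in a 20 s sample -> 5.0%
```

So a summation of 1000 ms in a real-time sample is the 5% readiness level that's commonly cited as the point where you should start paying attention.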
As you can see, I simply typed “esxtop reference running-system” into Google. This is an easy way to find a great document that I frequently reference when I’m trying to figure out what I’m looking at in esxtop: the “vSphere 6 ESXTOP quick Overview for Troubleshooting.” So what’s so great about this document? Well, number one, it breaks everything down for you very simply. It starts with an overview of all of the esxtop commands, which is a great way to get to know what happens when I hit M and what happens when I hit a capital V.
Like I said, I’ll demonstrate some of this for you in the next video, but this is a great reference to have handy. When you launch esxtop, it brings you to the CPU screen. So here you can see a nice summary of all of the fields shown on the CPU screen and a quick explanation of what each of those fields means — for example, CPU ready, which is the percentage of time a VM was waiting to be scheduled: the VM is ready, but the physical processor is not. If you see values above 5%, take care; that means you’ve got a bit of a problem on your hands. All of these little fields in esxtop are broken down for you very clearly here, and the disk, network, and memory fields are covered in the same way. And if I scroll up a little bit, you’ll see network counters like dropped receives and dropped transmits that I need to be concerned with.
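One caveat on that 5% rule of thumb: in esxtop, a VM group’s %RDY value is summed across all of the group’s worlds, so a multi-vCPU VM can legitimately show a larger number. A common practice — sketched here as an illustration, not an official formula — is to normalize per vCPU before applying the threshold:

```python
def rdy_per_vcpu(group_rdy_percent: float, num_vcpus: int) -> float:
    """esxtop reports %RDY summed across a group's worlds, so divide
    by the vCPU count to get a per-vCPU figure."""
    return group_rdy_percent / num_vcpus

def is_concerning(group_rdy_percent: float, num_vcpus: int,
                  threshold: float = 5.0) -> bool:
    """Apply the ~5% per-vCPU rule of thumb from the narration."""
    return rdy_per_vcpu(group_rdy_percent, num_vcpus) > threshold

# 12% group %RDY on a 4-vCPU VM is only 3% per vCPU:
print(is_concerning(12.0, 4))  # → False
```

So before worrying about a double-digit %RDY on a large VM, check how it divides across its virtual CPUs.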
And there’s even an area in esxtop for vSAN, where you can see the recovery operations that are occurring and all sorts of great information like that. So this is a great reference if you’re just getting started with esxtop, or even if you’ve been using it for a while: a very handy, quick one-page reference to help you navigate your way around esxtop. Here you can see the vSphere Monitoring and Performance document for vSphere 7. Within this document, we get all kinds of great information about how to monitor the performance of our vSphere environment: how to work with charts (advanced charts, custom charts), how to monitor within the guest operating system, how to monitor the health of our hosts, how to set up alarms, events, and automated actions, and so on. And then there’s a whole section on using esxtop as well, which we’re going to look at in our next video, where we’ll do a demo of esxtop. This document breaks down all of the details of how to effectively use esxtop to monitor an ESXi host, and it also shows you how to pull the system log files. You’ll see a link displayed on your screen; that’s how to get to the vSphere Monitoring and Performance documentation.
5. Demo – ESXTOP
So in this video, I’m again using the hands-on lab environment at hol.vmware.com, and you can see that this particular lab is equipped with PuTTY. We can launch PuTTY and connect to our ESXi hosts. So let me go ahead and close this and start over. Here’s PuTTY down at the bottom of the screen; I’m just going to click it. What I’m trying to do is get command-line access to an individual ESXi host. That’s what esxtop is all about: I’m going to get into one particular ESXi host and use esxtop to monitor that host. So I’m going to load up the session here. I’m actually going to make a quick change before I get in there: I’m going to increase the size of my font to make it a little easier for everybody to see, and then I’ll go ahead and connect to my ESXi host. Now, you can also connect locally to an ESXi host and use the shell to launch esxtop, but I’m going to do it remotely. So I’m using PuTTY to establish an SSH session, and then on the ESXi host, I’ll simply type esxtop and hit Enter. It will start me at the CPU screen.
So here I can see my physical CPU information up at the top. I’ve got seven VMs configured with seven virtual CPUs, and I can see my average load on the physical processors of this ESXi host. Here’s my physical CPU usage percentage, and here’s my physical CPU utilization percentage for this particular ESXi host; my CPU utilization is relatively low, averaging around 47%. And here are all of the different worlds and groups running on this ESXi host. A group is simply a collection of processes. Right here we see GID; those are group IDs. And then I can see the number of worlds, NWLD: how many worlds — how many threads of execution — are associated with this particular group. We’ve got all kinds of processes running here that are internal to this ESXi host. So what I’m going to do is type a capital V, and look what it does: it removes all of those other processes and just shows me my virtual machines. Up at the top, I’ve got memhog, which has seven worlds, and then I’ve got all my Perf Worker VMs, which have nine worlds each. The number of worlds depends on how many virtual CPUs the VM has. So I’ve got seven for memhog, and those aren’t just the virtual CPUs — there are also other worlds, like the virtual machine monitor and the mouse, keyboard, and screen.
So it’s basically all of the processes that make up a virtual machine; the virtual machine monitor is one of those worlds. What I really want to focus on here on the esxtop CPU screen is what’s going on from a CPU ready percentage standpoint. You can see the %RDY column here to the right, and %RDY is below 1% for every VM. That means I’m not really running into CPU contention on this host. So it’s very easy to monitor CPU readiness here: I don’t need to do any math, I don’t need my calculator, I can simply see it presented as a percentage. And if it’s below 5%, I’m not really worried. On the far right, I can see %CSTP, which is co-stop. Most of these VMs are single-processor VMs, so I would expect co-stop to be zero. But if I have CPU skew — one virtual processor making progress faster than another — I’ll see some values here under co-stop. Let’s take a look at the memory output for esxtop. I’m just going to hit M, and that brings me to the memory screen. Then again, I’m going to hit a capital V so that I just see my virtual machines. And now I can see all sorts of good information about my VMs. So let’s start with memhog. I can see the memory size.
How much memory has this VM been configured with, and how much memory is it actually granted? I can see if there is any swapping activity — swap current and swap target — how much swapping is occurring on this VM right now. And if I see a swap target that’s higher than the current value, that means swapping is increasing; but currently it’s zero. And you can hit F here to change the fields that are displayed. So if I hit F, maybe there’s certain information I want to see and certain information I don’t. I don’t really care about swapping anymore, so I’m going to hit K to remove that and L to remove swap, and O to get rid of memory overhead. What I really want to see are statistics related to ballooning, so I want MCTL. The vmmemctl driver is the balloon driver that performs ballooning in your virtual machines, and the MCTL statistics are the ones I’m really concerned with here: is ballooning activity happening on these virtual machines? So let’s take a look at what we see. MCTL? is basically asking, “Is ballooning enabled or disabled?” If you have VMware Tools installed on your virtual machine, you’re probably going to see a yes here; if VMware Tools is not installed, you’ll see a no. The balloon driver is baked right into VMware Tools, and you want that balloon driver running; that’s how the ESXi host reclaims memory that’s no longer being used.
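The current-versus-target pattern works the same way for both the swap counters (SWCUR/SWTGT in esxtop) and the balloon counters (MCTLSZ/MCTLTGT): when the target exceeds the current value, the host intends to reclaim more, so the activity is increasing. A small sketch of that reading:

```python
def reclaim_trend(current_mb: float, target_mb: float) -> str:
    """Interpret an esxtop current/target pair (SWCUR/SWTGT or
    MCTLSZ/MCTLTGT): a target above current means the host plans to
    reclaim more, so the activity is increasing; below means it is
    winding down."""
    if target_mb > current_mb:
        return "increasing"
    if target_mb < current_mb:
        return "decreasing"
    return "steady"

print(reclaim_trend(0, 0))    # → steady
print(reclaim_trend(0, 256))  # → increasing
```

A zero current value with a nonzero target is the early-warning case: nothing has been reclaimed yet, but the host has decided it needs to start.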
So ideally, all of our VMs should say yes, unless there’s a really good reason not to have VMware Tools on those virtual machines. The MCTL size is current ballooning activity, and the MCTL target is how much ballooning activity is going to be happening in the future. So if the target is higher than the MCTL size, you know that ballooning activity is increasing. Let’s press D to go to disk. Here you can see all the storage adapters for this ESXi host: I’ve got vmhba0, vmhba1, vmhba64, and vmhba65. These are the physical storage adapters for the host itself. And what we can see here are some critical performance indicators. What is the device average latency? That is the latency of the storage system itself — could be iSCSI, could be Fibre Channel, could be whatever. How much latency are we seeing from that, versus kernel latency: how much latency is the ESXi host itself introducing? The kernel latency should always be below one millisecond. The device average latency you may see at eight or nine milliseconds, and that’s pretty normal.
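These two figures are additive: in esxtop’s disk columns, guest latency is device latency plus kernel latency (GAVG/cmd = DAVG/cmd + KAVG/cmd). Here’s a small sketch applying the rule-of-thumb numbers from the narration — kernel under about 1 ms, device around 8 or 9 ms being normal — which are illustrative guidelines, not official limits:

```python
def guest_latency_ms(davg_ms: float, kavg_ms: float) -> float:
    """GAVG = DAVG (latency of the storage device itself) + KAVG
    (latency added by the ESXi storage stack)."""
    return davg_ms + kavg_ms

def flag_latency(davg_ms: float, kavg_ms: float) -> list:
    """Apply the rule-of-thumb thresholds from the narration."""
    issues = []
    if kavg_ms >= 1.0:
        issues.append("kernel latency high (expect < 1 ms)")
    if davg_ms > 9.0:
        issues.append("device latency high (8-9 ms is typical)")
    return issues

print(guest_latency_ms(8.0, 0.5))  # → 8.5
print(flag_latency(8.0, 0.5))      # → []
```

The split matters for troubleshooting: a high DAVG points at the storage array or fabric, while a high KAVG points at queuing inside the ESXi host itself.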
Kernel average latency should be really low, below one millisecond. And if you combine those two together, you get your guest average latency, which is the total latency being experienced by your virtual machines. If I hit S, I can change the delay. Remember when I said we had real-time performance charts inside the vSphere Client? Well, here I can change the number of seconds between refreshes: I could enter five, which changes how frequently this display refreshes, or I can put in two, and now my display refreshes every 2 seconds. So you can modify those values here. I can also press H for help, which shows me how to add or remove fields, set the delay in seconds, switch my display, and do some of the things you’ve seen me do. So far, we’ve examined the CPU, memory, and disk adapter screens. Let’s take a moment and go to the disk VM screen, which is a lowercase v. Here we can see each of the individual virtual machines, how many reads and writes per second they’re generating, and what their latency is for reads and writes. So as you can see, esxtop is very powerful because it gives you a whole lot of information in one place. I’m just going to hit N as I talk here, and you can see it.
I’m seeing network activity for all of my virtual machines, all of my VMkernel ports, and all of my physical adapters on this particular ESXi host. The vmnics are my physical adapters. I can see my VMkernel ports and which physical adapter they’re using, and I can see my virtual machines down at the bottom and which physical adapter they’re using. I can see packets transmitted and received and megabytes transmitted and received. And I can change these fields to focus simply on dropped receives and dropped transmits, if that’s what I’m concerned about. So I’m just going to unselect a bunch of these counters so that I can see those dropped receives and dropped transmits — and I have zero of each, so I’m in pretty good shape. So that’s a quick summary of esxtop: how it can speed up your troubleshooting process, and how you can see a whole lot of information and a whole lot of metrics very quickly by using esxtop to analyze the performance of an ESXi host.