5. NSX Controller Tables
And the big three tables are the MAC table, the ARP table, and the TEP table. So let's start at the beginning with the ARP table. The first table shown here is our ARP table. And the ARP table is essentially saying, "Okay, every single virtual machine has a MAC address. What is the IP address associated with that MAC address?" So in order for virtual machines to communicate within the same layer 2 segment, they need to know each other's MAC addresses.
The ARP table's purpose is to track the IP address to MAC address mappings within that layer 2 segment. Because this ARP table is kept up to date, it can reduce the number of broadcasts. We'll take a closer look at that in a few minutes. That's the first table, which is the ARP table. The second table is shown here, and you can see we're looking at tables for one specific segment, so we have the VNI identifier. The second table is the MAC table. The MAC table's purpose is to keep track of which TEP each virtual machine's MAC address is reachable through. So, for example, here is the MAC address of a VM. So maybe we are trying to ping some virtual machine. So we type in the command to ping it. We specify the IP address. The ARP table says, "Okay, if you're trying to get to that IP address, here is the MAC address associated with it." Great! The ARP table has done its job.
We found the MAC, so what's the next step? How do we get to that MAC address? Which transport node is it located on? What's the IP address on the overlay network that we need to forward that traffic to in order to reach the appropriate host? And that's the purpose of the MAC table: to basically associate each one of these MACs with a particular VTEP so that we can get the traffic to the correct VTEP. And then the VTEP table itself is tracking the IP addresses of the VTEPs and the MAC addresses of the VTEPs. So on the actual physical underlay network, what are the IP addresses of the VTEPs? What are the MAC addresses of the VTEPs? So these are the three control plane tables that are used to get traffic from one place to another. So let's take a closer look at these tables and examine how they are built. And the first table we're going to take a look at is the TEP table. And the TEP table tracks all of the TEPs that are participating in each VNI. So let's go back to the last screen and just take a quick look.
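Just to make the relationship between these three tables concrete, here's a small Python sketch. This is purely illustrative: the addresses and names are invented for the example, and real NSX stores these tables very differently. It just shows how a destination IP gets resolved step by step into a tunnel endpoint on the underlay.

```python
# Illustrative sketch (not NSX code): how the three controller tables chain
# together to resolve a destination VM into a tunnel endpoint on the underlay.
# All names and addresses below are made up for this example.

arp_table = {"10.1.1.11": "mac-vm-2"}        # VM IP  -> VM MAC
mac_table = {"mac-vm-2": "172.16.10.20"}     # VM MAC -> TEP IP
tep_table = {"172.16.10.20": "mac-tep-2"}    # TEP IP -> TEP MAC (underlay)

def resolve(dest_ip):
    """Walk ARP -> MAC -> TEP to find where to tunnel a frame."""
    vm_mac = arp_table[dest_ip]    # which MAC owns this IP?
    tep_ip = mac_table[vm_mac]     # which TEP is that MAC reachable through?
    tep_mac = tep_table[tep_ip]    # underlay MAC address of that TEP
    return vm_mac, tep_ip, tep_mac

print(resolve("10.1.1.11"))  # ('mac-vm-2', '172.16.10.20', 'mac-tep-2')
```

Each lookup feeds the next one, which is exactly the ARP-then-MAC-then-TEP sequence described above.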
Here's my TEP table all the way at the end. Now, if I were looking at an actual live TEP table, you would probably see a whole bunch of other IP addresses and MAC addresses listed here. The purpose of the TEP table is to list all of the TEPs that have virtual machines behind them that are participating in some specific VNI. And this is important. If you have a bunch of VMs connected to a VNI and you want to broadcast layer 2 traffic to all of them, you need to know all of the TEPs that are participating in that VNI. So the TEP table is very important. So here in this diagram, we see two ESXi hosts. Our transport nodes in this diagram are ESXi hosts. And you'll notice that the TEPs are in two different subnets: TEP 1 is in the 10.1.1.0 subnet, and TEP 2 is in the 172.16.10.0 subnet. We've got our router in between them. And here's our NSX controller cluster, which, remember, is actually hosted on NSX Manager. So, here's my virtual machine, VM 1. When that virtual machine first connects to a segment, the ESXi host is going to update its local TEP table, mapping that particular VNI to the TEP. So this is the first VM on this host that has connected to this layer 2 segment. And now the host is like, "Okay, I've got virtual machines on me that are connected to this segment. I'd better let the NSX controller cluster know that if broadcasts are coming to this segment, I need a copy of them." So a TEP report is generated and sent to the NSX controller. And at this point, the TEP table of the NSX controller has only one entry in it: the TEP for this first host. Now, let's say a virtual machine comes online on the second host, and this virtual machine is also connected to the same layer 2 segment.
Well, now, if there are layer 2 broadcasts for this segment, they need to be received by this TEP and this one. So this host is going to say, "I now have a virtual machine on this segment too. Let me send a TEP report to the NSX controller cluster." And now the TEP table is being built out to include the TEP for every single host that has a VM connected to the segment. So the central control plane is building a comprehensive table of all of the VNIs and which TEPs are participating in each VNI. And that comprehensive table is going to get distributed to all of the TEPs participating in that VNI. And this is the same as NSX-V: the VTEP table in NSX-V was distributed to all the hosts, and in NSX-T, the TEP table is distributed to all of the hosts. That way, every single transport node knows all of the transport nodes that are part of that VNI. And this is really important because, if you think about it this way, if the NSX controller cluster were to fail (remember, that's the control plane), we still want traffic to be able to move. Now, let's say that this VM sends a layer 2 broadcast.
Well, if this host has a local copy of the TEP table, it knows exactly which TEPs need to receive a copy of that broadcast. So that's really useful in the event of a control plane failure. And by the way, if you have more VNIs, more layer 2 segments, that's just going to mean additional TEP tables; there will be one for each VNI. All right, so next, let's take a look at the MAC table. So we've got a virtual machine. We'll call it VM 1. VM 1 powers on, and it's running on this first host here. And so VM 1 has a MAC address and an IP address. And this is a new virtual machine. So when this virtual machine powers on, the host is going to say, "Hey, NSX controller cluster, here's a MAC report. I've got this virtual machine. This virtual machine has a MAC address, MAC 1. And if you ever need to get traffic to that MAC address, send it to this TEP. That's where this MAC address is reachable from. By the way, it exists on this layer 2 segment." The same is true for the other host. Virtual machines get created, or maybe virtual machines get vMotioned to a particular host. That host generates a MAC report: this MAC is reachable through this TEP on this VNI. So that's what the MAC tables look like.
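The TEP-report flow described above can be sketched in a few lines of Python. This is a conceptual model only (the classes, method names, and addresses are invented, and the real NSX control-plane protocol works differently): each host reports its TEP when the first VM on a VNI connects, and the controller pushes the full per-VNI table back out to every participating host, which is why forwarding keeps working even if the controller later fails.

```python
# Hypothetical model of TEP reports and TEP-table distribution.
from collections import defaultdict

class Host:
    def __init__(self):
        self.local_tep_table = {}        # VNI -> set of TEP IPs (local copy)

class Controller:
    def __init__(self):
        self.tep_table = defaultdict(set)  # VNI -> set of TEP IPs
        self.hosts = defaultdict(set)      # VNI -> hosts participating

    def tep_report(self, host, vni, tep_ip):
        """A host reports its TEP for a VNI (first VM connected)."""
        self.tep_table[vni].add(tep_ip)
        self.hosts[vni].add(host)
        # Distribute the comprehensive table to every host on this VNI.
        for h in self.hosts[vni]:
            h.local_tep_table[vni] = set(self.tep_table[vni])

ccp = Controller()
esx1, esx2 = Host(), Host()
ccp.tep_report(esx1, 5001, "10.1.1.10")      # first VM connects on host 1
ccp.tep_report(esx2, 5001, "172.16.10.10")   # first VM connects on host 2
# Both hosts now hold a full local copy, so BUM replication
# still works even if the controller cluster goes down.
```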
So now this MAC table is populated, and it's got a comprehensive list of all of the MAC addresses in the VNI and which TEP each MAC address is reachable through. And what happens next is that a copy of that MAC table is distributed to the local control plane (LCP) of all the transport nodes. And this is different than the way that NSX-V worked. So what's the big deal with this? Why is this different, and why is this important? Well, let's think about it this way. If each host has this comprehensive list of all of the MAC addresses for this layer 2 segment and which TEP they're reachable through, that can make things much more efficient if there's a control plane failure. So let's say that this table is lost. Let's say that in NSX Manager, all three nodes are down, and this table has now become inaccessible. And this VM wishes to communicate with this VM via MAC 2. Well, if we don't know which TEP MAC 2 is reachable through, then that becomes what we call an unknown unicast. So the VM wants to send this traffic to MAC 2. It hits the TEP. The TEP says, "Well, I don't know which TEP to send it to. I will send it to every single TEP." So if I have 15 hosts that are participating here, all with TEPs on this VNI, it's going to get sent to all 15 of those hosts. It's essentially going to turn into a broadcast. So by distributing a copy of the MAC table to every single host, that means that even if my three NSX Manager nodes were to fail, every host has a local copy of the MAC table. So every host knows which MAC is reachable through which TEP. So let's follow a ping through this diagram. So here we go. VM 1 wants to ping VM 2, and let's assume that the NSX Manager nodes are all working properly. Everything is functioning normally. So the VM generates this ping. The ping is destined for MAC 2. So at that point, a lookup is performed. The MAC table is queried by the TEP, and we've got an entry in the MAC table for MAC 2.
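The known-versus-unknown-unicast decision above boils down to a simple lookup with a flood fallback. Here's a hedged sketch of that decision (the table contents and addresses are made up): a MAC-table hit yields exactly one destination TEP, while a miss is flooded to every other TEP on the VNI using the local TEP-table copy.

```python
# Illustrative forwarding decision: known unicast vs. unknown unicast.
mac_table = {"mac-vm-2": "172.16.10.10"}                  # MAC -> TEP IP
tep_table = {"10.1.1.10", "172.16.10.10", "172.16.10.11"} # all TEPs on the VNI
local_tep = "10.1.1.10"                                   # this host's TEP

def destinations(dest_mac):
    """Return the set of TEP IPs that must receive the frame."""
    if dest_mac in mac_table:
        return {mac_table[dest_mac]}     # known MAC: one unicast to one TEP
    return tep_table - {local_tep}       # unknown unicast: flood to all peers

print(destinations("mac-vm-2"))    # a single TEP
print(destinations("mac-vm-99"))   # flooded to every other TEP on the VNI
```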
The MAC table says that to reach MAC 2, send the frame to the destination TEP's IP address. So now the TEP on the source host will encapsulate that frame with the Geneve headers and send it over the physical network to the destination TEP. That destination TEP will decapsulate it and forward it on to VM 2. How about the ARP table? So I'm actually going to demonstrate the population of entries in the ARP table in the next video, but let's just walk through it in theory here. So here, VM 1 starts sending some traffic; let's say it's sending some traffic to a virtual machine on some other host. And maybe this is the first time that this VM has sent any traffic. The TEP is going to detect the fact that, "Hey, here's a new IP address that's sending traffic for the first time." So the TEP will say, "Let me grab the MAC address and the IP address of this virtual machine, create an IP report, and send that to the central control plane so that the NSX controller is aware of that IP address to MAC address mapping." That's the ARP table.
And as more VMs start sending traffic for the first time, the ARP table will be updated with their IP address to MAC address mappings as well. Okay, so now the ARP table of the NSX controller has been updated with the IP addresses and MAC addresses of VM 1 and VM 2. Let's say that VM 1 wants to ping VM 2. What's going to happen? Well, if VM 1 already knows the MAC address of VM 2, then this is pretty straightforward. But let's assume that VM 1 does not know the MAC address of VM 2. So VM 1 is attempting to ping VM 2, but it doesn't know the MAC address. It's going to send an ARP request, which would normally be a broadcast. But because we now have this ARP table as part of the NSX control plane, that broadcast can be suppressed, and the IP address to MAC address mapping can be retrieved from this table without sending a broadcast out onto the segment. So that's the beauty of the ARP table: it suppresses layer 2 broadcasts for ARP requests.
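The suppression logic just described is easy to see in a tiny sketch. This is a conceptual model, not NSX code (the addresses are invented): the TEP intercepts the ARP broadcast, and if the ARP table has an entry, it answers directly, so no broadcast ever goes out on the segment; only an unknown target forces a real flood.

```python
# Illustrative ARP suppression: answer from the table, broadcast only on a miss.
arp_table = {"10.1.1.12": "mac-vm-2"}   # built up from IP reports

def handle_arp_request(target_ip):
    """Intercept an ARP request at the TEP instead of broadcasting it."""
    mac = arp_table.get(target_ip)
    if mac is not None:
        return ("reply", mac)        # suppressed: answered without a broadcast
    return ("broadcast", None)       # unknown: must be flooded to the VNI's TEPs

print(handle_arp_request("10.1.1.12"))  # ('reply', 'mac-vm-2')
print(handle_arp_request("10.1.1.99"))  # ('broadcast', None)
```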
And one more significant difference from NSX-V: a copy of the ARP table is actually distributed to all of the transport nodes. There's an ARP cache sitting on every single transport node. Again, this is important if the NSX Manager nodes fail. That ARP cache has a TTL of ten minutes, and it sits on those hosts and gives them an ARP table that's local to the transport node itself. So let's dig a little bit deeper here. Let's take a look at what happens when an ARP request occurs. This is just a little background information if you're unfamiliar with the process. Here we've got two virtual machines, VM 1 and VM 5. And since VM 1 and VM 5 are on the same layer 2 segment, let's completely take NSX out of this picture. Let's assume that there's no NSX involved whatsoever. So VM 1 wants to ping VM 5, and we've got some other virtual machines in this picture. If VM 1 wants to ping VM 5 but does not know its MAC address, it sends an ARP request, which is a layer 2 broadcast. So the physical switch is going to send it to every single virtual machine, and then eventually VM 5 will respond, "Hey, that's my IP address. Here is my MAC address." And then VM 1 will add that MAC address to its ARP table. Now later on, VM 1 may need to ping VM 2 or VM 3 or VM 4, which results in yet another ARP request and yet another broadcast. So that's what we're looking to avoid: these repeated broadcasts occurring on the layer 2 network and hitting every single machine on that layer 2 network. So let's look at this ARP suppression process. And we're actually going to go back in time a little bit here. We're going to talk about how this worked with NSX-V, NSX for vSphere. So a VM wants to ping; let's call it VM 1.
And it's on a layer 2 segment. It's trying to ping VM 2. In NSX-V, we called those layer 2 segments logical switches. So VM 1 wants to ping VM 2. It does not know the MAC, so it issues an ARP request, and the TEP is going to actually intercept that ARP request rather than broadcasting it. It'll forward that request to the central control plane. And if the NSX controller cluster has an entry, it will respond to that request. And so now the ARP broadcast is completely suppressed, and we're not sending it to every single virtual machine on that layer 2 network. Okay? So now let's compare that to how the ARP tables work in NSX-T. So, very similar to the diagram we just saw, VM 1 wants to ping VM 2, but it does not know the MAC address of VM 2. So VM 1 sends an ARP request to find out what the MAC address is. The ARP request is intercepted by the TEP. But in this case, the local host has that result in the local ARP cache. So it doesn't even need to query the central control plane. It can provide that result right to the virtual machine without any traffic ever leaving that ESXi host. So that's one of the big differences in the way that ARP suppression works in NSX-T versus NSX-V. All right, so now my ARP request has been completed. VM 1 now knows the MAC address of VM 2, and so it's trying to send this frame to VM 2. It has discovered the destination MAC. So here comes the frame destined for MAC 2.
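That per-host ARP cache with its ten-minute TTL can be sketched like this. The ten-minute TTL comes from the lesson; everything else here (class name, methods) is an illustrative assumption, not the actual NSX implementation. A fresh entry is answered locally; a missing or expired entry falls back to the central control plane.

```python
# Minimal sketch of a per-transport-node ARP cache with a 10-minute TTL.
import time

TTL = 600  # seconds (the ten-minute TTL mentioned in the lesson)

class ArpCache:
    def __init__(self):
        self._entries = {}  # ip -> (mac, expiry timestamp)

    def put(self, ip, mac, now=None):
        now = time.time() if now is None else now
        self._entries[ip] = (mac, now + TTL)

    def get(self, ip, now=None):
        now = time.time() if now is None else now
        entry = self._entries.get(ip)
        if entry and entry[1] > now:
            return entry[0]   # fresh: answered locally, no traffic leaves the host
        return None           # missing or expired: fall back to the controller

cache = ArpCache()
cache.put("10.1.1.12", "mac-vm-2")
```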
A MAC table lookup is performed. The MAC table is going to tell us which TEP this frame needs to be sent to on our overlay network, and it's going to be TEP 2. So the frame destined for MAC 2 needs to be sent to TEP 2's IP address. The source TEP will encapsulate that frame and send it over the physical underlay network until it reaches TEP 2. And in order to get it over the physical network, we may need to find the MAC address of TEP 2. And that information is present in the TEP table. So in the next video, we're going to take a look at how to display these tables, and I'm going to show you some of the information included in the MAC, ARP, and VTEP tables.
6. Demo – View NSX Controller Tables with the CLI
In this video, I'll demonstrate how to use the NSX-T command line to display the ARP, MAC, and VTEP tables for an NSX segment. So here I am, in the free hands-on labs at hol.vmware.com, and I am going to launch PuTTY. And remember, our NSX controllers are now built into NSX Manager. So I'm going to connect to NSX Manager, I'll put in my password, and now I'll be able to access the command-line interface of NSX Manager. And I'm going to start by listing my logical switches. The command for that is "get logical-switches", and this will give me a list of all of the logical switches. And the reason I started with this command is because I wanted to find the UUID for one of my logical switches. So let's go with LS Web. I have a few virtual machines connected to this logical switch. So I'm going to simply copy this UUID.
And as a quick side note, in case you don't know how to copy text in PuTTY: just select the text with your mouse, and it's copied automatically. And then I'm going to execute the command "get logical-switch", paste in my UUID, and follow it with "mac-table". This should show me the MAC table for this particular logical switch. So what you can see here is that I have three MAC addresses. Each of those MACs is associated with a certain VTEP, and each of those VTEP IDs is associated with a particular transport node. So essentially, what this is showing me is that I have three different VMs, each with a different MAC address, and it's telling me which TEP to use to reach those particular virtual machines. So the MAC table is tracking which MAC addresses are accessible behind which VTEP. Now let's take a look at the ARP table for this logical switch. And this should have a list of IP address to MAC address mappings. However, you'll notice there are no entries in the ARP table right now. Why is that? Why are there no entries in the ARP table?
Well, let's go over to our vSphere client, and I'm going to go ahead and launch a console on Web One A. And what we're going to do is, from Web One A, try to ping Web Two A. So the address is 172.16.10.12. So let's go ahead and launch a web console here. And from Web One A, I'm going to attempt to ping Web Two A. And there we go. So that worked. And I'm hoping that now, at this point, if I go back to my command line, I'll have a new entry in the ARP table. So the ARP table doesn't actually get updated until there's relevant traffic for those IP addresses. So now you can see that Web One A's and Web Two A's IP addresses are reflected here. Their IP addresses are associated with a certain MAC address as well. And here's the transport node that reported that IP address to MAC address mapping. So there's my ARP table, and the final table I want to take a look at is the VTEP table. So here we see the command to display the VTEP table for a logical switch. This web segment is just one of our segments, and as you can see, we have a number of TEPs identified here. So we've got different IP addresses for different TEPs.
You can see which transport node each of those TEPs is associated with. And so this is important for things like broadcast traffic. We need a list of all the TEPs that a particular segment is sitting behind. So if a broadcast exists for this layer 2 segment, that broadcast needs to be replicated to all of these TEPs so that every virtual machine on that layer 2 segment can receive a copy of that broadcast. Now, as a side note, all of the commands that I'm using here can be found in the NSX-T command-line interface reference. If we click on Search and look for tables here real quick, and go down to the Logical Switch section, here are all of the commands that I was just using: get the logical switch ARP table, get the logical switch MAC table, and get the logical switch VTEP table. These are the commands that I used to display all the information that you just saw. So this is the command-line reference that I would recommend if you want to learn how to issue additional commands. But the commands I showed you will work just fine if you're just trying to take a look at your tables.
7. BUM Traffic Replication
So BUM traffic is just an acronym for broadcast, unknown unicast, and multicast traffic. And these three types of traffic have something in common: they are all multi-destination flows. So, for example, broadcast traffic, such as an ARP request, is sent to everything on a layer 2 segment. An unknown unicast is one in which we essentially do not know where the destination is, which switch port it's connected to, or maybe, in the case of NSX, which TEP a virtual machine resides behind. So an unknown unicast gets flooded to all devices on a layer 2 segment in an attempt to find that destination. And with multicast, we have selective destinations that traffic needs to be sent to. So how do we handle replicating that traffic to all of the tunnel endpoints that are participating in a certain VNI? And what if those tunnel endpoints are on different layer 3 networks? Let's start by looking at an ARP request, which is the most common example of broadcast traffic. So let's start by simply breaking down this diagram a little bit here.
You can see we have two virtual machines. These virtual machines are both connected to the layer 2 segment called App LS. It's VNI 5001. So we have these two VMs connected to the same segment. Each VM is on an ESXi host. So our transport nodes in this case are ESXi hosts. And each of those ESXi hosts has a tunnel endpoint. So we see a TEP on host 1, and a TEP on host 2, which is on a different subnet. And let's assume that VM 1 wants to ping VM 2, but VM 1 doesn't know the MAC address of VM 2. What happens? Well, an ARP request is issued, and like we've seen in previous lessons, an ARP request is a layer 2 broadcast. So VM 1 generates this ARP request, and the ARP request hits the TEP. Now, let's just assume that this is the first time that anything has tried to communicate with VM 2. So at this moment, VM 2 does not have an entry in the ARP table. So even if the TEP queries the ARP table of the NSX controller, there's not going to be anything there. So now this ARP request needs to get broadcast. There could be 50 virtual machines on 25 different ESXi hosts. And so what's going to happen here? Are we going to send individual copies of that broadcast to every single virtual machine? Not quite.
Basically, what happens is that the source TEP is going to take that ARP request and send a single copy of it to all of the other TEPs that are participating in this VNI. Remember the TEP table that we looked at in the last video? The TEP table had a listing of all the TEPs that were participating in a certain VNI. So this source TEP knows exactly which TEPs need to receive a copy of this broadcast. And it doesn't really matter that the two TEPs are on different networks or subnets, because it's just an IP unicast from the source TEP to the destination TEP. There are some complexities that we need to investigate further here, but the basic idea is that a layer 2 broadcast or a layer 2 unknown unicast is simply distributed to the TEPs that are part of that layer 2 segment. And then each TEP will flood it to all of the virtual machines on that layer 2 segment within that particular ESXi host.
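The basic mechanism above, turning one layer 2 broadcast into one unicast copy per remote TEP, can be sketched like this (the VNI number matches the diagram, but the addresses and structures are invented for illustration):

```python
# Illustrative sketch: a broadcast on a VNI becomes one unicast
# (Geneve-encapsulated) copy per remote TEP, driven by the TEP table.
tep_table = {5001: {"10.1.1.10", "10.1.1.11", "172.16.10.10"}}
source_tep = "10.1.1.10"

def replicate_broadcast(vni, frame):
    """Return the (tep, frame) unicast copies the source TEP must send."""
    copies = []
    for tep in sorted(tep_table[vni] - {source_tep}):
        copies.append((tep, frame))   # one unicast to each remote TEP
    return copies

print(replicate_broadcast(5001, "arp-request"))
```

Note that it makes no difference whether a destination TEP is in the same subnet or across a router; each copy is just an ordinary IP unicast.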
Okay, so there are different replication modes that are possible here, and these are very different from the replication modes that we had with NSX-V. So if you're comfortable with the NSX-V replication modes, bad news: they no longer apply. So here we have these two replication modes. We have two-tier hierarchical replication and head replication. And we're going to start with head replication and explain how that works. So basically, with head replication, you're going to forward BUM traffic to every TEP participating in a particular VNI. And if a host does not have a VM on that particular VNI, then that host isn't going to get a copy of that traffic. So basically, the entire burden is on the source TEP. So step one is that virtual machine 1 issues an ARP request. The ARP request hits the TEP, but there's not an entry in the ARP table. The source TEP is going to send a copy of that request to any TEPs that are on the same network as it is.
As you can see, I have two TEPs in the 10.1.1.0 network. It's going to send a unicast copy to those local TEPs. It's also going to send a unicast copy to each of the TEPs that are on any other network. So basically, the source TEP is doing all of the heavy lifting here. It's generating a unicast and sending that frame to every single TEP that's participating in this particular VNI. So I want to take a moment to look at the NSX design reference that we've referred to many times throughout this course. Here you see a diagram where they're breaking this same concept down. In this diagram, VM 1 over here is generating a broadcast, and you can see the end result in head replication mode, where a copy of that broadcast is being sent to this TEP and this TEP, and these are within the same rack. So this first hypervisor had to issue two different unicasts, which went to these two different TEPs. But there are also five other hosts in other racks that need to receive this broadcast.
So it's going to generate five additional unicasts that are going to flow through the physical network and hit all of these other TEPs that have virtual machines participating in that VNI. Notice that TEP 6 is the only one not getting a copy. That's because TEP 6 is the only one that doesn't have any running virtual machines connected to this VNI. Okay, so now let's take a look at the other method available, called hierarchical two-tier replication. And before we even get started here, let's assume that the two hosts on the left are in one physical rack in my data center, and the two hosts on the right are in a different physical rack in my data center. So the advantage here is that we want to keep as much traffic as possible local to each rack. We don't want traffic passing from rack to rack unnecessarily, because that involves more networking components. So again, VM 1 issues some sort of broadcast. Let's assume it's an ARP request and there is no entry in our ARP table. So the source TEP has a responsibility here.
It will send a copy of that broadcast to any TEPs in the same local network as it is. So for any TEPs in the same subnet, it will send a unicast to all of those local TEPs. But over here on the right, we see a couple of TEPs that are in a different network, and one of them is called an MTEP. The way that two-tier replication works is that the source TEP is going to send a copy of that frame to the MTEP in this remote segment, and that MTEP is going to be responsible for replicating locally within that subnet. So again, here we are with the NSX reference design, and I just want to show this in the diagram that they have here as well. Here's our source TEP, TEP 1. And just like we saw in head replication, the source TEP is going to generate a unicast to all of these TEPs that are in the same subnet that it is in. So the source TEP has generated a total of two unicasts so far. It's also going to generate two more unicasts: one bound for the MTEP in this rack, which will then replicate locally, and one bound for the MTEP in the third rack, which will replicate locally to other TEPs in that same subnet. And so if we scroll back here and look at head replication mode, you'll notice that the source TEP had to generate a total of seven unicasts to send it to all of the TEPs. And that meant that five unicasts actually had to traverse this spine network.
So we had quite a bit of traffic flowing through the spine network, whereas in the two-tier hierarchical mode, the source TEP generated four unicasts instead of seven, and two unicasts flowed through the spine instead of five. So the purpose of this two-tier hierarchical mode is to keep this BUM traffic as locally constrained to each of these racks as possible and to reduce the amount of traffic traversing the spine, or the amount of traffic flowing from one rack to another. And so this two-tier hierarchical mode is typically recommended as a best practice. It will usually outperform head replication mode. But if all of your transport nodes happen to be in the same rack or connected to the same physical switch, the benefits of two-tier hierarchical mode are largely eliminated.
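The seven-versus-four and five-versus-two numbers above fall straight out of the rack layout. Here's a small sketch that reproduces them, assuming the reference-design example layout (three TEPs in the source rack and three and two in the remote racks; the rack sizes are an assumption drawn from the counts in the lesson, not from the actual diagram):

```python
# Sketch comparing source-TEP workload in the two replication modes.
# Assumed layout: source rack has 3 TEPs (incl. source), remote racks have 3 and 2.
racks = {"rack1": 3, "rack2": 3, "rack3": 2}
source_rack = "rack1"

def head_replication():
    """Source TEP unicasts to every other TEP; returns (total, cross-spine)."""
    local = racks[source_rack] - 1                              # local peer TEPs
    remote = sum(n for r, n in racks.items() if r != source_rack)
    return local + remote, remote

def two_tier_replication():
    """Source TEP unicasts locally plus one MTEP per remote rack."""
    local = racks[source_rack] - 1
    remote_racks = sum(1 for r in racks if r != source_rack)    # one MTEP each
    return local + remote_racks, remote_racks

print(head_replication())      # (7, 5)
print(two_tier_replication())  # (4, 2)
```

The savings grow with the number of TEPs per remote rack, which is why two-tier mode matters most in large multi-rack deployments and matters little when everything sits behind one switch.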
8. Demo: Configure BUM Traffic Replication Mode
In this video, I'll demonstrate how to configure the replication mode for multi-destination traffic in NSX-T 3.0. And so here I am at the NSX-T user interface, and once again, I'm using the free labs available at hol.vmware.com. I'm going to click on the Networking area, and I'm going to go to Segments. Under Segments, you'll notice a number of segments that are already built into the lab environment and were generated automatically by my lab kit. And if I expand these segments, like, for example, if I look at LS App, you can see here that hierarchical two-tier replication has been enabled on this segment. When I look at LS DB, I notice hierarchical two-tier replication. If I look at LS Web, I should see the same configuration.
So my choice of hierarchical two-tier replication is going to determine how this BUM traffic (this broadcast, unknown unicast, and multicast traffic) is actually replicated to all the virtual machines on all of the different transport nodes. So just to demonstrate how to configure this, let's add a new segment. I'm just going to call it Rep-Demo. I'm going to connect it to my Tier-0 gateway even though I don't really need to do that. I'm going to pick my overlay transport zone. And I'll just create a little fictitious subnet here. And then, down here, I can choose my replication mode. I can choose either hierarchical two-tier replication or head replication. And that's really all there is to it. To choose the replication mode, you just pick it here on a per-segment basis. So I'm just going to cancel that. That's all I wanted to show you in this video: how to set the replication mode on a layer 2 segment.