Microsoft Azure Solutions Architect Expert AZ-305 Topic: Design for High Availability
December 14, 2022

1. Application Redundancy

So we’ve been talking in this section about business continuity. Now, part of business continuity is being able to recover from a disaster, of course, and we’ve talked about that. But another aspect of business continuity is not suffering a disaster at all, not having any downtime. Even when a particular component fails, or a particular server in a particular region fails, if your application carries on and end users don’t even notice that there was a problem, you’ve got a highly available application. It’s designed for resilience, and it’s part of a good business continuity strategy. Now, Microsoft has outlined a number of areas of redundancy for this exam, and one of them is called “application redundancy.” When you’re designing your application, this is the great thing about the cloud: you’ve got so many different choices when it comes to services, from compute services and networking services to load balancing services, database services, and storage services. There are multiple types of services for each one, and each of those services is going to have a different availability profile and different pricing. So if you are going to store data, do you store it in an Azure storage account as table storage, or do you store it in the Azure SQL Database service? These will have different availability SLAs as well as different availability profiles. One of the key aspects of availability is to avoid having any kind of single point of failure. So if you have an application that has multiple web servers behind a load balancer, and multiple application servers also behind a load balancer, but a single database that isn’t behind any of that, then if that database goes down, you have a single point of failure. And you could even say that if all components of your application are running in the same region, say the East US region, then when that region goes down, that’s a single point of failure.
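To make the single-point-of-failure argument concrete, here is a back-of-the-envelope sketch. The SLA figures below are illustrative, not official Azure numbers; the point is only the arithmetic of chaining tiers versus adding redundant instances.

```python
# Components in series: the app is up only when every tier is up,
# so the composite SLA is the product of the individual SLAs.
web, app_tier, db = 0.9995, 0.9995, 0.9999   # illustrative figures
composite = web * app_tier * db
print(f"chained SLA: {composite:.4%}")        # lower than any single tier

# Redundant instances behind a load balancer: the tier is down only
# if every instance is down at the same time (assuming independent failures).
def redundant(availability, n):
    return 1 - (1 - availability) ** n

print(f"one VM at 99.9%:  {redundant(0.999, 1):.4%}")
print(f"two VMs at 99.9%: {redundant(0.999, 2):.4%}")
```

Notice that chaining three tiers gives a composite availability below any individual tier, while doubling up a 99.9% tier pushes it to roughly six nines, which is why removing single points of failure matters so much.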
So the keys are geographically dispersing your application and ensuring that there is always more than one copy running of any server that you control; in other words, redundancy. Within Azure, there are numerous types of load balancing. When you’re doing any kind of redundancy, you’re talking about multiple servers sitting behind a load balancer, so you usually need some type of load balancer in front of them. A virtual machine scale set has load balancing built in, and Azure App Service also has load balancing built in. There are public load balancers and private load balancers. There’s Traffic Manager, and there’s the Azure Front Door service. All of these services are available at various levels to provide load balancing operations. Now, this course is more about strategy and choosing the right one; we’re not going to get into how to set it up and all the settings and stuff like that. One thing that load balancing gives you is the concept of automatic scaling. In front of a virtual machine scale set, you need a load balancer so that you can easily add additional virtual machines and have traffic automatically routed to them. So whether you’re using App Services or VMs for any type of compute operation, you’re going to want a load balancer that gives you scaling. Now, for some compute services, such as Functions, Service Fabric, Logic Apps, and so on, you don’t control the servers or the scaling. So not every service you choose has a scaling option. But make sure that you’re thinking about scaling; it’s almost a must if you have a high-availability application. Another aspect of availability is that there might be some malicious actors out there who are trying to take you down.
So you’re not just protecting against hardware failures, Internet failures, or power failures. In a denial-of-service attack, millions of computers coordinate to try to bring down a single endpoint. That’s an attack designed to reduce your availability, to lock legitimate customers out of your services. Microsoft includes basic DDoS protection for free, but there is also advanced DDoS protection. So if you are going to be a likely target of an attack, then you’re going to want to look into paying for some type of DDoS protection. It’s a little pricey, if I’m being honest, but if you are the target of an attack, that’s the price you might have to pay. Another thing to consider: if you are currently being subjected to a denial-of-service attack, moving your application behind DDoS protection is one mitigation strategy for an attack that is already underway. Making sure that your VMs are distributed among different pieces of hardware within a region and within data centers is also an aspect of availability. Microsoft has a service level agreement for virtual machines, and it depends on whether you’ve got standalone machines or are running an availability set, which distributes them among servers and hardware racks so that no single power failure or single piece of hardware will take your set of virtual machines down. You can also distribute among data centers using availability zones, which we’ll get to.

2. High Availability for Essential Components

So we’re continuing to talk about high availability, and in particular, we’re going to look at what high availability is, which resources within Azure require your active participation in order to achieve high availability, and which ones Microsoft Azure actually does the work for. So what is a highly available service? Well, it’s when your application continues to run exactly the way you intended, in a healthy state, with no significant downtime, even in the face of hardware failure or an attack. Now, to get high availability, we’ve said this a few times: it requires multiple servers, availability zones, running load balancers, making sure your data and files are replicated, and multiple locations to handle that kind of traffic, essentially. When you examine your application as a whole, you will notice that not every single component, no matter how large or small, requires your attention. For high availability, there is the concept that this part is essential, this is public-facing, this is what the customers are going to see; and this other part, if it was down for an hour, nobody would know, right? Now, this is a hypothetical example, we should always point that out, but that’s what it is. So let’s say you have an application running on the Web, and your end-user customers, who actually pay you money to use this application, log in to it. Well, you could argue, and it’s a fair argument, that it requires high availability. Those customers expect it to work 24 hours a day, seven days a week. And if they log in and sometimes it’s working and sometimes it’s not, that would be a major blow to your business. It would be financially disastrous, and the building would be on fire, figuratively, if your application didn’t work every couple of days. So you want those customer-facing apps to have high availability. Like I said, this is my opinion. This is hypothetical.
What you would do is implement a messaging synchronization process within your application, so that you can use a highly available queue service for application components to talk to each other. You would use highly available storage. So, we discussed SQL databases versus table storage; you would choose those high-availability solutions over the ones that you have to manage yourself. Now, on the flip side, if you had a batch job that ran once a week, correlated the data, and uploaded it into a data warehouse, well, does that need to be highly available? Or can that job, instead of running at one, run at two? Well, sure, it can. Internal employees would have to understand if there’s some problem with a server that needs to be rebooted, so you can argue that back-end jobs do not need to be highly available. Now, if it’s a payroll job, or it’s collecting payments from customers, maybe there’s some argument there. Or take the admin app: if you’ve got an application that only internal employees can access, and you said, “Listen, Friday at 12:00, I have to reboot the server. It’s going to be down from 12:00 to 12:30,” you send an email to everybody, and people can deal with that, right? So that’s not necessarily held to the same standard of availability.

3. Storage Types for High Availability

So we talked about compute, but what is highly available storage? What storage types should you choose, and what should you avoid if you’re aiming for high availability? First of all, we should say that Azure storage accounts, and both unmanaged and managed disks, are by default highly durable. Azure has designed this so that when you send it a file and it gives you confirmation that the file was received, there is a ridiculously high standard for durability. I think it’s like nine nines or eleven nines or something like that. One time I sat down and did a calculation that said if you were to write a file to Azure Storage once every second, it would take something like seven million years for Azure to have lost one of those files. So, when Azure receives a file, it makes three copies of it within the local region. If it’s globally redundant, there are six copies of the file around the world. Azure is not going to lose your file; it’s pretty durable. One loss in seven million years, maybe. As a result, it comes with either local or global redundancy; you’re going to have three copies or six copies. Those files are pretty safe. Another way to ensure that you have highly available storage is by using the failover option. If you are deploying your storage in a globally redundant manner, then you have your primary location and your secondary location. And if something were ever to happen, let’s say your primary location became unavailable, you could initiate a failover so that your secondary location becomes your primary location. That can be very handy for highly available storage. Now, the farther away your files need to be stored, the higher the chance of some kind of latency or some kind of data loss, right?
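Circling back to that durability claim for a moment, here is a toy sanity check. The eleven-nines figure (99.999999999% annual durability per object) is the number Microsoft publishes for locally redundant storage; treating object losses as independent events is my simplifying assumption.

```python
# Toy model: each stored object is independently lost in a given year
# with probability 1e-11 (i.e. eleven nines of annual durability).
annual_loss_probability = 1e-11
files_written_per_year = 60 * 60 * 24 * 365   # one file every second

# Expected number of files lost during the first year of writing:
expected_losses = files_written_per_year * annual_loss_probability
print(expected_losses)   # on the order of 3e-4 files, effectively none
```

Even writing a file every second, the expected loss in a year is a tiny fraction of one file, which is the spirit of the "millions of years" claim above.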
So if some user is writing data into a blob account that has globally redundant storage, and it’s storing that file halfway around the world, the farther the file needs to be stored, the longer the latency. And then if you do the failover in between, or if the region becomes unavailable or something like that, those milliseconds start to count a little bit more. Here’s an example of downtime: a region becomes unavailable, and your secondary region might be a few hundred milliseconds out of sync with the primary region. So you can initiate the failover yourself; this manual initiation is in preview mode. And suddenly you’re using the other region as your primary. There is that tiny risk of losing a file when you do a failover. The other thing that is recommended for high availability is that you do backups, right? So for your servers and everything that’s essential, you have backup copies that can be restored. If you need to move files around, another tool that fits a high-availability strategy is the AzCopy command line utility. It can do things behind the scenes. So if you need to move files from one region to another, you can initiate AzCopy, and it will copy those files. This could be part of a high availability strategy. So if you’ve got your log files in one blob container and you want to make sure there’s a backup of them somewhere else, you certainly can do that.
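As a sketch of what that looks like, the snippet below just assembles an AzCopy invocation for copying a blob container to another account. The account and container names are made up, and authentication (a SAS token on the URLs or `azcopy login`) is deliberately omitted.

```python
import subprocess

# Hypothetical source and destination containers -- replace with your own.
src = "https://primaryaccount.blob.core.windows.net/logs"
dst = "https://backupaccount.blob.core.windows.net/logs-backup"

# `azcopy copy <source> <destination> --recursive` copies the whole container.
cmd = ["azcopy", "copy", src, dst, "--recursive"]
print(" ".join(cmd))

# Uncomment to actually run it (requires azcopy on PATH and valid auth):
# subprocess.run(cmd, check=True)
```

You could schedule something like this to keep a second region's copy of log containers fresh, though for ongoing replication, GRS or object replication is usually less work than hand-rolled copies.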

4. *NEW* Essential High Availability Concepts for Exam

So at this point, I feel like I should reiterate how important it is to understand availability zones, how they improve the resilience of your application, and how they actually work. We’ve covered this concept before, but availability zones are essentially data centers. They are physically distinct locations within the same Azure region. This diagram from Microsoft’s website shows that you have three separate buildings that are physically separated. It is possible that an availability zone consists of more than one building, but there are three availability zones in a region. The reason for this is that when one availability zone is affected, whether it’s a power outage or an Internet outage, Microsoft has designed the system so that the other data centers in the same region shouldn’t be affected.

So they’ve tried to make it so that outages are localized to that particular geographical spot, that data center, and other data centres in the same region should not be affected. And so by deploying your application across all three availability zones in a single region, you are actually protecting yourself from what are called “localised failures.” So here’s a question that might come up on this exam, and it’s certainly something you absolutely need to understand before you attempt the exam: how do you deploy your application across all three zones so that if one or two of those availability zones were to fail, your application isn’t affected? Think about that in terms of the complex set of services that go into making up your application. You might have a virtual network, a firewall, a database, some virtual machines, some sort of middle tier, et cetera. You’ve got lots of things that go into your application, and you really can’t avoid thinking about how it would be affected if a single zone were to go down, or, in the worst-case scenario, two out of three zones were to go down. Now, not every Azure service has specific availability zone support.
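A toy model makes the benefit of spanning zones concrete. The assumption here, that each zone fails independently with the same probability, is a simplification, but it shows the shape of the improvement:

```python
# Assumption: each availability zone is down with probability p during
# some window, and zone failures are independent of each other.
p = 0.01   # illustrative 1% chance that a given zone is down

one_zone_app = p          # an app in a single zone is down whenever that zone is
three_zone_app = p ** 3   # a zone-redundant app is down only if all three fail

print(one_zone_app, three_zone_app)   # the all-three-down case is vastly rarer
```

With these illustrative numbers, going from one zone to three turns a 1-in-100 outage chance into roughly 1-in-a-million, which is why localized failures stop mattering much once you span all three zones.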

Now, in general, Microsoft has broken their services into three main categories, which we’ll cover in a second. The other thing to think about is that not every Azure region supports the concept of availability zones. So there are quite a few regions we can see here: the Canada region, Brazil, and some of the US regions. But out of the 60-plus regions, we only have about 20 here that support availability zones. So, if this level of redundancy is important to you, you’ll need to consider where you deploy your solutions. Now, I said there are three main types of services. First, among the thousand or so Azure services available, there are zonal services. These are services like virtual machines, where you can specify a specific availability zone to deploy to. And if you need to deploy into two zones or into three zones, you’re responsible for doing that; you have to deploy into zone one and into zone two individually. Second, zone-redundant services are Azure-managed services, such as your storage account, where you can select ZRS storage, which is zone-redundant, and Azure will keep copies of your data in other zones. It’s basically automatically replicated; if a single zone were to go down, they would handle that.

Finally, we have what you might call global services; these are services like Azure Active Directory that are not specifically tied to a region. Even if you have to choose a region during deployment, which we’ll talk about in a second, that could just be where the profile is stored; the services themselves are managed globally. So going into this exam, we should know whether services are global (always available), zonal, or zone-redundant. Here’s a list of the global, always-available services, and I’ve highlighted a few that you might commonly use in your application. As part of designing your application, you may use the DNS service, the Front Door service, the CDN, Traffic Manager, and other similar services. And knowing that Microsoft takes care of zone-related issues for these, you can deploy, for instance, to Front Door; it doesn’t make you specify which zone to deploy to, and it takes care of that stuff for you. So I’ll highlight Traffic Manager and a couple of these other tools that you can add to your solution so that you don’t have to worry about zone-specific issues. But while DNS is a global service, if you want to deploy a public static IP address, that is a zonal service.

And if you go with the standard SKU, you can actually choose which zones this IP address gets deployed to. So you can effectively have zone redundancy as part of your IP because, as you know, the IP address is a resource that gets deployed as part of your resource group. And so, when you’re deploying your public IP address, you want to specifically choose which zones it goes to if you’re concerned about zone redundancy. Another example is the application gateway. If you look at version two of the application gateway SKU, you can deploy those across zones. What this means is that you have a single instance of the application gateway, and it can manage services in specific zones. Again, this is something that has to be done during deployment; it’s not something that’s automatically given to you. And so again, if you’re concerned with zones going down and need your application to stay running when a zone has problems, then you’re going to choose an application gateway that’s specifically deployed across all three zones. Now, here’s an example diagram of a potential solution deployment. We can see here that we have a static virtual IP address that could be cross-zone, as we just saw.

There’s the application gateway on the left. That gateway is driving traffic to a couple of availability zones. There are AKS solutions in one zone and VM scale set solutions in another zone. There are some on-premises solutions, and there are some Azure App Service solutions. And it is working across all zones in its own unique way. Key Vault is another service that is zone-redundant, where Azure manages the zones for you. It stores your keys in a redundant way, and if one of the zones were to fail, Azure would take care of picking up your data from another zone. So Key Vault is zone-redundant. Now, one thing to point out is that virtual machines are zonal. When you’re deploying a single virtual machine, you’re potentially deciding which availability zone to deploy it to, and if you need it to be cross-zone, you need to deploy one VM into each zone. But even though the virtual machines themselves are zonal, if you combine them into a virtual machine scale set, you can actually define the scale set as being zone-redundant. So you’re deploying your virtual machines through a scale set across all three zones, and the scale set manages scaling up and down across zones.

And that’s something you define when you’re deploying the virtual machine scale set. If you use Azure App Services, there is an App Service Premium plan, Premium V2 or V3, that you can deploy across zones. So even in the case of App Services, you do have to take action to protect your apps from zone failures. And here’s an example: right now it’s an ARM-only option; you choose zone-redundant as a property, and it’s not available in the portal. Because it’s cross-zone, you do have to have three instances of your app running, of course. And so there is a minimum capacity and, obviously, a minimum charge for doing that. So we go back to that question I posed a couple of minutes ago: you really do have to think about it if you’re going to design an application that needs to be deployed across all three zones, so that a failure in one or two zones doesn’t affect the application. You have to start thinking about those solutions: the public IP address cross-zone, the application gateway cross-zone, individual VMs deployed into zones, App Service Premium V2 or V3, et cetera. So, in the next video, we’ll deploy a solution like this cross-zone, and we can see it in action.
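Since the portal doesn’t expose that App Service setting, here is roughly what the ARM-only option looks like. The plan name and API version are illustrative; the pieces that matter are the Premium V3 SKU, a capacity of at least three, and the `zoneRedundant` property:

```json
{
  "type": "Microsoft.Web/serverfarms",
  "apiVersion": "2021-02-01",
  "name": "my-premium-plan",
  "location": "eastus",
  "sku": { "name": "P1v3", "tier": "PremiumV3", "capacity": 3 },
  "properties": { "zoneRedundant": true }
}
```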

5. *NEW* DEMO: Deploying an HA Zone-Redundant Solution

So we’re going to start this demo by creating a resource group. This resource group is going to contain all of our zone-redundant resources. Now, even though the resource group can be in any region and can contain resources from other regions, I’m going to intentionally create the resource group in the same region as my other resources. So I’m in East US, which, according to Microsoft documentation, supports availability zones. Again, you don’t have to create it there, but I’m going to for this demo. So we have a resource group. Now, the first thing that we’re going to create here is a scale set. So I’m going to select a virtual machine scale set from the list of resources. Now, we know that a scale set can be deployed across zones. We’re going to put this in our 305 group. We’re going to give this a name; I’m going to call it “my new VM.” It is located in the East US region, which has availability zone support. And we’ll see here that we do have the ability to deploy this either by basically allowing Azure to manage it or by specifying that we want this scale set across all three zones. So this is the zone-redundant method of deployment. Now, we’re also given an option for the orchestration, which is how the scale set decides to do scaling, and we’ll just leave it at uniform scaling.

And when it comes to security, we’ll simply leave the standard security settings alone. I’ll use some Windows Server 2019 servers, and this is a pretty small server: one CPU, 3.5 gigabytes. We’re going to have to create our Windows credentials. All right, I’m not going to choose Windows Hybrid Benefit. We’re not going to add any additional disks. We can use regular SSDs and regular encryption. It’s going to create a brand new virtual network for these virtual machines to be part of, which is fine too. We’re not going to make this part of an existing load balancer. Now, this initial instance count is interesting, because we’re deploying our scale set across three zones, and of course, we’re going to want at least three VMs, one in each zone. We could, of course, choose fewer; that is entirely up to us. But we’ll choose three. Then there’s the scaling method: whether we scale in and out manually, have our own automation, or rely on their automation. So we can have anywhere from three to six instances with scaling-in and scaling-out rules. Do we want diagnostic logs? Now, you can see this is the policy for how scaling is balanced; one of the defaults is to balance across these zones and fault domains. And then when you’re scaling in, it’s going to delete the VM with the highest instance ID, or use the newest-virtual-machine or oldest-virtual-machine method, which picks based on the age of the VM. We’ll just leave it at that. We’re not going to worry about security or Windows updates.

I’m going to turn off boot diagnostics, and we’re not going to worry about identity or guest OS updates. Then we have this concept of health monitoring, which is a relatively new feature for scale sets. It’s simply like a load balancer health probe pointing to, in this case, a web address (an HTTP address on port 80). And if it detects that one of the machines isn’t acting properly, do we allow automatic repairs, where it just kills off that VM and reinstantiates it? So this is relatively new. I’m not going to turn this on for now; it’s not the point of this video. Next, there are situations, such as hyperscaling, that require a strict even balance across zones. Again, we did set the default scaling to balance across zones; this option will actually force the scaling to be balanced. As it says in the tooltip, scaling will fail if it’s not possible to stay balanced. So let’s not select this; we’re not going to risk scaling failing when it can’t create a VM in the zone that we want. Then the concept of spreading has to do with your virtual machines being spread across as many fault domains as possible. That is the concept of maximum spreading.

With static spreading, it limits it to exactly five fault domains, and if it can’t find five fault domains, then scaling fails. So, in the case of availability zones, Microsoft recommends that you set maximum spreading, and you really do need compelling reasons to select the fixed spreading option instead. All right, we’re going to skip over the tags, and when we create this, we’re going to end up with a virtual machine scale set that operates across zones. We’re going to have at least three instances, and we should expect one instance per zone, because that’s our balanced scaling option. We’re not forcing it, because in that case we might end up with a zone that doesn’t contain an instance, based on what was created, what was deleted, what was manually shut down, and so on. We won’t touch that. So I’m going to say “create.” This virtual machine scale set should be balanced across availability zones. And now what we can do is create an application gateway, a zone-redundant application gateway, for this. So I go into the resource group, say “create,” and look for an application gateway. Remember that the SKU of the gateway has to be the V2 SKU. So let’s call this “test gateway.” Now I’m going to put this in the same region as the other resources, on Standard V2.

We can enable scaling, but the important thing here is to enable the zone redundancy. What will happen here, with a minimum instance count of zero, is that Azure is going to take care of making sure there are enough instances. And we’ll put this on our existing virtual network. The application gateway does need its own subnet, so we’re going to have to create that. Now, I wasn’t paying attention when I created the virtual machine scale set: its subnet consumes the entire /16 address space, while the entire virtual network only has a /16. As a result, I’ll have to add more address space. So I’m going to save this, go back to the VNet, and add a subnet. I’m going to call this one “AzureAppGateway,” and it doesn’t typically need to be huge, but let’s just stick with a /24. We’re going to leave that alone, and then we’ll go back to the gateway and let it choose its own.

Now, it’s not a network gateway; it doesn’t need a specific name, but the public IP address does need to exist on its own. And remember, the IP address has this zone-redundant option, so you’re creating an application gateway IP that’s redundant; it’s chosen by default. For the back-end pool, we could add it without targets, or say our virtual machine scale set is the target. We need a routing rule; let’s just put all traffic on port 80, and the back-end target is our back-end pool. For the HTTP settings: no affinity, no draining, and no path-based rules, because I just created the default HTTP settings. We could use path-based rules instead of the default. No tags. Now, it’s going to take seven to ten minutes to create this. However, when it’s built, we’ll have a publicly accessible IP address pointing to our virtual machine scale set, and it should keep working if zone one, zone two, or both fail while zone three remains operational.

We’ve done everything from the public IP to the application gateway to the virtual machine scale set for that to be true. So the deployment succeeded. We can quickly check how long it took to get this gateway deployed: approximately four minutes. And what we’re expecting, then, is that we have a fully zone-redundant deployment: a zone-redundant IP, a zone-redundant gateway, and a zone-redundant VM scale set. And in this way, anyone who comes to this application through this IP address should be able to continue to do so, whether zone one or zone two is down, as long as zone three remains, et cetera. So that is a very important distinction, increasing the availability of your applications. Now, one thing that we haven’t done is install a web server on these virtual machines, so we can’t go to the IP and verify that it works. That would be sort of a final step. Again, that’s not the point of this video, so I’m not going to go ahead and turn this into a web server and things like that, but you can certainly do that if you want to demonstrate that this is working the way we intend. I’m going to delete this resource group now. At the end of the video, I hope it’s clear how availability zones work and all the different resources that you need to make zone-redundant in order to support that type of deployment.

6. *NEW* High Availability Non-Relational Storage

Alright, the next requirement of the exam is to understand how availability works with non-relational databases. Now, when we’re talking about non-relational databases within Azure, we are generally talking about things such as Cosmos DB, Redis Cache, and the Azure storage services, which include table storage, blob storage, and even Azure Files. And so in this video, we’re going to talk about how high-availability applications are designed using these non-relational data stores. First up, we’ll talk about Cosmos DB. Now, Cosmos DB is Azure’s premier database solution for non-relational data, sometimes called “NoSQL” data. And it does have some high-availability features built in. So even without you having to configure anything specific, you’re going to be given some of these high-availability features. One of these features is that Azure Cosmos DB keeps copies of your data. In fact, it keeps four copies of your data within the region. So this is similar to an Azure storage account, which we’ll talk about in a second. Cosmos DB will have your primary piece of data and will copy it to three other nodes within that data center. And so even if you’re running Cosmos in a single data center, you’ve got four copies of everything you write to it saved somewhere.

This happens because, if a failure occurs, such as one of the nodes experiencing a hardware or power failure, Microsoft can seamlessly pick up the data from another node. You don’t have to do anything, and you won’t necessarily be notified that one of the nodes failed. This is how they keep their services running. So this is the baseline: you’ve not enabled availability zones specifically, and you’re running Cosmos DB out of a single region. Azure provides you with 99.99% availability when you’re running Cosmos DB in a single region, for both reads and writes. The downside to this is that you’re really dependent on that data center being up. If the data center floods, there is a power outage, or there is a loss of internet to the entire building, your application may be down, because it is now reliant on that single data center. Now, Cosmos DB makes it really easy to replicate your databases across multiple regions. There’s a cool interface where you can just pick another region, and it will do all the replication for you. This still provides 99.99% availability for writes, but it raises reads to 99.999% availability. And then, when you’re talking about a multi-write situation, a multi-master situation, you’ve got five nines both for the reads and the writes. So just by adding another region to your solution, you’ve increased your availability. This also protects you from that single data center going down, because there’s a backup region, and automatic failover will kick in.

Now let’s talk about availability zone support. So, as we saw with application gateways and virtual machine scale sets, you can deploy Cosmos DB in availability zone mode. When you do this, even running in a single region, you get a bump from 99.99% to 99.995%. So adding availability zones in a single region gets you four and a half nines of availability. Now you’re protected against a data centre outage, because if you’ve got a multi-zone deployment and a single data centre goes down, you still have your other two data centers. So you don’t lose data, and you don’t lose availability. Of course, you’re still subject to a region-level outage, because you’re only deployed to a single region. But if we bump that up to multi-region, with availability zones enabled and a single write region, then you’re still getting 99.995% availability for writes, but you’re actually getting five nines for reads. And in the multiple-write situation, you’re getting five nines for both.
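The tiers just described can be summarized in a few lines. The percentages below follow Microsoft’s published Cosmos DB SLAs as presented in this lecture; verify against the current SLA page before relying on them.

```python
# Cosmos DB availability tiers and the write downtime each one permits
# per year. Figures follow the SLAs described above; check the current
# Microsoft SLA page before relying on them.
slas = {
    "single region":                      {"reads": 0.9999,  "writes": 0.9999},
    "single region + availability zones": {"reads": 0.99995, "writes": 0.99995},
    "multi-region, single write region":  {"reads": 0.99999, "writes": 0.9999},
    "multi-region, multi-write":          {"reads": 0.99999, "writes": 0.99999},
}

MINUTES_PER_YEAR = 365 * 24 * 60

for tier, sla in slas.items():
    downtime = (1 - sla["writes"]) * MINUTES_PER_YEAR
    print(f"{tier}: up to {downtime:.1f} minutes/year of write downtime")
```

Translating the percentages into minutes per year makes the differences tangible: 99.99% still allows roughly 53 minutes of write downtime a year, while five nines allows only about five.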

Now, one thing I’ve noticed across all of these examples, the last three pages, is that you’re not really gaining much, because Cosmos DB already has a pretty high level of availability in a single region: 99.99%. So when you’re getting into the 99.995% or five-nines situations, you’re really just talking about fractional improvements in your availability. But when you’re running a highly available application and you really do need to protect against these situations of data centers or regions going down, that can make a lot of difference, obviously. So Microsoft recommends availability zone support enabled with multi-region, multiple writes. If your main requirement is high availability for Cosmos DB, then you have to turn on all three of these options in order to get the highest level of availability. Now, there’s a cost to it, of course. Just turning on availability zone support within a single region increases the cost by 25%. At 1.25 times the cost of Cosmos DB in that single region, it’s a small price to pay, and you do get the protection against the zone going down. Of course, if you’re running it in two, three, or more regions, then you are increasing your costs further by doubling, tripling, or more.
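That cost reasoning can be sketched as a rough multiplier. This is illustrative only, assuming the ~25% availability zone uplift mentioned above and that each added region bills its own throughput; use the Azure pricing calculator for real numbers.

```python
# Rough Cosmos DB cost multipliers relative to a single-region deployment
# without availability zones. Illustrative; not official pricing.
AZ_UPLIFT = 1.25  # ~25% uplift for availability zone support, per the discussion

def cosmos_cost_multiplier(regions: int, availability_zones: bool) -> float:
    multiplier = float(regions)      # each region bills its own throughput
    if availability_zones:
        multiplier *= AZ_UPLIFT
    return multiplier

print(cosmos_cost_multiplier(1, True))   # 1.25 -> the "small price" for zone protection
print(cosmos_cost_multiplier(3, True))   # 3.75 -> three regions with zones enabled
```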

As a result, each region incrementally increases your cost. Now, one of the other recommendations: if you are running Cosmos DB across multiple regions with a single write region, you should enable automatic failover. You do have the option of controlling the failover manually, but you should enable automatic failover so that Azure detects when the first region becomes unavailable and automatically moves everything over, with the second region running as the primary. Now we’ll switch over from talking about Cosmos DB to talking about Azure Cache for Redis. The Redis cache has also been designed to be highly available, so there is some availability built in. Azure Cache for Redis runs on the concept of “nodes,” and nodes are VMs. So you’ve got dual-node options, which means two VMs, and we’ll see that in a second. And you’ve got sort of three levels of availability. There’s standard mode: 99.9% availability, running on two VMs replicated in a single data center, with automatic failover if one of them were to fail. This basically protects you from node-level failure: a particular rack, a particular server, a particular power supply. Zone redundancy increases your availability to 99.99%, and it uses multiple nodes across availability zones with automatic failover.
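The three levels of Redis availability can be laid out as a lookup. The names and figures paraphrase this discussion (the geo-replicated row comes from the geo-replication material that follows), so confirm them against the published SLA.

```python
# Azure Cache for Redis availability levels, as described in this section.
# Figures paraphrase the discussion; verify against the published SLA.
REDIS_AVAILABILITY = {
    "standard":       {"sla": 99.9,   "failover": "automatic",
                       "scope": "two nodes, one data center"},
    "zone redundant": {"sla": 99.99,  "failover": "automatic",
                       "scope": "nodes across availability zones"},
    "geo-replicated": {"sla": 99.999, "failover": "manual",
                       "scope": "caches in multiple regions"},
}

def minimal_level(survive_zone: bool, survive_region: bool) -> str:
    """Pick the lowest level that covers the failure domain you care about."""
    if survive_region:
        return "geo-replicated"
    if survive_zone:
        return "zone redundant"
    return "standard"

print(minimal_level(survive_zone=True, survive_region=False))  # zone redundant
```

Note the failover column: only the single-region options fail over automatically, which matters for exam questions about who initiates recovery.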

Geo-replication is where you get five-nines support. And now you’re talking about your Redis cache running in multiple regions, and you must now control the failover yourself. There are far too many ramifications of a Redis cache failing over between regions, so they want you to be able to control when that happens. Here’s a diagram of the standard configuration: two nodes running in the same data center. We can see there’s the primary node and the replica node, and it’s automatically replicated. When we do need to fail over, the load balancer basically takes care of sending traffic to the replica instead of the primary. So the primary does all the work until failure. When it comes to Redis zone replication, you must be on the Premium or Enterprise tiers, as it does not run on the Standard tier. Here’s an example of zone-level replication: you can see there are three availability zones and four nodes.

So in Availability Zone 1, there’s the primary node that does all the work until it fails, and you also have a backup replica in Zone 1 that protects you against node-level failure. Then Zones 2 and 3 exist so that you can handle zone-level failures. Now, the highest level here is geo-replication, which requires the Premium or Enterprise tiers. Geo-replication is primarily for disaster recovery. So, like we said, right now it has manual failover. And the reason, like I said a second ago, is that there are just too many implications: a Redis cache is supposed to be a super-fast in-memory cache, and after a failover it may run in a different region from your application. So, if your application is running in East US and your Redis cache is running in West US, you are significantly slowing things down. They want you to have control over that. When you get into what’s called Enterprise geo-replication, there are a lot more features. This is currently in preview mode, so it isn’t really covered by the exam, but keep an eye on it: Enterprise-level geo-replication will be the future of maximum availability across regions, but it is not yet ready for production use.

And lastly, we’ll talk about Azure Storage. Azure Storage also has built-in high availability features. When you’re creating a storage account, you choose the redundancy, and two of the options are locally redundant storage and zone-redundant storage. They both keep three copies of your data. Locally redundant storage keeps those three copies in the same physical location, in the same data center. Zone-redundant storage copies your data into three availability zones in that same region. So here’s a diagram of zone-redundant storage: you can see one copy of your data in each of the three data centers. Now, like the other options, zone-redundant storage does not protect you from region-level outages. If you want that, you need to get geo-redundant storage, which is GRS. And the interesting thing about geo-redundant storage is that GRS is essentially LRS running in two locations: locally redundant storage in one region and locally redundant storage in another. So you don’t get the benefits of zone-redundant storage, but you do get the benefit of being across two regions. With geo-zone-redundant storage (GZRS), you get zone-redundant storage in the primary region and locally redundant storage in the secondary region.
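The four redundancy options just described can be captured as data, with a helper asking whether your bytes survive the loss of a given failure domain. Copy counts follow the discussion above; this is a sketch, not an official matrix.

```python
# Azure Storage redundancy options as described above, as a lookup table.
REDUNDANCY = {
    "LRS":  {"copies": 3, "zones_used": 1, "regions": 1},
    "ZRS":  {"copies": 3, "zones_used": 3, "regions": 1},
    "GRS":  {"copies": 6, "zones_used": 1, "regions": 2},  # LRS in each of two regions
    "GZRS": {"copies": 6, "zones_used": 3, "regions": 2},  # ZRS primary, LRS secondary
}

def data_survives(option: str, failure: str) -> bool:
    """Does the data still exist somewhere after losing a given failure domain?"""
    r = REDUNDANCY[option]
    if failure == "node":
        return r["copies"] >= 2
    if failure == "zone":
        return r["zones_used"] > 1 or r["regions"] > 1
    if failure == "region":
        return r["regions"] > 1
    raise ValueError(f"unknown failure domain: {failure}")

print(data_survives("ZRS", "region"))  # False -> ZRS alone won't cover a regional outage
```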

So it’s not zone-redundant storage in both locations. Now, with GRS and GZRS, you don’t have read access, or any access at all, to that secondary region. You have to trust that your data is replicated to the secondary region, and you don’t actually get access to it until a failure happens. Once you have to do a failover from the primary to the secondary region, then you get access to that data. Here’s a diagram showing GZRS, where you’ve got zone-redundant storage in the primary and locally redundant storage in the secondary. Now, if you do need read access to that secondary location, that’s read-access geo-redundant storage: RA-GRS and RA-GZRS. You then get an endpoint that is read-only. It is important to keep in mind that these updates are not instantaneous, right? So we have to start involving this thing called the last sync time, so you’ll know how old the data is when you’re dealing with the secondary region; it could be a few seconds or a few minutes behind. There’s no SLA for how quickly GRS syncs. According to Microsoft’s documentation, it is eventually consistent, typically within 15 minutes. And we know this kind of consistency from Cosmos DB as well. The read availability goes from 99.9% to 99.99% with this read-access option. Interestingly, the locally redundant, zone-redundant, and geo-redundant options all carry the same SLA; even though we know the actual availability will be higher, the SLA doesn’t get higher, and only on RA-GRS and RA-GZRS do we get the additional fourth nine. With storage, it is also important to differentiate between the different services. Azure Files, for example, does not support this read-access mode, right?
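The last sync time reasoning above can be sketched in a few lines. In practice the value comes from the storage account’s replication statistics; here it is a hypothetical example value, and the helper just expresses what the timestamp means.

```python
from datetime import datetime, timedelta, timezone

# Sketch of reasoning about secondary-region staleness from last sync time.
# The last_sync value below is a hypothetical example, not a real API result.
def secondary_staleness(last_sync_time: datetime, now: datetime) -> timedelta:
    """Any write made after last_sync_time may be missing from the secondary."""
    return now - last_sync_time

now = datetime(2022, 12, 14, 12, 30, tzinfo=timezone.utc)
last_sync = now - timedelta(minutes=12)
print(secondary_staleness(last_sync, now))  # 0:12:00, within the typical ~15 min lag
```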

So, if you store data in a file share, you can have geo-redundant storage, but you won’t be able to read from that secondary location. And also, even to qualify for geo-redundant storage, the file share has to be smaller than five terabytes. So if you have a really big file share, that can’t be geo-replicated by Azure. Managed disks, which are starting to creep outside the scope of this topic, are still only locally or zone redundant. Here’s a diagram showing all of the different storage services. General Purpose v2 supports all of the redundancy options. General Purpose v1 does not support zones, so not that many people are using it; but if you do find yourself with an exam question talking about General Purpose v1, zones are not a feature it has. Also, premium block blobs are local and zone only, just as we saw file shares being local and zone only, except for those below a certain size. Right? As you can see from the diagram, the more replication you need, the more you are pushed toward General Purpose v2. And that’s a summary of the non-relational options for Azure availability.

7. *NEW* High Availability Relational SQL Database

So in this video, we’re going to talk about high availability when it comes to relational databases within the context of this exam. Relational databases include the SQL Server engine databases, which are Azure SQL Database, SQL Managed Instance, and SQL Server in a VM. The other types of relational databases, like MySQL and PostgreSQL, are not covered by the exam, even though those are managed services, and something like Azure Synapse Analytics is also not on the exam. So let’s talk about SQL Database. That’s the primary relational database that Microsoft obviously recommends. Now, of course, SQL Database is designed to be highly available by default. If you just accept the standard defaults when you’re deploying a SQL database, you’re going to get a 99.99% minimum uptime guarantee. They don’t want your SQL database to be down. Here’s a diagram showing a SQL database running in a single region. You can see there’s a cluster of gateways at the top that acts as a load balancer. Behind that is the compute, which is the primary replica. It has the tempdb on local storage, and the data files and the log files are actually stored in a premium storage account with LRS, which is locally redundant storage.

And so when you’re running your SQL database in this normal configuration, you’re actually interacting with a type of virtual machine behind the scenes that runs the SQL instance, with the data files stored in a storage account. You can see here that there are some spare replica nodes, if you will. If there is a failover that needs to happen, Azure takes care of it for you: they basically fail over to one of those spare nodes, and you don’t even necessarily notice. The data files stay on the storage account, so you’re not going to lose any data, because there’s just one data source.

Now this is going to have the same challenges that a storage account running in LRS has, which is that it’s susceptible to zone-level outages. So if a single availability zone were to go down and it happened to be where your SQL database is stored, then you’re going to lose access to that data for that period. So even though the compute has redundancy, the storage does not: you’ve got three copies of your data within the single data center, but the data isn’t stored outside that data center. The backup files, as you can see, are stored with varying degrees of redundancy. So this is what you’re running when you run a SQL database in the Basic, Standard, or General Purpose tier. Remember that when you deploy a SQL database, you have the option of using the old Basic/Standard/Premium model or the new vCore model, in which you select the CPUs and memory separately.

Now it’s important to remember that, just like with Cosmos DB, you can do geo-replication with SQL databases. SQL Database isn’t designed to be multi-write, where you can write anywhere in the world, but it does support global read access and allows you to have secondary databases elsewhere. It’s also important to understand that geo-replication does not increase your SLA. I’m going to pull in the SLA for SQL Database, and we can see that when you’re running SQL Database in the Business Critical tier with geo-replication, what you’re getting is not increased availability.

You’re getting a recovery point objective of 5 seconds: when you have geo-replication, you will lose a maximum of 5 seconds of data. And the recovery time objective is 30 seconds, so it takes around 30 seconds to be back operational. So you’re not actually gaining availability in percentage terms, but you are gaining recovery objectives, which you should keep in mind as well. Now, zone-redundant options for general-purpose SQL databases are in preview mode, so they won’t be on the exam. And you can see here that we’ve got a similar configuration: our cluster of gateways, the compute layer with tempdb, and the data and log files. But now the data and log files are running on zone-redundant storage, and your compute fails over across three zones.
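The geo-replication RPO/RTO figures above translate directly into worst-case impact for a workload. A back-of-envelope sketch, where the write rate is a hypothetical example:

```python
# What the RPO/RTO figures for SQL Database geo-replication imply, roughly.
RPO_SECONDS = 5    # at most the last 5 seconds of writes can be lost on failover
RTO_SECONDS = 30   # at most 30 seconds until the secondary is serving traffic

def failover_impact(writes_per_second: float) -> tuple[float, int]:
    """(max writes lost, max seconds of outage) for a given write rate."""
    return writes_per_second * RPO_SECONDS, RTO_SECONDS

lost, outage = failover_impact(200.0)  # hypothetical workload: 200 writes/s
print(f"Up to {lost:.0f} writes lost; up to {outage} s before recovery")
```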

And so now you’ve got a zone-redundant configuration running in a single region, using zone-redundant storage and actually deploying your nodes into the separate zones. Again, this is in preview mode. Now, the next tier up from General Purpose is the Premium tier in the old model, or the Business Critical tier in the vCore model. And this is where you start to see higher availability options. So here’s a Premium or Business Critical configuration running in a single region. We can see that the gateway layer is the same, but the way your data is stored is completely different: the data and the log files are actually stored on the local disks of these VMs.

And so now you’re not using a storage account for your data; you’re using local disks. As a result, there will undoubtedly be some performance gains. But now you have the additional challenge of keeping that data replicated across multiple replicas. This is given to you using the concept of an “Always On” availability group; we’ll talk more about that when we talk about virtual machines. But basically, the failover is going to be set up similarly to the way you would do it for SQL Server in a VM. The way the data is replicated between the primary and the replicas changes, but the backups remain unchanged in the Business Critical or Premium options. There is also a zone-redundant option for Premium and Business Critical. And as you would expect, now that we have introduced availability zones, the replicas are spread across the three zones, with two replicas in the primary zone, so that node failure is covered within the same zone and zone failure is covered by another zone. At this Premium or Business Critical tier, zone redundancy is set up very logically.

If we bring in the SLA, we can see that doing it this way, Business Critical or Premium with zone redundancy, increases your availability to four and a half nines. Now, finally, there’s this concept of Hyperscale. Hyperscale is, in my view, a completely separate configuration. What you end up with is your compute running against a memory cache instead of reading the data files directly. So there is a cache element here, managed by dedicated caching servers, and the data is stored in files in a storage account. You basically have caching servers that serve the data from memory, sitting between the SQL compute and the files in the storage account. It’s a completely different setup, and it’s called Hyperscale. Once again, this is for when you start needing really quick reads: stuff is served from memory and not from disk, and that’s why the architecture has to be different. Now, SQL Managed Instance and SQL Database share many concepts. SQL Managed Instance has two primary benefits.

One is the increased compatibility with SQL Server running on premises; this is the closest compatibility between those two environments. And the second is that, of course, Azure is managing it; you don’t have to be a SQL Server operations expert. To run SQL Server in a VM, you must be skilled in operations: patching, operating system updates, and backups. So you get some of the benefits of SQL Database while keeping the compatibility of SQL Server running on premises. However, some of the availability zone configurations are missing: no zone-redundant configurations exist for SQL Managed Instance. So if you see a test question about needing to survive zone failure, where if two zones were to fail but one zone was still running your database needs to still be there, SQL Managed Instance is not the solution. Another thing it doesn’t support is the Hyperscale concept, where reads come from a memory cache rather than disk; that’s not going to be a SQL Managed Instance solution either. Finally, as previously stated, SQL Server in a virtual machine is effectively the same as running SQL Server on premises, except that instead of a physical server, you run it in a virtual machine. And as we discussed a few videos ago, virtual machines are zonal.
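The trade-offs just listed can be summarized as a small decision sketch. This is a hypothetical helper reflecting only the constraints discussed here (no zone redundancy or Hyperscale on Managed Instance), not an official decision tree.

```python
# Decision sketch for the SQL offerings discussed above (hypothetical helper).
def pick_sql_offering(need_zone_redundancy: bool, need_hyperscale: bool,
                      need_onprem_compatibility: bool) -> str:
    if need_zone_redundancy or need_hyperscale:
        # Per the discussion, SQL Managed Instance supports neither.
        return "Azure SQL Database"
    if need_onprem_compatibility:
        return "SQL Managed Instance"
    return "Azure SQL Database"

# Zone redundancy trumps compatibility, since Managed Instance can't provide it.
print(pick_sql_offering(True, False, True))  # Azure SQL Database
```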

So when you deploy a virtual machine using availability zones, you’re deploying it into a specific availability zone. This means that if you want zone redundancy, you’ll need to deploy multiple SQL Servers across multiple VMs. Now, as we saw earlier, we were just talking about availability groups. You’re going to have to set up a Windows Failover Cluster so that Windows itself can detect when a server isn’t working and fail over to another server. And availability groups are what allow SQL Server to manage those requests and transfer them to another SQL Server. Here’s an example of SQL Server running in virtual machines: in a single region, you’ve got a load balancer and multiple Windows servers.

In this case, the availability group is in charge of data synchronization. In a two-node configuration, there needs to be a third vote to decide who has failed, right? That outside vote is what establishes quorum: when you have an odd number of servers, the majority of servers decide who’s right. So when you’ve got an even number of servers, you need another vote to make a quorum, and that’s why there’s a witness. It is not a SQL Server, but it helps decide which of the servers should be the primary in a failover situation. Now, when you’re talking about cross-region support, this is a bit of an odd diagram, because you’re not setting up multiple servers in the secondary region. You’ve got your primary server and your failover server in the same region, with the witness deciding between the two. But if you’ve got a region-level failure, then you’ve got a backup running in another region. In this diagram, though, you don’t have two servers with a witness there; it’s just a single server. It’s sort of an emergency configuration for when your region is down: multi-region SQL Server in a VM. So, as we saw in this video, a lot of the same concepts from non-relational data carry over to relational data, whether we’re talking about SQL Server in a VM or Azure SQL Database generally. And you do have your zone-redundant options, but there’s a lot of historical Windows Failover Clustering and SQL Server availability group machinery that carries over from the on-premises world into the cloud when you’re dealing with things like this. Bye.
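The quorum idea described above reduces to a single rule: a side of the cluster keeps running only while it can reach a strict majority of votes. A minimal sketch:

```python
# Minimal sketch of the quorum rule: with an even number of SQL nodes,
# a witness vote breaks ties so a strict majority can elect the primary.
def has_quorum(votes_reachable: int, total_votes: int) -> bool:
    """True if this side of a cluster holds a strict majority of votes."""
    return votes_reachable > total_votes // 2

# Two SQL nodes, no witness: a network split leaves each side with 1 of 2 votes.
assert not has_quorum(1, 2)
# Add a witness (third vote): whichever node reaches the witness has 2 of 3.
assert has_quorum(2, 3)
print("witness establishes quorum")
```

This is why the witness exists even though it runs no SQL Server: it contributes nothing but a vote.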
