1. Lecture: Azure SQL Services
So we have Azure SQL to begin with. This is our relational database as a service. So instead of you having to build a virtual machine and then install SQL Server on top of that machine, on top of Windows (although there are Linux versions available now as well), this is provided to you as a service. It essentially uses the latest version of Microsoft SQL Server, and you can create new servers, or you can actually migrate existing databases from, say, a SQL Server that you might have on premises running on physical servers. You can migrate that into Azure SQL using the Microsoft Data Migration Assistant. Now, there are some key features of Azure SQL Database that you should be aware of. One is predictable performance, measured in database throughput units, also known as DTUs. That's how you're often billed and how performance is measured for SQL. Another is high compatibility: it supports existing SQL client applications. The only thing I would say is that there are sometimes incompatibilities, and you'll learn more about SQL managed instances, another version of SQL that Microsoft has made available. But for the most part, it is highly compatible with existing SQL workloads.
You just might have to tweak a few things; sometimes there are schema edits and things like that that people have to do. And the third feature is simplified management. I think this is the big thing: not much to manage. Managing SQL Servers on premises has been a difficult and often expensive task in the past, and this greatly simplifies management. Now there are three tiers available: Basic, Standard, and Premium. Basic is for small databases with a single concurrent user, around five DTUs in size, so think small-scale applications and dev/test. Standard is for medium-sized databases that must support multiple concurrent connections, so it's an excellent choice for cloud applications, workgroups, or web apps, in the 10 to 100 DTU range. Premium is for large databases that support a large number of concurrent connections, groups of people connected simultaneously.
High transaction rates and large numbers of users. We're in the 100 to 800 DTU range here, and these are really mission-critical applications when you're in this tier. Now, we do have a newer option in Azure, which is SQL managed instances. I mention this because I didn't see them specifically called out on the exam, but Microsoft does say that its curriculum is not all-inclusive. Azure SQL managed instances, like Azure SQL, are managed SQL Servers, but they are slightly different. These are basically trying to mimic on-premises or legacy SQL workloads that you might be running in a VM in the cloud, so they have higher compatibility with legacy workloads. So just be aware. People call it SQL MI for short. This might come up now and then, now that it's gotten more attention and a lot of people are moving towards it. There's a big fan base here because, again, it's highly compatible, very easy to move into, and works just like the SQL Server that's already running your application today. In addition, there are some managed third-party databases available in Azure.
So for MySQL and PostgreSQL, there are managed options. When we say managed, they're provided as a service; they've got built-in predictability, pay-as-you-go, auto scaling, and all the other things you would expect from a managed service in Azure, just not specifically SQL Server. This is MySQL and PostgreSQL, both of which are available. In addition, we have some non-managed options, like Azure VMs hosting MySQL installations, or ClearDB on Linux with MySQL as well. These are non-managed because essentially you're just buying something from the marketplace that has everything preinstalled, but you have to manage that application, the patching, the maintenance, etc. You are the underlying database engineer; it's not managed for you in Azure. So, in a nutshell, those are the Azure SQL type options, with Azure SQL being the most frequently asked about. It's definitely a direction Microsoft is pushing people in when it comes to relational databases in the cloud.
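As a rough sketch of what this looks like outside the portal, here is how you might create a logical server and a Standard-tier database with the Azure CLI. The resource names here are placeholders based on this course's demo, and the flags reflect recent `az` versions, so treat this as illustrative rather than exact:

```shell
# Create a resource group, a logical SQL server, and a Standard-tier database.
az group create --name az-sql-demo --location eastus

az sql server create \
  --name skylines-server \
  --resource-group az-sql-demo \
  --location eastus \
  --admin-user azureuser \
  --admin-password '<strong-password>'

# S0 is a Standard (10 DTU) service objective; Basic, S0-S12, and P1-P15
# map onto the Basic, Standard, and Premium DTU tiers discussed above.
az sql db create \
  --resource-group az-sql-demo \
  --server skylines-server \
  --name SkylinesDB \
  --service-objective S0
```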
2. Demo: Creating an Azure SQL Database
Now that you have a good understanding of Azure SQL, let's actually take a look at this in the Azure Portal so you can see all these options in full effect. So here I am, logged into Azure with my account. There are a few services already here, but for all intents and purposes, this could be any account you log into. I'm going to go ahead straightaway to All Services, and I'm going to choose Azure SQL from here. So type in SQL, and you can see I've got SQL databases. I'm going to go ahead and select that one. And you can see I don't have any SQL databases currently in the subscription. I'm going to go ahead and click Add to create a SQL database.
And this will bring up our wizard that we go through just like any other service. First of all, I need to choose my subscription, so I'm going to use my Visual Studio Enterprise subscription. You can also use the subscription that you've got and then go ahead and create a new resource group for it. So I'm going to call this AZ SQL for my demonstration here. And that's just a resource group I'm going to put my resources inside of. And now I need to choose a database name. So we'll call this SkylinesDB. And now I need to choose a server to put the database on.
So I'm going to create one. I don't have any database servers here already, so I go ahead and click Create New, and on the right it asks me for my server name. So, in my example, I'll call it the Skylines Server. And then I'm going to go ahead and put in my username and password. Take note that you should copy or keep this username and password safe so you can use it when you want to log in. If you do lose it, you can reset it from the SQL Server page, but I'm just pointing it out now. So I'll just say azureuser for the moment, and then I'm going to go ahead and give it a password, and it verifies that those match, and that's okay.
And then choose the region we want to create that server in, which is where our database is going to run. And I'll put it in US East, which is over here. And this part is not something we need to cover right now, but you can allow Azure services to access the server. So this is what happens, and I've encountered this with a client, when we need to allow Power BI to have access to this; there could be other services that we want to access the server from. This is where you check this box, and it basically sets some firewall and routing options on the back end to make sure that the server is available to other Azure services that you might be using, but it's not something we need to use right now. I'll go ahead and click OK; that's going to put that information in for that new server. And then you see the option here as well: do I want to use an elastic pool? We briefly mentioned eDTUs in the previous lecture, but it's worth understanding just what elastic pools are.
They're essentially a simple, low-cost solution for managing and scaling databases with unpredictable usage demands. The databases in the elastic pool are on a single Azure SQL Database server, and they share a set number of resources at a set price. Elastic pools enable SaaS developers to optimize the price-performance for a group of databases within, say, a prescribed budget, and then deliver performance while allowing for that elasticity. So think about workloads that change due to seasonal demand as a good example of where you might want to use an elastic pool, but we're not going to use one right now in this demonstration. And you can see here that I can configure my database with the various pricing tier options, and you can see a number of options: General Purpose, Hyperscale, and Business Critical, plus Basic, Standard, and Premium. I can click here as well, and this takes me to the DTU-based pricing. But if I go back to vCore-based pricing, then I get the General Purpose, Hyperscale, and Business Critical pricing, which we covered in the lectures.
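For reference, an elastic pool can also be created from the CLI. This is a minimal sketch, assuming the server and resource group names from this demo and a Standard-tier pool; flag names are from recent `az` versions:

```shell
# Create a Standard-tier elastic pool with 50 eDTUs shared across its databases.
az sql elastic-pool create \
  --resource-group az-sql-demo \
  --server skylines-server \
  --name demo-pool \
  --edition Standard \
  --capacity 50

# Move an existing database into the pool so it draws from the shared eDTUs.
az sql db update \
  --resource-group az-sql-demo \
  --server skylines-server \
  --name SkylinesDB \
  --elastic-pool demo-pool
```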
And then I've got my compute here. I mentioned that serverless is currently in preview. You can still choose serverless now if you wish to; that's only going to give you the Gen 5 option there under General Purpose. But if I go back to Provisioned, where it's actually going to provision the server, here is my compute generation: generation 4 or generation 5. Again, remember that Gen 4 goes up to 24 vCores and maps a physical processor core to your virtual core, whereas Gen 5 uses a logical processor mapping rather than the physical core itself and lets us go higher. Again, we have Hyperscale and Business Critical options available to us as well. And below that, once I've chosen, say, Provisioned and Gen 5, I choose how many vCores I want to allocate. So I can do 2, 4, 6, 8, all the way up to 80 on Gen 5. I can also set the maximum size of my data, so you can basically toggle here until you get the size that you want. But obviously, there is a price involved.
So in your cost summary, you can see the Gen 5 cost per vCore; because I've got two vCores selected, I'm paying two times that per-vCore rate. And then I've got my cost per gigabyte, basically $0.12 per gigabyte there, and that's multiplied up based on my size. So that's my total estimated cost per month for the SQL database that I'm going to create. If I scroll back over to the left, I'll click Apply, and then this will take us over to the next screen here. We proceed to the additional settings by clicking Next. I can now choose a starting point for my data, and I can say no, I don't have any existing data at this point. I could choose to restore from a backup if I wanted to, but in my case, I'm actually just going to populate it with sample data. Microsoft provides the AdventureWorksLT sample database here that we can use to populate it, and then there are a number of other features coming out here, like Advanced Data Security, which is available as a free trial. This allows for more data classification and vulnerability management. You know, Microsoft is continuing to improve all the security services available, so those are there as well.
Then if I go over to the next screen, I've got things like tags that I can apply, just like we talked about at various times throughout the AZ-300 and AZ-301 courses, and then finally I can review everything and go ahead and create that database, and obviously a database server as well. So I'm going to fast forward here while that completes, okay? And as you can see, the deployment is now complete, so we can go ahead and go over to that resource. And you can see here that I've got my overview; this is my database itself; and here is my server name, so I can click this as well. This will take me to the server. So this is the underlying SQL Server, and I can continue to add additional databases, pools, and data warehouses on top of this underlying hardware that I've provisioned. If I scroll down, I can click SQL databases here, I can see elastic pools here, and that will take me back to the databases that I've provisioned.
If I click SkylinesDB, I can see all the details again of that database itself. I get some monitoring data as well. So I've got some compute visualizations here. If I scroll down, I can see the database storage that's being used there as well. Now going back to the left-hand side, you can see that I can also go to Configure, which is where I can change the underlying performance options (vCores, etc.) that I want to use. If I go back to the database itself, I can also go to Geo-Replication, which allows me to select a region on the map from the target regions to create a secondary database, and you can also do this by adding it to a failover group as well. So all you need to know is that there is a geo-replication option there that you can use, and some of the scenarios and questions may be on a global scale. Obviously, Cosmos DB is something you need to be aware of as well when that comes into play, but know how you can replicate your SQL databases too, and Azure SQL is a great option for that.
The last thing we'll just quickly hit before we terminate this demo is to look at the server again. So if we go back up to the server, you can see here that I've got managed backups under the Skylines Server, and you can see that database backups happen here automatically; when backups are completed, you will see them here. No databases have been backed up yet, but once they have occurred and are there, I can basically restore from them as well. I mentioned failover groups a second ago, and this is where you would configure those too. Failover groups are essentially a SQL Server feature designed for automated replication, connectivity, and failover of a group of databases. So, with that, we have a really effective option for doing site-to-site type recovery for a database, which concludes this demonstration and hopefully gets you up and running with Azure SQL.
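A failover group can also be set up from the CLI. This is a hedged sketch: it assumes a second server (here called `skylines-server-dr`) already exists in another region to act as the secondary, and the resource names carry over from this demo:

```shell
# Create a failover group pairing the primary server with a partner server in
# another region, and add SkylinesDB so it is replicated and can fail over.
az sql failover-group create \
  --name skylines-fog \
  --resource-group az-sql-demo \
  --server skylines-server \
  --partner-server skylines-server-dr \
  --add-db SkylinesDB
```

Applications then connect through the failover group's listener endpoint rather than a specific server, so a failover is transparent to the connection string.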
3. Lecture: Design Auditing and Caching Strategies
Why do we need to audit? So we look at our SQL database and say, people are asking us to audit it; why is that? One reason is to maintain regulatory compliance. You probably have compliance frameworks that you need to adhere to for data and regulations, and that's a big reason. The big thing is making sure we know who accesses the data, what data we have, et cetera. It's also good to understand database activity and gain deeper insights into your data, which can then be used from a business intelligence perspective as well. Auditing essentially tracks database events and writes them to an audit log.
So someone accesses data, someone deletes data: all of those are basically auditable events that we want to keep track of. So again, we understand the activity, we know what's going on in the database, and it could also help us spot malicious activity as well. It utilizes a Log Analytics (OMS) workspace, a storage account, Event Hubs, or a combination of those. If you want to send the data to multiple different locations, some people take that data, send it off to an event hub, and then on to some of the security tools such as Splunk, QRadar, etc. If we look at SQL database auditing, this is what we can basically do with it. We can retain an audit trail of selected events; you basically define categories of database actions that you want to be audited. You can use it to report on database activities, using preconfigured reports and dashboards to get you started, as well as to analyze them.
So this is the third feature of SQL database auditing. You've got those reports; we can analyze them and find unusual activity, suspicious events, and trends. So it's not just about retaining the data; it's also about reporting on that data, giving you some insights into that data, and then analyzing that data for, again, useful information but also for unusual activity. Let's then look at some key things you need to know. Audit logs are written to append blobs in Azure Blob Storage on your Azure subscription. So, when you create a storage account and add blob storage, the logs are written there as append blobs. General-purpose V1 and V2 storage accounts are supported, as are blob-only accounts. All storage replication configurations are supported, including locally redundant storage (LRS), zone-redundant storage (ZRS), and geo-redundant storage (GRS).
Premium storage is currently not supported, which isn't a big deal because audit logs don't typically require high-performance SSD disks. Storage in a VNet is also not currently supported; so if you're using VNet service endpoints for storage, they are currently not supported. Storage behind a firewall is also not currently supported for SQL Database auditing. I can tell you that this is a common request from some of my customers, so I wouldn't be surprised if this gets updated fairly soon. So just keep an eye on that. When you're looking through the study guide, one thing you really need to understand is the difference between server-level and database-level auditing policies. As you saw in the Azure SQL demonstration, we have configuration that we can do at the database level, and we have configuration that we can do at the server level as well. If we start with the server level: you basically create your auditing policy there, and it then applies to all existing and newly created databases on the server. When server-level auditing is enabled, it always applies to the database, regardless of your database auditing settings.
Now, if we do database-level auditing, this is the policy for the specific database, and you can enable it on both the server and the database, but it does not change the server auditing. So both audits will exist in parallel, and you want to avoid this unless you have a use case for that specific database and want to send it to, say, a separate storage account or need a different retention period. So keep that in mind, because ideally what you really want to do is just audit at the server level, or at least apply your policy at the server level. And then that will apply to all your databases, both those that are created on that server and those that are already there. But if you have a specific requirement that's different, where it doesn't make sense to apply that across every database because you need to do something extraordinary on one specific database, then yes, go ahead and do it on that specific database. Just keep in mind that it will not stop the auditing and logging that is happening at the server level. So again, you've got two policies running in parallel and storing data in parallel there. But with that, we've covered auditing policies, and in the subsequent lecture we'll go through caching, which is the other part of what you need to know.
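The server-level versus database-level distinction above can be sketched with the CLI. This is a hedged example: the storage account names are placeholders, and the `audit-policy` flag names are from recent `az` versions, so check your installed version before relying on them:

```shell
# Server-level auditing policy: applies to every existing and future database
# on the server, writing audit logs to a storage account for 90 days.
az sql server audit-policy update \
  --resource-group az-sql-demo \
  --server skylines-server \
  --state Enabled \
  --blob-storage-target-state Enabled \
  --storage-account auditstorageacct \
  --retention-days 90

# Optional database-level policy: runs IN PARALLEL with the server policy
# (it does not replace it), e.g. to use a separate storage account or
# a different retention period for this one database.
az sql db audit-policy update \
  --resource-group az-sql-demo \
  --server skylines-server \
  --name SkylinesDB \
  --state Enabled \
  --blob-storage-target-state Enabled \
  --storage-account separateauditacct
```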
4. Demo: SQL Long Term Backup Retention
In this demonstration, we're now going to take a look at long-term backup retention, which is also one of the things that Microsoft indicates in the curriculum that you need to know for the AZ-301. So, we go to the Azure Portal and look at the SQL instance that we created in our previous demonstration, and here I am; the Skylines Server and the SQL Server are both up and running. And again, I can look at my databases that are running on that server if I want to. But what I actually need to do is look at the backups themselves. On the left-hand side here, I click Manage Backups. And as before, you can see I've got my backups here. Backups will take effect, but what I can do is click this one and say Configure Retention. And now you can see I've got my point-in-time restore configuration. That's my standard backup policy that I want to keep in place for seven to 35 days. However, long-term retention is actually further down here.
So you've got long-term retention, which isn't configured by default, and I can choose between weekly, monthly, or yearly backups. And the most important thing to understand here is that backups can be retained for up to ten years. If I go down to yearly backups and I say, okay, I want to keep ten years, you can see my maximum there is ten. And I can choose which weekly backup of the year I would like to retain for that ten-year period. And that gives us a great advantage, because traditionally, when you wanted to do something like this, you'd often back up your SQL Servers to tape and ship the tapes off to places like SunGard, who would keep your backups in a tape vault, basically, for you. And then, if you ever needed to retrieve that data, you would request the tapes, they would ship them back to you, and you could recover from them. But in this case, everything is completely in the cloud now, all managed for you by Microsoft, and you can get up to ten years of retention, and this is where you configure it. And it's as simple as a checkbox. And with that, that concludes this demonstration.
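The same long-term retention policy can be set from the CLI. This is a minimal sketch, reusing the demo's server and database names; retention periods are expressed as ISO 8601 durations:

```shell
# Configure long-term retention for SkylinesDB: keep 4 weekly backups,
# 12 monthly backups, and the first weekly backup of each year for 10 years.
az sql db ltr-policy set \
  --resource-group az-sql-demo \
  --server skylines-server \
  --database SkylinesDB \
  --weekly-retention P4W \
  --monthly-retention P12M \
  --yearly-retention P10Y \
  --week-of-year 1
```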
5. Lecture: Cosmos DB
Let's now take a look at another database service that's become very, very popular in Azure, and that is Cosmos DB, which you've probably heard a lot about already. Well, what is Cosmos DB? First of all, it helps to understand that this is a globally distributed database service. It supports schema-less data, such as NoSQL, as we discussed in the data types. And it's used to build these highly responsive, always-on applications that deal with constantly changing data. So, when I think of that, I think of Facebook. Not to say that it uses Cosmos, but Facebook is a good example of something global: users all around the world are accessing it and changing data in it, but they probably want to access it locally; having everybody from all over the world come to a single US data center wouldn't make a lot of sense.
Cosmos is a way for the database to be spread out around the globe, and there are some mechanisms to make sure that writes are consistent and the data eventually gets synced across all those different regions. Another example Microsoft gives on the Learn site is this image here: a developer frequently updates the catalog for an online training course, and that goes into Azure Cosmos DB. Global users and training developers work against that Cosmos DB database, and again, it's distributed all over the world. If you want to see a more detailed architecture diagram of this, this one is from the Cosmos DB documentation site at Microsoft and will help you think it through from the beginning. So we've got mobile users, web browsers, and other devices coming in through the Internet. They might hit a Traffic Manager that tells the user, hey, you're going to go to West US, North Europe, or Southeast Asia. If you look at the services underneath, they may have a web tier and a middleware tier, just like regular applications today, and they can use load balancers, etc. But those tiers then basically talk to Cosmos DB.
So in this case, the web tier hits the middle tier; the middle tier is what talks to the database. And in each region you see, it just talks to Cosmos DB and thinks it's in its own region. But then, behind the scenes, Cosmos DB takes care of all the replication. And the idea of allowing people to natively use the protocols of existing NoSQL APIs is a huge benefit: the DocumentDB (SQL) API, the MongoDB API, graph APIs, the Table API, and so on are good examples. This really allows the application to remain portable and vendor-agnostic. And on top of that, everything is automatically partitioned for performance and storage capacity in those various regions. So that's Cosmos DB in a nutshell, and really you just need to know when to use it. Definitely dig into the Cosmos DB documentation on the Microsoft site if you want to know even more about it and get more into the weeds with it; there are a lot of good tutorials there that guide you through it.
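To make the "globally distributed" part concrete, here's a minimal CLI sketch of creating a Cosmos DB account with a write region and a read replica in a second region. The account and resource group names are placeholders:

```shell
# Create a Cosmos DB account replicated across two regions:
# East US is the write region (failoverPriority=0),
# North Europe is a read replica (failoverPriority=1).
az cosmosdb create \
  --name skylines-cosmos \
  --resource-group az-sql-demo \
  --locations regionName=eastus failoverPriority=0 \
  --locations regionName=northeurope failoverPriority=1
```

Each application tier then connects to the nearest replica, and Cosmos DB handles the cross-region replication behind the scenes, as described above.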
6. Lecture: Cosmos DB Consistency Levels
Okay, now that we have a high-level understanding of Cosmos DB, you should understand the consistency levels that come with all of that replication that's happening across Cosmos DB, potentially across regions, for you. There are five levels to take into account. The first is strong. This guarantees a write operation is only committed and visible on the primary after it has been committed and confirmed by all the replicas. So, this is the strongest consistency model, the most predictable, and the most intuitive programming model that you have there. This means that readers are guaranteed to see the most recent additions to the database at all times. Next is bounded staleness.
Bounded staleness consistency is most frequently chosen by globally distributed applications that are expecting low latency but also want global order guarantees. This allows you to configure how stale documents can be within replicas, and staleness refers to the number of versions a replica document can be behind the primary document. So, how far behind can that document be before it simply needs to be updated to the most recent version available? Then we have session. This guarantees that all read and write operations are consistent within a user session. This is a very common method. Widely used for single-region as well as globally distributed applications, it provides write latencies, availability, and read throughput comparable to that of eventual consistency, which we will learn about in a second, but also provides the consistency guarantees that suit the needs of most applications written to operate in the context of a user. So, once again, consider a user; consider Facebook as an example. That's a user type of session, and a lot of that data is unique to that user.
So that would be a good example of session for you. Then we have consistent prefix, which ensures that changes are read in the order that matches the sequence of the corresponding writes. And then, last but not least, there is eventual. This is the loosest consistency model: it commits write operations against the primary immediately, replica updates are handled asynchronously, and replicas will eventually be consistent with the primary. So think of it all the way from strong, where it's guaranteed, all the way to eventual. And, in the end, it's a tradeoff between performance and data consistency that you're making. When you select a higher consistency level, you ensure that documents in replicas do not lag behind those in the primary. That's recommended for applications that require all replicas to exactly match the primary at any point in time, but it has a negative effect on write operations. If you go for a weaker consistency level, this lets the database operate at peak performance and efficiency.
That's recommended for apps that require high-performance read operations, where reads against a replica may return stale data. So, going back to that list, the levels range from strong to eventual, and you just need to choose the appropriate level that makes sense for the task at hand. In the exam, you'll likely get some use-case scenarios, most likely asking you to choose between strong and eventual. But if you hear the word user in there, you might want to go for a session-based consistency level, so just watch out for the wording in the questions. Make sure you understand all five.
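The consistency level is set as a default on the account, and for bounded staleness you also configure exactly how stale replicas may be, by version count and by time. A hedged sketch with placeholder names:

```shell
# Create an account with BoundedStaleness as the default consistency level.
# --max-staleness-prefix: replicas may lag by at most 100 versions (writes).
# --max-interval: replicas may lag by at most 300 seconds.
az cosmosdb create \
  --name skylines-cosmos \
  --resource-group az-sql-demo \
  --locations regionName=eastus failoverPriority=0 \
  --default-consistency-level BoundedStaleness \
  --max-staleness-prefix 100 \
  --max-interval 300
```

For the other levels (Strong, Session, ConsistentPrefix, Eventual), only `--default-consistency-level` is needed; the staleness bounds apply to bounded staleness alone.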
7. Lecture: Azure Data Lake Store Overview
One of the topics covered in the exam asks us to evaluate the security of Azure Data Lake. You might not be familiar with Azure Data Lake. Ultimately, it serves as a data repository for big data analytical workloads. And at a high level, it allows us to ingest data; prepare it by cleansing it, enriching it, and doing some annotation against it; store it for present or future analysis; and then analyze it using analytical engines such as Hadoop and Spark.
But how do we go about securing it? And what do we need to know for the exam? Well, there are three main things. First, authentication: this is integrated with Azure Active Directory, so we can keep using the same identity store there. Second, we have access control; we'll cover ACLs in a little more detail shortly. And third, there are encryption options available to us as well. If we look at ACLs, this may seem a little daunting at first, but look at the folder level first of all. We have access ACLs, which determine how a user can access a file or folder. And then we have default ACLs. These default ACLs are copied to the default ACLs of new child folders, and the default ACLs are also used to construct the access ACL of new children. Now, files themselves only have an access ACL. So you'd apply an access ACL to a new file.
But whenever we create a new folder, a default ACL is applied there. Then, if a new file is created within the new child folder, a new access ACL based on the default ACL is automatically applied there. So just think of it like a parent-child relationship: everything underneath inherits the parent's default and access ACLs. But if it's a file, it just gets an access ACL; it doesn't get another set of default ACLs, since it has no subsequent folders underneath it to pass them down to. Now, if we go a step further and look at the access control structure, ultimately you have three types of permission available: Read, which is the R in both columns here; Write, which is the W; and Execute, which is the X. And ultimately, we have our owners, who have permissions applied there. Then we have named users and named groups that we apply permissions to, and then everyone else.
And so every single structure has the same set of permissions applied to it every single time we create an ACL, whether it's default or access. And if we look at it from an encryption point of view, this is the other great thing: we can choose to manage the key ourselves in Key Vault, or we can use an Azure-managed key in Key Vault. Ultimately, unencrypted data securely travels through the encrypted channel into the Azure Data Lake Store, where it's encrypted with the appropriate key from the key vault. Now, there are some differences in key management as well. With service-managed keys, data is always encrypted prior to being stored; this is also true for customer-managed keys, and the master encryption key for both is stored in Key Vault. Are any encryption keys kept in plain text outside of Key Vault? The answer is always no. Can the key be retrieved from Key Vault? The answer is no to this as well. Who owns the Key Vault instance? Well, this is the big difference. When we use service-managed keys, the Data Lake Store owns the Key Vault instance. But when we use customer-managed keys, we own the Key Vault instance, which belongs to our Azure subscription, and we have the ability there to revoke the Data Lake Store service's access to the key. And you can permanently delete the MEK (master encryption key) as well.
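The access/default ACL pattern above can be sketched with the Data Lake Store Gen1 CLI commands. This is illustrative only: the account name matches this course's demo, and `<user-object-id>` stands in for a real Azure AD object ID:

```shell
# Grant a user read + execute on an existing folder (access ACL).
az dls fs access set-entry \
  --account sldatalakedemo \
  --path /data \
  --acl-spec "user:<user-object-id>:r-x"

# Add a matching default ACL on the folder, so new child folders and files
# created under /data automatically construct their access ACLs from it.
az dls fs access set-entry \
  --account sldatalakedemo \
  --path /data \
  --acl-spec "default:user:<user-object-id>:r-x"
```

Note the `default:` prefix in the second command; it's what distinguishes the inherited template from the folder's own access ACL.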
8. Demo: Implement Security for Azure Data Lake Store
Okay. In the Azure Portal, the first thing we need to do is go ahead and create a data lake. So let's click Create a Resource and enter "data lake". You'll see Data Lake Store appear; select that, click Create, and let's do a few things right away. Let's give it a name; we'll call this the SL Data Lake Demo. It needs to be at least three characters long. Then let's create a new resource group, which I'll call SL Data Lake. I'm just going to leave the region as the East US 2 default. And as you can see, I have some pricing options. We're not going to get into those, because we don't really need to know them for the exam; we're not expected to be Data Lake experts here, but we do need to know about the encryption settings. You'll see that, by default, encryption is enabled. And you do have that choice that we discussed in the slides, which is to use keys managed by the Data Lake Store or keys managed by your own key vault.
But in this case, we'll simply use keys managed by the Data Lake Store. Go ahead and hit OK, and then go ahead and create that Data Lake Store. We'll give that just a few minutes and come back when it's completed. So my Data Lake Store has been created; I can see it on the dashboard. I'm going to go ahead and select the SL Data Lake demo. And right away, we can see a few security options. If we scroll further down, we can select Encryption, and this is where you can see that this account is encrypted using keys managed by Azure Data Lake Store; that's what we chose when we created it. We also have an option for firewalls, another security setting for you. We can choose to enable the firewall, restrict it to certain IPs, define rules, et cetera. And one other thing we can do is, if we go back to the overview section at the top, upload an Excel file. I'm going to switch to Excel really quickly. You can see I've just got a basic Excel sheet here; it's just got some basic data in it.
We're going to save that on our desktop, go back over to the Azure Portal, and hit Data Explorer. In Data Explorer, I can select Upload and select my file; I'm going to go ahead and select the Data Lake Demo file from the desktop here, add my selected file, and that's already completed. So I'm going to scroll back to the left and refresh, and we now have our Data Lake Demo Excel file, as you can see. Now, if I want to control access to that file, I can simply select the three dots on the right-hand side and select Access. And this is now where I can use Azure identities and give them specific access to this file. And I could also do it at the folder level, et cetera. And this is just putting into practice what we described in the module we previously went through. And with that, this demo comes to an end. I hope this gives you an idea of the various areas where you can implement security in Data Lake.
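The upload step from this demo also has a CLI equivalent, shown here as a sketch with the demo's account name and a placeholder local file path:

```shell
# Upload a local file into the Data Lake Store, then list the root to confirm.
az dls fs upload \
  --account sldatalakedemo \
  --source-path ./DataLakeDemo.xlsx \
  --destination-path /DataLakeDemo.xlsx

az dls fs list --account sldatalakedemo --path /
```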