12. Cloud Front Overview
Now we won’t have any labs on the storage gateway. It’s important that you understand this just on a theoretical level. So what is it? Well, AWS Storage Gateway is a service that connects an on-premises software appliance with cloud-based storage to provide seamless and secure integration between an organization’s on-premises IT environment and AWS storage infrastructure. The service enables you to securely store data in the AWS cloud for scalable and cost-effective storage. So what does that actually mean? Well, basically, here’s your data center, and here is the storage gateway.
So basically, it’s just a virtual appliance that you install into a hypervisor running in your data center. That virtual appliance will then propagate, or asynchronously replicate, your information up to AWS, in particular to S3, and potentially to Glacier, depending on which Storage Gateway type you’re using. The Storage Gateway software appliance is available for download as a virtual machine image, and you install it on a host in your data center. It’s supported on VMware ESXi or Microsoft Hyper-V, and once you’ve installed the gateway and associated it with your AWS account through the activation process, you can use the AWS Management Console to create the storage gateway option that’s right for you.
So there are now four different types of storage gateways, as opposed to three previously. It’s still not clear whether they’ve updated the exam to match, and they’ve also changed the names, so the names are similar but slightly different. File Gateway is the brand new one, and this uses NFS. Basically, this is where you just store flat files in S3. It allows you to store things like Word files, PDFs, pictures, videos, et cetera, and they’re stored directly in S3. The next one is Volume Gateway, and this uses iSCSI. This is block-based storage, the kind of storage that you would run operating systems on. So it might be a virtual hard disk that you’ve got a VM running on, or a virtual hard disk that you’ve installed SQL Server or MySQL on. It’s not meant for flat files, although you can have flat files on them. Obviously, you can save a PDF to your virtual hard disk, but with block-based storage, this will not be stored in S3 as a flat file, and I’ll show you a diagram shortly.
So volume gateways are broken down into two different types. You have your stored volumes, and this is where you keep an entire copy of your data set on premises. And then you’ve got your cached volumes, and this is where you’re only storing the most recently accessed data on your on-premises storage. And the rest of the data is backed up on Amazon. And we’ll show you a diagram of that coming up in the next couple of slides.
And then finally, we have Tape Gateway, which uses VTL, a virtual tape library. This is basically a backup and archiving solution. It allows you to create virtual tapes and send them to S3, and then you can use lifecycle policies to send those virtual tapes off to Glacier. Now, File Gateway is brand new, so it has no old name, and Volume Gateway is a new sort of heading. The old names for the others used to be “gateway-stored volumes,” “gateway-cached volumes,” and “gateway virtual tape library.” So if you do see gateway-stored volumes, gateway-cached volumes, or gateway virtual tape library in your exam, they’re really just talking about stored volumes, cached volumes, and Tape Gateway. So it’s not too hard to remember. Let’s start with the easiest gateway, which is File Gateway. This is also the newest gateway and the least likely to appear in your exam, but if it does come up, let me know.
But it’s the easiest one to understand. Files are stored as objects in your S3 bucket and accessed via an NFS mount point. Ownership, permissions, and timestamps are stored in S3 in the user metadata of the object associated with the file, so things like the file creation time, the file type, et cetera. Once objects are transferred to S3, they can be managed as native S3 objects, and bucket policies such as versioning, lifecycle management, and cross-region replication apply directly to them. So basically, this is a way of storing your files in S3. In terms of the architecture, this is what it looks like. You’ve got your application server. It’s going to connect via NFS to your storage gateway. Remember, your storage gateway is basically just a virtual machine that’s running on premises. It’s running on your ESXi servers or on your Hyper-V servers, and then it’s connecting into AWS, most commonly over the Internet.
But you could also use Direct Connect. For those who don’t remember from the 10,000-foot overview, Direct Connect is basically a dedicated communication line between your data centre and AWS. You could also connect via a VPC. If you remember, VPC stands for “Virtual Private Cloud,” and it’s basically a virtual data center. What the diagram with the VPC is trying to represent is that your application server and storage gateway don’t necessarily have to be on premises. They can actually be an EC2 instance and a storage gateway sitting inside an Amazon VPC. So you could have this all inside AWS, and then your storage gateway would be sending information to S3. So there are other options.
So really, the most common use case for this would be exactly as the diagram shows. You’ve got your customer premises here, you’ve got your application server, you’ve got your storage gateway, and then you’re connecting over the Internet to S3 and storing your objects there, and your objects are going to be flat files. That’s why it’s called a file gateway. So it’s going to be things like Word documents, images, video files, et cetera. Then you can create lifecycle management policies to send them to Infrequent Access and then to Glacier, depending on the number of days specified in the policy. So let’s move on to Volume Gateway. Volume Gateway is an interface that presents your applications with disk volumes using the iSCSI block protocol. So as soon as you’re using iSCSI, it’s block-based storage.
With block-based storage, you can install operating systems on it, as well as applications such as SQL Server, and run databases from it. So basically, think of this as a virtual hard disk. Data written to these virtual hard disks can be asynchronously backed up as point-in-time snapshots and stored in the cloud as EBS snapshots. In case you’re wondering what EBS is, it’s just Elastic Block Store. An EBS volume is a virtual hard disk that we’re going to attach to our EC2 instances, which are our virtual machines, and we’ll create those in the EC2 section of the course. So effectively, what we’re saying here is that volume gateways take virtual hard disks that are on premises and back them up to virtual hard disks that exist within AWS. All snapshots are incremental, so only the changes made since the last snapshot are backed up. Snapshot storage is also compressed to minimise your storage charges. And there are two different types of volume gateways.
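To make the idea of an incremental snapshot concrete, here is a minimal sketch in Python. It is a toy model, not how the gateway is actually implemented: the function name and the use of a dictionary mapping block numbers to bytes are assumptions made purely for illustration, and deleted blocks are ignored.

```python
def incremental_snapshot(previous: dict, current: dict) -> dict:
    """Return only the blocks that were added or changed since `previous`.

    Both arguments map a block number to that block's bytes (a hypothetical
    stand-in for a virtual hard disk). Deleted blocks are not tracked here.
    """
    return {
        block_id: data
        for block_id, data in current.items()
        if previous.get(block_id) != data
    }


# Usage: block 1 changed and block 2 is new, so only those two are backed up.
last_snapshot = {0: b"boot", 1: b"old data"}
disk_now = {0: b"boot", 1: b"new data", 2: b"extra"}
changed = incremental_snapshot(last_snapshot, disk_now)
```

The point is simply that an unchanged block (block 0 here) is never sent again, which is why incremental snapshots keep both upload time and storage charges down.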
So again, I just want you to think of volume gateways as virtual hard disks that sit there on prem, and then you back them up to AWS. And there are two different types. The first one is stored volumes, and this is where you store an entire copy of your data set locally and then asynchronously back up that data to AWS. Essentially, you create a storage volume. It will be a virtual storage volume that you mount as an iSCSI device on your on-premises application servers. So this could be your web servers, this could be your database servers, et cetera. Think of it as a virtual hard disk, and any data written to it will be stored on your premises on your own storage hardware.
So it’s going to be stored on your own physical disks or on your own SAN. The storage gateway then replicates this data to S3 in the form of EBS snapshots. So it’s basically taking snapshots of the data and sending them to S3. And remember, those snapshots are incremental. The size of a stored volume can be anywhere from 1 gigabyte all the way up to 16 terabytes. And if you want a diagram of what it looks like, it’s pretty simple. Over here, we’ve got our users and our clients. They’re talking to our application servers, which might be our web servers. The web servers have an iSCSI connection, and basically they’re seeing these “virtual volumes,” or virtual hard disks. These virtual hard disks are provisioned by the storage gateway. We have an upload buffer.
And all of these virtual hard disks are stored on your own physical infrastructure. So if it’s a virtual hard disk that’s 1 TB in size and it’s not thinly provisioned, it’s going to take up one terabyte’s worth of space on your storage area network or on your physical devices. What the storage gateway does is basically take a snapshot. That’s then essentially a flat file, and it goes into an upload buffer, which handles the multipart uploads up to S3, where it’s stored as EBS snapshots. The important thing to remember is that with stored volumes, you keep a complete copy of your data on site. Think of the volume gateway with stored volumes as a virtual hard disk, with a complete copy on site, and that complete copy is backed up incrementally to S3. Moving on to cached volumes. Cached volumes let you use S3 as your primary data storage while retaining frequently accessed data locally in your storage gateway. Things get a little complicated here, but the bottom line is that you’re not keeping a complete copy of your data on prem.
You’re basically only keeping the most recently read data on premises. All the rest of your data is being stored in S3. And because of the way this is designed, you can create volumes of up to 32 terabytes, which is twice the size of stored volumes. Your gateway stores the data that you write to these volumes in S3 and retains recently read data on premises. So again, for cached volumes, just think of it as all the data that’s written going up to S3, while the most recently read data stays on premises. It also means that you don’t have to rely on having very large storage arrays, because most of your data is being stored on AWS. You’re only keeping the recently read data on site. So here’s a diagram of how it works. Here are our users with our clients; they’re connecting into an application server. It has an iSCSI connection. And here is our cache storage, and we have an upload buffer. When our users are writing data, it’s going to go to our cache storage. It’s then going to be uploaded, and it will be stored as virtual disks within S3. Now, I suspect these are EBS volumes, because you obviously can’t have block-based storage on S3. What happens is that they take snapshots of these disks, and those snapshots are flat files that are stored on S3. So I suspect here it’s some kind of EBS storage, but they’re not calling it EBS in the documentation.
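If it helps to see the cached-volume behaviour in miniature, here is a toy model in Python. It is only a sketch under simplifying assumptions: the class name, the block-level interface, and the least-recently-used eviction are all hypothetical, and the real gateway uploads asynchronously through its upload buffer rather than writing straight through.

```python
from collections import OrderedDict


class CachedVolumeModel:
    """Toy model: every write lands in the cloud; recent reads stay local."""

    def __init__(self, cache_blocks: int):
        self.cache_blocks = cache_blocks  # how many blocks fit on premises
        self.cloud = {}                   # stands in for the S3-backed volume
        self.local = OrderedDict()        # stands in for on-premises cache storage

    def write(self, block_id: int, data: bytes) -> None:
        self.cloud[block_id] = data       # the full data set lives in the cloud
        if block_id in self.local:
            self.local[block_id] = data   # keep any cached copy up to date

    def read(self, block_id: int) -> bytes:
        if block_id in self.local:        # cache hit: served from on premises
            self.local.move_to_end(block_id)
            return self.local[block_id]
        data = self.cloud[block_id]       # cache miss: fetched from the cloud
        self.local[block_id] = data
        if len(self.local) > self.cache_blocks:
            self.local.popitem(last=False)  # evict the least recently read block
        return data


# Usage: three blocks are written, but only the two most recently read stay local.
volume = CachedVolumeModel(cache_blocks=2)
for block in range(3):
    volume.write(block, bytes([block]))
for block in range(3):
    volume.read(block)
```

After this runs, all three blocks exist in `volume.cloud`, while `volume.local` holds only blocks 1 and 2, which mirrors the idea that you do not need a large storage array on site.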
And that’s it, basically. Whenever you write data, it is saved in S3. But every time you read data, the most recently read data is stored locally in your cache storage, which is attached to your application server. So it sounds a bit complicated. I know a lot of people might be freaking out at this stage. Don’t worry; hang in there till the end. We’ll make it really simple for you. Finally, after the volume gateway, we just have the tape gateway. This is basically just used for backups. It uses a virtual tape library interface, and it lets you use your existing tape-based backup application infrastructure to store data on virtual tape cartridges that you create on your tape gateway. So NetBackup, Backup Exec, Veeam, and so on all support this. And essentially, instead of having physical tapes, you’re now using virtual tapes. These virtual tapes are then sent to S3. So here’s a good little diagram of what it looks like. You’ve got all your servers. They’re connecting to your backup servers.
So this could be NetBackup, Backup Exec, Veeam, or whatever software you use to back up your data. They then connect to the storage gateway, which is presented as a virtual appliance. It’s connecting in over iSCSI, and then you’ve got your virtual tapes. These virtual tapes are then uploaded to S3, and then, of course, you’ll have lifecycle management policies where you can archive these onto a virtual tape shelf in Glacier. So I know exactly what you’re thinking. All this sounds really, really complicated, especially if you don’t come from a storage background or a backup background, and especially if you haven’t done the EC2 section of the course and you’re still trying to figure out what an EBS volume is. Don’t let it stress you out. We’ve got the EC2 section of the course coming up, and you can always come back and watch this lecture after doing that. We do this lecture in the S3 section because this is an S3 product. And the other thing is, going into the exam, you really only need to know what each gateway is at a high level and which gateway to use in each scenario. So let’s do a quick little recap.
We’ve got File Gateway. This is for flat files only. You’re not storing any files on premises; you’re storing them all in S3. By “flat files,” we just mean Word documents, PDF documents, image files, video files, et cetera. What we’re not talking about are operating systems and databases. For those, we need block-based storage, and block-based storage uses Volume Gateway, which uses the iSCSI protocol. Volume gateways come in two distinct types. We’ve got our stored volumes, where the entire data set is stored on site and then backed up to the cloud. And then we have cached volumes, where the entire data set is actually stored in the cloud, and only the most frequently accessed data is kept on site. Okay, this is really, really simple. And then finally, we have our gateway virtual tape library. This is used for backup, and it works with popular backup applications like NetBackup, Backup Exec, Veeam, et cetera. So in the exam, you’re going to get a whole bunch of different scenario questions, and you’re going to be asked to choose between the four different gateways.
So you might get a scenario that asks you to store a whole bunch of flat files, and you want to keep the storage costs to a minimum. Obviously, you’re going to use a file gateway for that. You might have your own data centre where you’re doing backups of all your servers using NetBackup or some other kind of backup server, and you’ve got a physical tape library with the tapes being stored at Iron Mountain, but you want to save some money. So how can you virtualize this? Well, then you’d use a gateway virtual tape library. And then we come to volume gateways, with a couple of different scenarios. Maybe you’re a busy media company, and you’ve got huge amounts of storage, but you don’t want to keep it all on site all the time. For that, you might use cached volumes for your data set. Or you might be a financial analytics company, and you’ve got to keep latency to a minimum because you’ve got to analyse your data very quickly. You can’t wait for your data to arrive on your servers, so you must keep your entire data set on site while also backing it up to the cloud. What would you use? Well, in that case, you’d use stored volumes. So those are the types of scenario questions you’re going to get going into the exam. Just remember the four different gateways and the different use cases for them. So that’s it from me, guys. If you have any questions, please let me know. If not, feel free to move on to the next lecture. Thank you.
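As a study aid, the recap in this lecture can be written down as a small decision function. This is only a sketch of the exam-style reasoning: the function name, the workload labels, and the `full_copy_on_site` flag are made up for illustration and are not part of any AWS API.

```python
from enum import Enum


class Gateway(Enum):
    FILE = "File Gateway (NFS, flat files in S3)"
    STORED_VOLUMES = "Volume Gateway, stored volumes (iSCSI, full copy on site)"
    CACHED_VOLUMES = "Volume Gateway, cached volumes (iSCSI, recent data on site)"
    TAPE = "Tape Gateway (virtual tape library, backups)"


def choose_gateway(workload: str, full_copy_on_site: bool = False) -> Gateway:
    """Map a scenario to a gateway type, following the recap in the lecture.

    `workload` is one of the hypothetical labels "files", "block", or
    "tape_backup". `full_copy_on_site` only matters for block storage.
    """
    if workload == "files":
        return Gateway.FILE
    if workload == "tape_backup":
        return Gateway.TAPE
    if workload == "block":
        if full_copy_on_site:
            return Gateway.STORED_VOLUMES  # low latency, whole data set local
        return Gateway.CACHED_VOLUMES      # data set in the cloud, recent data local
    raise ValueError(f"unknown workload: {workload!r}")


# Usage: the financial analytics scenario needs the whole data set on site.
analytics_choice = choose_gateway("block", full_copy_on_site=True)
media_choice = choose_gateway("block")
```

So the financial analytics company ends up with stored volumes and the media company with cached volumes, which matches the two scenarios described in the lecture.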
13. Create a CDN
So to put this all into context, before there was Snowball, there was a service called Import/Export Disk. This meant that AWS would allow you to accelerate moving large amounts of data into and out of the AWS cloud using portable storage devices for transport. So basically, picture this: you’ve got a whole bunch of data, let’s say it’s 500 gigabytes or a terabyte, and you’ve only got a small internet connection, maybe only 1 Mbps or something like that.
So instead of doing it over the Internet, what you could do is send an external hard disk to Amazon, and they would transfer the data directly onto and off of the storage device using their high-speed internal network, bypassing the Internet entirely. Now that was great, but the problem was that lots of people started using it, and they were sending in all different types of external disks with different types of connections, that sort of thing, and it became basically a nightmare to manage. So at re:Invent 2015, Amazon released what’s called Snowball. And there are three different types. There’s your standard Snowball, there’s Snowball Edge, which was announced at re:Invent 2016, and then there’s Snowmobile, which was announced at re:Invent 2016 as well. I’ll go through what each one is. So let’s start with the Snowball.
And what does a Snowball look like? These are the bad boys. That’s a Snowball on the left, which is closed, and this is a Snowball that is open. They’re quite a bit larger than a briefcase and quite heavy. I have actually ordered one, and I was hoping to get it a few weeks ago, but it still hasn’t arrived. I was hoping to have it for this lecture, so I will re-record this lecture when it arrives. So what is a Snowball? It’s a petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data into and out of AWS. Basically, what Snowball is designed to do is streamline the process of bringing data into AWS while bypassing the Internet. So instead of managing all these external disks from third parties, Amazon just gives you an appliance.
You load data onto the appliance and send it to Amazon, and then they export that data from the appliance into S3. Transferring data with Snowball is simple, fast, and secure, and it can be as little as one fifth the cost of using high-speed Internet. Right now, 80-terabyte Snowballs are available in all regions, and you can also get a 50-terabyte Snowball in the US. Don’t worry, you don’t need to know the size of a Snowball going into the exam. Amazon never asks those sorts of exam questions. They’ll never ask you how many regions are available, how many availability zones there are in a particular region, or how big a Snowball is. It’s more important to understand the concepts. Snowballs use multiple layers of security. They’re designed to protect your data using tamper-resistant enclosures, you get 256-bit AES encryption on them, and they have an industry-standard Trusted Platform Module that is designed to ensure security and full chain of custody of your data.
And they actually come with Kindles, basically, and you can track where your Snowball is at any given time. Once the data transfer job has been processed and verified, AWS performs a software erasure of the Snowball appliance, so neither you nor future customers will be able to recover the data from that appliance. So let’s move on to what a Snowball Edge is. Snowball Edge resembles a regular Snowball, except that it has 100 terabytes of capacity and includes both onboard storage and compute capability. So the first one, Snowball, is just onboard storage. Snowball Edge has compute capabilities as well. Basically, Snowball Edge is more or less a little AWS data centre that you can bring on premises. You can use Snowball Edge to move large amounts of data in and out of AWS, but you can also run Lambda functions on it. What Snowball Edge allows you to do is bring computing capacity to places where you otherwise would not be able to. So for example, aircraft manufacturers and aircraft engine manufacturers can deploy Snowball Edges onto aeroplanes and collect data on how an aircraft engine is running.
And then when the aeroplane lands, you can take the Snowball Edge out and ship it back to the AWS data centre, and you’ve got not just your S3 storage, but also your Lambda functions that have collected the data and stored it in S3. We’ll talk a little bit more about what Lambda is later on in the course, in the EC2 section. But just think of Snowball Edge as an AWS data centre in a box. It’s not just storage capacity; it’s compute capacity as well. So now we come to the final one, Snowmobile. This was announced at re:Invent 2016 in the most dramatic way possible. So this is one of these bad boys, a Snowmobile. It’s a massive shipping container on the back of a huge truck, and this is for petabyte- or even exabyte-levels of data. For those of you who don’t know, obviously you have gigabytes of data. 1,024 GB is essentially a terabyte. Terabytes become petabytes, and petabytes become exabytes. And AWS has been finding that companies are saying, “We have exabytes’ worth of data and we’d like to move it into AWS, but at ten gigabits per second, that’s going to take us 25 years.”
So using an AWS Snowmobile, you can actually do that in less than six months. Snowmobile currently has 100 petabytes’ worth of capacity, so you can order ten of these and bring in an exabyte, and it will take you roughly six months according to the re:Invent presentation. So AWS Snowmobile is an exabyte-scale data transfer service that is used to move extremely large amounts of data into AWS. You can do 100 petabytes per Snowmobile. Each one is a 45-foot-long ruggedized shipping container pulled by a semi-trailer truck. Snowmobile makes it easy to move massive volumes of data into the cloud, including video libraries, image repositories, or even a complete data centre migration. So you could imagine your traditional colo data centre providers starting to panic about this. Transferring data by Snowmobile is obviously going to be secure, fast, and cost-effective. Pricing is on application, by the way. So Snowmobile is truly extreme. I’d imagine you’d only see this in Fortune 500 companies. It’s basically taking the idea of a Snowball to its furthest limits. So what do you need to know for your exam? Well, you have to understand what a Snowball is. You also have to understand the legacy service, or what used to happen. It was called Import/Export before it was called Snowball.
And Import/Export is where you send in your own disks. So you may be asked a scenario question where Snowball is not an option but Import/Export is. Sometimes the exams can be a little bit out of date, so just understand what Import/Export is from a historical perspective. Import/Export is still available, by the way, but if you go into the AWS console, there’s no service description for it there. If you want to import data into AWS, they just push you down the Snowball route. I don’t blame them. It must have been a nightmare managing all of those different disks coming in from all over the world. You also need to understand what a Snowball can do. You can import into S3 and export from S3. If you’re using Glacier, you’re going to have to do a restore from Glacier to S3 first, and then move the data out of S3 onto your Snowball appliance. So that’s it for this lecture, guys. It is theoretical. Hopefully I will be able to update this in a couple of weeks with a brand new Snowball Edge. By the way, Snowball Edges can be clustered as well. I wouldn’t expect Snowball Edge to be in the exam at all at this stage. It’s still brand new. But hopefully we will have an updated video in the next couple of weeks with the Snowball Edge on my desk. If you have any questions, please let me know. If not, feel free to move on to the next lecture. Thank you.
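The transfer times quoted in this lecture are easy to sanity-check with a few lines of Python. This is a back-of-the-envelope sketch: it assumes decimal units (an exabyte as 10^18 bytes), a link that is fully utilised the whole time, and helper names that are invented here. Under those assumptions, the 25-year figure for an exabyte corresponds to a link of roughly 10 gigabits per second.

```python
import math

SECONDS_PER_DAY = 86_400
SNOWMOBILE_CAPACITY_PB = 100  # per-truck capacity quoted in the lecture


def transfer_days(size_bytes: float, link_bits_per_second: float) -> float:
    """Days needed to push size_bytes over a fully utilised link."""
    return size_bytes * 8 / link_bits_per_second / SECONDS_PER_DAY


def snowmobiles_needed(size_petabytes: float) -> int:
    """Number of 100 PB Snowmobiles needed to carry the data set."""
    return math.ceil(size_petabytes / SNOWMOBILE_CAPACITY_PB)


EXABYTE = 10**18  # bytes, decimal units (an assumption)

years_at_10_gbps = transfer_days(EXABYTE, 10e9) / 365   # roughly 25 years
days_for_500_gb = transfer_days(500e9, 1e6)             # 500 GB over 1 Mbps
trucks_for_an_exabyte = snowmobiles_needed(1000)        # 1 EB = 1,000 PB
```

The same arithmetic shows why the older disk-shipping service made sense: 500 gigabytes over a 1 Mbps connection works out to about 46 days of continuous uploading.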
14. S3 – Security & Encryption
So what exactly is S3 Transfer Acceleration? It uses the CloudFront edge network to speed up your uploads to S3. Instead of uploading directly to your S3 bucket, you can use a distinct URL to upload to an edge location, something that’s closer to you, which will then transfer that file to S3 across the AWS backbone network. So in order to do this, you’re going to get a distinct URL to upload to. It’s going to be something like acloudguru.s3-accelerate.amazonaws.com, so your bucket name followed by s3-accelerate.amazonaws.com. So what’s actually going on? Well, to help you visualise it, we’ve got this little diagram. We’ve got our S3 bucket, which is hosted out of the Ireland region, so eu-west-1, and then we’ve got our various users all over the world. Normally, if they were to upload a file to this bucket, it would go over the Internet directly to that bucket’s region. But with S3 Transfer Acceleration, what they can actually do is utilise their local edge locations. When they upload to that bucket using the new URL that we saw earlier, they will actually send the file to an edge location, and that edge location will then send the file up to the bucket. And Amazon has basically optimised this over its backbone network.
They’ve optimised different protocols and gone through a whole optimization process to make it a lot faster. So let’s take a look at how we actually enable this. Okay, so here I am in the AWS console. I’m going to click on Services and go over to S3. I might create a new bucket for this. I’m just going to call it “Cloud Guru Transfer Acceleration” or something like that. If I could actually spell acceleration; I might just do “accel” or something like that. Go ahead and hit Next. That should be available, hopefully. Yes, it is. Next, I’m just going to leave everything as is and create my bucket. So I’ve made my bucket, and I’ll go over to Properties. We can see that Transfer Acceleration is down here. When we click on Transfer Acceleration, all we have to do is click Enabled and then Save. That will enable transfer acceleration for your S3 bucket. Take note of the new endpoint. It’s using a new amazonaws.com subdomain, which is s3-accelerate. So now, when you’re accelerating, you’re essentially using CloudFront and the edge locations nearest to you to accelerate your uploads to S3. So it goes over the CloudFront distribution network, and then it routes over Amazon’s internal backbone network back to your S3 bucket. Now, if you want to compare your data transfer speeds by region, go ahead and click on this link, which loads up this page. This page can take five to ten minutes to actually populate, but essentially it’s testing the different regions around the world and comparing a normal direct upload speed to an accelerated upload speed, so you can compare them. And I’ve got another one open, which I’ll just bring up now.
So here’s one I created earlier. I’ve always wanted to say that. If we go down, we can see the effect of turning on transfer acceleration. Going to San Francisco, it’s 4% faster. Oregon is 3% faster. Dublin is 3% faster. Frankfurt, interestingly, is actually slower. That’s because I’m based in London right now, and the further away the region is, the better it tends to be. So Tokyo is 36% faster and Singapore is 34% faster, and we’ve also got 42%, Sydney at 29%, and São Paulo at 44%. We do have some that are a little bit slower, so we’ve got Ohio, Mumbai, et cetera, and it is actually still finishing. It’s just finished Canada, and now it’s finishing London. So that’s S3 Transfer Acceleration in a nutshell. To be honest, I don’t actually think it is in either the Solutions Architect Associate or Developer Associate exams just yet. But if you do see it, let us know. In any case, understanding what it is at a high level is beneficial. So that’s it for this lecture, guys. If you have any questions, please let me know. If not, feel free to move on to the next lecture. Thank you.
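The accelerated endpoint described in this lecture follows a fixed pattern, which can be sketched as a couple of small Python helpers. The function names and the example bucket are hypothetical; the only things taken as given are the `s3-accelerate.amazonaws.com` suffix and the rule that a bucket used with Transfer Acceleration cannot have periods in its name.

```python
def accelerate_endpoint(bucket: str) -> str:
    """Build the Transfer Acceleration URL for a bucket."""
    if "." in bucket:
        # Transfer Acceleration does not support bucket names containing periods.
        raise ValueError("bucket names with periods cannot use Transfer Acceleration")
    return f"https://{bucket}.s3-accelerate.amazonaws.com"


def regional_endpoint(bucket: str, region: str) -> str:
    """Build the ordinary virtual-hosted-style URL for comparison."""
    return f"https://{bucket}.s3.{region}.amazonaws.com"


# Usage with a made-up bucket hosted in the Ireland region.
fast_url = accelerate_endpoint("acloudguru")
normal_url = regional_endpoint("acloudguru", "eu-west-1")
```

Notice that the accelerated URL has no region in it: the upload goes to whichever edge location is nearest to the user, and AWS routes it to the bucket’s region from there.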
15. Storage Gateway
We’ve got Snowball. Standard Snowball is just pure storage, and it can come in various different sizes. 50 terabytes is the starting size, and that’s still available in the US. Around the rest of the world right now, it’s 80 terabytes. You would never be asked how big a Snowball appliance is or how much data it can actually store. It’s more important to understand what it is at a high level. Snowball Edge has storage as well as compute capability, so you can run Lambda functions on it. It’s essentially a miniature AWS data centre in a box, and this is what they look like. They’re quite heavy and quite large; you don’t want to throw them at your friends, or if you really don’t like your friends, maybe you should. This is a Snowmobile.
This is 100 petabytes’ worth of storage. It’s a 45-foot container pulled by a semi-trailer truck, and it can actually come with armoured protection as well. It is currently only available in the United States, and only in parts of the United States: people in Hawaii and Alaska are not yet able to get a Snowmobile. The price is available upon application. I’ve always wondered what the price is. So at a high level, just understand what a Snowball is and what Import/Export is, because they could still reference it, especially if a question is out of date. A Snowball can import into S3 and export from S3. Moving on to S3 Transfer Acceleration. You can speed up transfers to S3 using S3 Transfer Acceleration. It costs extra and has the greatest impact on people who are in a faraway location. What those people are essentially doing is uploading files to an edge location, and those files are then written to your S3 bucket. Moving on to S3 static websites. You can use S3 to host static websites. It is serverless, so you don’t have to worry about EC2 instances or virtual machines. It’s very cheap, and it scales automatically. The only thing you need to keep in mind, though, is that it is static only.
Whereas if it’s just a normal Amazon S3 bucket, it’s going to be S3 and then the region, then amazonaws.com, and then a forward slash and the bucket name. So make sure you can identify which is an S3 website and which is just a normal bucket. I’ll give you a big hint: look for the term “s3-website” in the URL. My last few exam tips. When you successfully write to S3, you will receive an HTTP 200 success code. That’s for a successful write. You can load files to S3 much faster by enabling multipart upload. It breaks big files up into multiple pieces, uploads those pieces to S3, and then puts it all back together again. And then my final tip is to make sure you read the S3 FAQ before taking the exam, because S3 is going to come up a lot in the exam. So that’s it for this section of the course, guys. Go take a break, go do something else, and when you’re ready, come back for the next section of the course. Thanks a lot.
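Two of the exam tips in this recap can be sketched in a few lines of Python: spotting a website endpoint from its URL, and the split-and-reassemble idea behind multipart upload. Both helpers are hypothetical illustrations and nothing is actually sent to S3; the example bucket name is made up, and the 5 MiB default reflects the S3 minimum size for every part except the last.

```python
from urllib.parse import urlparse

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last


def is_website_endpoint(url: str) -> bool:
    """True if the host name contains the 's3-website' marker."""
    host = urlparse(url).hostname or ""
    return "s3-website" in host


def split_into_parts(data: bytes, part_size: int = MIN_PART_SIZE) -> list:
    """Cut data into consecutive parts, as a multipart upload would."""
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


# Usage: a tiny payload and part size so the result is easy to inspect.
payload = b"x" * 10
parts = split_into_parts(payload, part_size=4)
reassembled = b"".join(parts)

website = "http://example-bucket.s3-website-eu-west-1.amazonaws.com"
bucket = "https://s3-eu-west-1.amazonaws.com/example-bucket"
```

In a real multipart upload the parts can be sent in parallel and retried individually, which is where the speed-up comes from; S3 then stitches them back together in order, just as the `join` does here.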