66. Decision Factor – Memcached vs Redis
Hey everyone, and welcome back to the Knowledge Pool video series. In the past few lectures, we had a great introduction as well as the implementation sections related to Memcached and Redis. In today’s lecture, we’ll be understanding the difference between Memcached and Redis, and in which situations we should be using each of them. Understanding this is critical for the exam because you may be presented with scenarios in which you must choose whether to use Memcached or Redis.
So let’s get started. Now, you might be a bit amazed by the things that we have on this slide, but they are very simple, and most of them we have already discussed. This is a comparison chart that I have prepared between Memcached and Redis. The first point is simple cache offloading to reduce database load and latency. Since both of them are in-memory caching systems, both will help offload load from the database and lower latency, so both are equivalent in this aspect. When you talk about advanced data types such as hashes, lists, and sorted sets, they are not available in Memcached; they are available in Redis. When you talk about multithreading, it is supported in Memcached but not in Redis.
Let me elaborate on the multithreading point, because modern processors are quite fast nowadays, and an application can be either single-threaded or multi-threaded. Even if you have a fast CPU, a single-threaded application may not be able to fully utilise it. As a result, the majority of modern applications are built on a multi-threaded architecture. So let me show you. I have my system up and running, and as you can see, I have an Intel Core i7 with four cores. So this is a quad-core processor, and the number of threads available is eight. A good multi-threaded application will be able to use all the threads to reduce the time it takes to execute and complete an operation. So this is what multithreading is all about. Let’s go back. Among the two, Memcached is multi-threaded, while Redis is not.
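To make the multithreading point concrete, here is a minimal Python sketch (my own illustration, not from the lecture; the function names are invented) that fans a job out across one worker per available CPU thread, the way a multi-threaded cache server spreads requests across cores:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def checksum(chunk):
    # Simulated unit of work: sum one slice of the data.
    return sum(chunk)

def parallel_checksum(data, workers=None):
    # Fan the work out across one thread per available CPU thread.
    workers = workers or os.cpu_count() or 1
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(checksum, chunks))

print(parallel_checksum(list(range(1000))))  # same answer as sum(range(1000))
```

Note that in CPython, threads give the biggest wins on I/O-bound work such as serving cache requests; the sketch only shows the fan-out pattern.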
Now, the next point is the ability to sort data for use cases like leaderboards and rankings. This is something that we already discussed when we were talking about sorted sets. Memcached does not support it; Redis does. Remember these two words, leaderboards and ranking; in exams, understanding these two words is very important. Next are publisher/subscriber capabilities, which are again present in Redis but not supported in Memcached. Another critical capability is high availability and failover, which is essentially Multi-AZ. Memcached cannot do Multi-AZ; Redis can. Data persistence is not available in Memcached; it is available in Redis. Again, the maximum item size is 1 MB in Memcached, while you have 512 MB in Redis. Backup and restore capabilities are not available in Memcached, but are available in Redis.
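To see why sorted sets matter for leaderboards, here is a rough Python sketch (my own illustration, not Redis client code) mimicking what Redis’s ZADD and ZREVRANGE commands do for a ranking use case:

```python
def zadd(board, member, score):
    # Upsert a member's score, like Redis ZADD.
    board[member] = score

def zrevrange_withscores(board, start, stop):
    # Highest score first, like Redis ZREVRANGE ... WITHSCORES.
    ranked = sorted(board.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[start:stop + 1]

scores = {}
zadd(scores, "alice", 120)
zadd(scores, "bob", 95)
zadd(scores, "carol", 150)
print(zrevrange_withscores(scores, 0, 1))  # top-2: carol, then alice
```

In Redis itself, the sorted set keeps this ordering incrementally on every write, which is exactly what makes leaderboard queries cheap.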
So we discussed putting the backups on S3, and we can also import those backups into Redis in a different region. All those backup and restore capabilities are supported for Redis. One last point is the ability to scale horizontally. We can do this with Memcached: we can scale horizontally, and specific data can be put into a specific node. So, when it comes to scaling horizontally, Memcached can do it, but Redis cannot. If you have a lot of data and you want to scale out, you can scale out with Memcached.
So, in exams, if you come across a use case that requires horizontal scaling, your answer should be Memcached rather than Redis. Now for some important exam tips: when should you use Memcached, and when should you choose Redis? We should choose Memcached when we want the simplest model possible; when we want the ability to scale out by adding or removing nodes, which is possible because Memcached supports horizontal scaling; and third, when we want to shard the data across multiple nodes. We should choose Redis when we want data persistence; when we need to sort or rank in-memory data sets, which is similar to what we discussed in terms of leaderboards and sorted sets; when we want to replicate data from the primary to one or more read replicas for high availability; when we need automated failover functionality, which is basically Multi-AZ; and when we want backup and restore capabilities, which can be part of a single-region or a multi-region setup as well. These are the few points that you need to remember. Now, I’m very sure all of these are quite easy if you have done the practicals that we have gone through.
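The exam tips above can be condensed into a tiny lookup, sketched below in Python (the feature names are my own shorthand for the chart, not AWS API values):

```python
def choose_cache_engine(needs):
    # Features that only the Redis engine provides, per the comparison chart.
    redis_only = {"sorted_sets", "leaderboards", "pub_sub", "multi_az",
                  "persistence", "backup_restore", "read_replicas"}
    if needs & redis_only:
        return "redis"
    # Simplest model, scale-out by adding/removing nodes, sharding: Memcached.
    return "memcached"

print(choose_cache_engine({"leaderboards"}))                 # redis
print(choose_cache_engine({"horizontal_scaling", "sharding"}))  # memcached
```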
67. Understanding Business Intelligence & Data Warehouse
Hey everyone, and welcome to the Knowledge Pool video series. Now, in today’s lecture, we will be speaking primarily about the data warehouse, but in order to understand the need for the data warehouse, we also have to understand business intelligence.
So let’s go ahead and understand both of them with a simple use case. When it comes to business intelligence, it consists primarily of data, technology, analytics, and human intelligence to provide insights that lead to a more successful business outcome. Now, from this definition alone, it is not very clear what exactly I mean. So let’s understand this with a use case where we have to understand the users who are visiting a website; this is basically for business purposes. Let’s take a look at the points that will be relevant in this use case. Since I, as an instructor, record courses and post them online, what are the things that I will need in order to better understand my audience? First and foremost, I’d like to see the number of page views on my courses, primarily to determine which course is rated higher than the others in terms of page views.
The second is to determine the sessions based on the country of origin. This is also very important, because I need to know from which country the people are visiting: the US, Europe, or India. And this actually helps in marketing as well. Whenever you do Facebook ad marketing, you can target based on countries. So if I know from this second point that most of the users who are coming to my course are from, let’s assume, the US, then whenever I do my Facebook marketing, I would target it to users based in the US. That is the second point. The third point is to determine the session based on the device from which the user is accessing the site: whether they are visiting from laptops, mobile devices, or tablets. This is also a very important factor, because if most of my users are visiting from mobile devices, then the content that I put in the course should be compatible with mobile devices as well.
It’s not like if I record a video, it should only play well on laptops; it should play well on all kinds of devices: laptop, tablet, and mobile. So these specific points also help a lot. The fourth point is where the traffic is coming from. This does not mean the country; this basically means the referring website. Is the traffic coming from Google, are people directly opening the website, is it coming from Bing, from email marketing, or from some articles on Quora or some other forum? This is also very important because, let’s assume, most of the traffic is coming from email. That means I have to continue, and even increase, the email presence among all the users as far as marketing is concerned. So this is a very important point. The next important point is the session duration, which tells, when someone comes, how long he is watching the video or how long he is on the website. The last important point that also helps is the age and gender of the users who are visiting. So these are some examples of important pointers when a business wants to understand the users who are visiting its website. Now, this has all been theory; let me actually show you what it would look like.
So I’m in Google Analytics, and I’ve opened the analytics for the AWS Security Specialty video course that we launched six to seven months ago. You can see the number of users, which has increased by 30%. It shows the sessions, it shows the bounce rate, and it also shows the session duration, meaning how much time the user spent on that page, which is around six minutes on average. Now, within the traffic channels, it actually shows you a good amount of information: some sessions are direct, some are coming from email marketing, and some are coming from other sources, which can be some kind of forum, et cetera. For example, there are certain forums, such as Quora, from which users visit my courses. It also shows you the countries from which the users are visiting: most are from the US, followed by India. It tells about desktop, mobile, or tablet usage, and it actually gives information on a weekly basis as well. So this is a good amount of information, and this is essentially what a business would require. Basically, Google Analytics is a great platform that gives you a lot of information related to the users who visit the website.
It also shows you some interesting information related to the age, probable gender, and probable interests of the users who are visiting your website. So Google Analytics is a great platform to look into the nifty little details related to the users who are visiting. This is what business intelligence is all about. Once I know the answers to all of this, I can optimise my marketing in much more effective ways, so that I can advertise my courses to a large audience that might like them and maybe purchase them as well. This is how it would typically appear in any organization. So this entire part comprises business intelligence. In short, business intelligence is the act of transforming raw data into useful information for the purpose of business analytics. With Google Analytics, it’s not like I get the graphs directly: Google Analytics captures some kind of raw data, which it then transforms into these nice little graphs.
And based on that, a lot of analytics are performed, like how many users are visiting and how many sessions there are. So this is the analysis that is performed on the data that Google has received. That is the essence of business intelligence. Now, the basic operation of a business intelligence system is based on a data warehouse. BI systems, which are based on data warehouses, extract information from various organisational sources. There can be various sources of information: for example, I have my courses on five websites. And in order to do proper business analytics, I need to receive the data from all five websites so that I can know how many users are visiting in total. That’s the first step. The second step is that the data is then transformed, cleaned, and loaded into a data warehouse. Once you receive the information, it is possible that the format of the information from one website differs from that of another, such as Udemy or Stack Skills.
These are big websites, and the formats that Udemy and Stack Skills offer might be completely different. If I want to do analytics, I need to have data in a similar shape. So whenever I receive data, I transform and clean it according to my requirements, and then I load it into a data warehouse. That data, which is stored in the data warehouse, is then used to perform analytics in a nice graphical manner. The key point is that one of the major advantages of a data warehouse is that data from multiple sources is integrated into a single platform, making it easier for analytics tools to visualise the data. So here I have various systems, like operating system log files, ERP, CRM, and flat files. All of these systems will send their data to an ETL pipeline. ETL is where the extraction, transformation, and loading take place. As an example, Udemy and Stack Skills may export different types of data.
So I have to first extract the data from those websites and transform it according to my needs, because I might not need all the data; I might just need certain columns or certain rows. So I’ll transform it, and then I’ll load it into a central database known as a data warehouse. Then, from this data warehouse, I use certain great tools, like Tableau, which can query the data warehouse and show you nice graphs like the ones you see over here. All of this data that you see over here is typically stored in a data warehouse. So this is one very important point to remember. Now, some of you might ask, “What’s the difference between a database and a data warehouse?” One of the major high-level distinctions that I can share is that a data warehouse basically contains data from multiple systems.
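As a toy illustration of the extract-transform-load flow described above, here is a Python sketch (the sample rows and field names are invented for illustration) that pulls differently shaped exports from two sources into one uniform "warehouse" table:

```python
# Two "websites" exporting the same facts in different shapes.
udemy_rows = [{"course": "aws-security", "country": "US", "views": 120}]
other_rows = [("aws-security", "IN", 80)]  # (course, country, views)

def extract_transform():
    # Normalise both sources into one (course, country, views) shape.
    unified = []
    for r in udemy_rows:                       # source A: dicts
        unified.append((r["course"], r["country"], r["views"]))
    for course, country, views in other_rows:  # source B: tuples
        unified.append((course, country, views))
    return unified

warehouse = []                         # stand-in for the warehouse table
warehouse.extend(extract_transform())  # the "load" step

# Analytics on the integrated data: total views across all sources.
total = sum(views for _, _, views in warehouse)
print(total)  # 200
```

A real pipeline would do the same three steps against APIs or log files, but the point is identical: only after the load step can one query answer questions across every source.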
It can be an operating system, a database, or flat files as well. Now, you can definitely perform analytics from a database, which you can always do. So one of the points that we should remember is the difference between a relational database and a data warehouse. A relational database is related to OLTP, transactional processing, whereas a data warehouse is related to OLAP, analytical processing. Notice the differing letter: in OLTP, T stands for transactional, while in OLAP, A stands for analytical. A relational database basically contains the latest data from the website, while a data warehouse contains the historical data. Most likely, the data warehouse does not contain the most recent information. A relational database is useful for running the business: if the database goes down, the entire business goes down in most cases, while the data warehouse is more about analysing the business.
The third critical point is that a relational database is typically used for both read and write operations, whereas a data warehouse is more about read operations, because users mostly want to read the data that is in the data warehouse. Last but not least, the number of records typically accessed in a relational database is limited, such as 10 or even 20, while a data warehouse query can touch millions of records. In organisations where I used to work, we used to have a data warehouse, and a query that the analytics team put in took like 10 or 12 hours to run, because there are millions of rows involved in querying the data. Now, in the diagram we were discussing, there were two important parts.
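The OLTP-versus-OLAP contrast can be sketched with Python's built-in sqlite3 module (a toy schema I made up, not a real warehouse): the transactional queries touch one row by key, with reads and writes, while the analytical query is read-only and scans every row:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
db.executemany("INSERT INTO orders (amount) VALUES (?)",
               [(10.0,), (25.0,), (40.0,)])

# OLTP-style: touch a handful of records by key, reads and writes.
db.execute("UPDATE orders SET amount = 15.0 WHERE id = 1")
row = db.execute("SELECT amount FROM orders WHERE id = 1").fetchone()

# OLAP-style: read-only aggregate that scans every row.
total = db.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(row[0], total)
```

On three rows both queries are instant; on the millions of rows a warehouse holds, the full-scan aggregate is exactly the kind of query that runs for hours.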
One is the ETL, and the second is the data warehouse. There are great software products that are generally used for both of them. So let me just show you. When it comes to data warehouses, I have a nice little diagram of the data warehouse software that is generally used. If you look at the Gartner quadrant, the leaders are the most widely used data warehouse software products: you have SAP at the top, and you also have Amazon Redshift. I have seen most organizations, specifically startups, using Redshift because it is quite good, is provided by Amazon, and is also quite cheap to work with. SAP is also quite good, but I believe it mainly aims for the enterprise, while Redshift suits small to medium startups, and even larger organisations are using it. Now, this is what the data warehouse is all about. There are also tools that perform the ETL-related functionality; these are the top three tools, but there are a lot of paid tools that do the same thing in a much better way. So this is a very high-level overview of the data warehouse.
68. Deploying RedShift Cluster
Hey everyone, and welcome back. Now, in the earlier lecture, we were discussing business intelligence as well as the data warehouse. We also looked into the Gartner quadrant, where SAP and Amazon Redshift were two of the top leaders in the market as far as the data warehouse is concerned.
Now, one of the great things about Redshift is that it is available in AWS, and any normal user can go ahead and create a Redshift cluster without much hassle. Currently I am in my Redshift dashboard in AWS, and we’ll look into how we can create a Redshift cluster as well as some of the functionalities related to Redshift. So let’s do one thing: let’s go ahead and launch a cluster. You must now provide a cluster identifier, so I’ll say kplabs-dwh. You have to give the database a name, so I’ll say kplabsdb. For the master username, I’ll simply say admin, and I’ll put in a password. Perfect. Remember the port; it is 5439. Now comes an important point: whenever you launch a cluster, you have to select a node type. The minimum node type is dc2.large; you do not have a t2.micro or an m1.large. Those instance types are not there, because a data warehouse is supposed to be used quite heavily. The minimum data warehouse node starts with 15 GB of RAM and two vCPUs.
So this is the minimum that you can go ahead with. One more important point to remember is that you cannot really attach an EBS volume. For example, if I want a dc2.large node type with, say, 1 TB of storage, I can’t have that: the storage is directly tied to the node type that you select. For dc2.large, the storage is 160 GB. If I want more storage, then I have to go with a higher node type, let’s say dc2.8xlarge, and now you see that the memory is 244 GB and the storage is 2.56 terabytes. So depending on the storage that you want, you also have to work out the node type. One more important aspect is the cluster type: there can be a single-node as well as a multi-node cluster. This is one important point to remember. We’ll just select a single node for the time being, to avoid the cost, and I’ll click on Continue. On the following page, you can integrate Redshift with an encryption solution; it can be either KMS or HSM. So if you want to encrypt the database, you can do so with KMS or HSM.
The next point is to select a VPC where you want to launch this data warehouse, and whether you want it to be publicly accessible or not. I’ll just select no for the time being, and for the availability zone preference, I’ll just select zone A. And, as you can see over here, the data warehouse does not support Multi-AZ. The data warehouse node can only be launched within a single availability zone, and this is the reason why you have to select an availability zone. So I’ll click on Continue, and next you see the on-demand hourly rate for the cluster, charged per node if you have multiple nodes. This might look expensive, but if you look at the overall data warehouse business, it is actually quite cheap. Anyway, if you are Free Tier eligible, you will receive 750 hours of free usage per month for the trial.
So this is quite important, and it applies to dc2.large nodes. If you’re launching these nodes, you’ll be getting Free Tier usage. Perfect. So I’ll click on Launch Cluster, and it will go ahead and launch the cluster. Now, it takes a few minutes for the cluster to launch, so let’s just wait for the cluster state to get healthy. It took around five minutes for the cluster status to become available and the DB health to become healthy. Now, when you go and open the cluster details, you see that Redshift has a specific endpoint. This is very similar to the RDS database: it has an endpoint and a port that we can connect to. Along with this, there are a number of other features that we will investigate in due course. So let’s do one thing: let’s switch to the slides. This is one of those times where we are actually doing the practical first and then going through the theoretical part. So let’s switch to the slides and discuss Redshift again. In short, Redshift is basically a fully managed, petabyte-scale data warehouse used for storing large amounts of data for business intelligence applications. We understood what business intelligence is all about from the earlier lecture. Now, Redshift can be a single-node or a multi-node cluster, depending on the configuration.
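If you were scripting the same launch instead of clicking through the console, the parameters would look roughly like the sketch below. This is a hedged boto3-flavoured example: the identifiers mirror the ones typed in the demo, the password is a placeholder, and the actual create_cluster call is left commented out so nothing gets provisioned:

```python
def redshift_cluster_params():
    # Mirrors the console walkthrough; values other than the AWS parameter
    # names are demo choices, not requirements.
    return {
        "ClusterIdentifier": "kplabs-dwh",
        "DBName": "kplabsdb",
        "MasterUsername": "admin",
        "MasterUserPassword": "<your-password>",   # placeholder
        "NodeType": "dc2.large",       # smallest allowed node type
        "ClusterType": "single-node",  # or "multi-node" plus NumberOfNodes
        "Port": 5439,
        "PubliclyAccessible": False,
        "AvailabilityZone": "ap-southeast-1a",
    }

# import boto3
# boto3.client("redshift").create_cluster(**redshift_cluster_params())
print(sorted(redshift_cluster_params()))
```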
The storage of Redshift is determined by the node type; we have already seen this. And lastly, an important point is that Redshift clusters currently support only a single-availability-zone-based architecture. You cannot have a Multi-AZ-based Redshift cluster. So there are certain important points to remember when it comes to exams, and most of these pointers are related to backups and snapshots. Since we’ve established that the nodes of a Redshift cluster exist in a single availability zone (for example, this node is part of ap-southeast-1a), the question is how to deal with backups. What happens if the cluster goes unhealthy? There are various ways in which the backups are handled. First, Redshift automatically takes a backup of your nodes.
The backup interval is every 8 hours or after every 5 GB of new data, and the nodes are automatically backed up to S3. So if a node goes down, Redshift will automatically replace the failed node with the latest data that was backed up. That is an important thing to remember. Along with that, we can also take a snapshot. This snapshot can either be automated or manual. As you can see, there has already been one automated snapshot; you see the type is automated, which was performed by Redshift. So a snapshot can be both manual and automated. If I click on “create snapshot,” I can go ahead and create the snapshot I need, and this snapshot that I’ll be creating will be a manual one. So if I name it something like kplabs-manual and click on Create, this time the snapshot type will be manual. This is one important thing to remember.
The next very important thing to remember as far as Redshift is concerned is related to the backup. We can configure the snapshot to be a cross-region snapshot. This is a very important part to remember: if we want to migrate, or if we want to have DR, then we can actually copy the snapshots of Redshift across regions. Currently I am launching in Singapore, but what happens if the entire Singapore region goes down? In that case, I can do cross-region snapshots in Redshift. So this is one important aspect. The next important aspect is that if I want to restore my data warehouse cluster, then I have to restore it based on a snapshot. Let’s assume that I am doing a cross-region snapshot and that this snapshot is copied to another region, let’s assume Oregon. Now, if I want to launch a new cluster based on that snapshot, I just click on the snapshot, go to Actions, and click on Restore from Snapshot, and it will automatically launch a new Redshift data warehouse from this specific snapshot.
So we’ve covered a lot of ground. The last part that I’ll show is that this Redshift cluster cannot be stopped. Unlike the EC2 instances that we talked about, which we can stop, we cannot stop a Redshift cluster. When I click on the cluster, the only options are modify, resize, delete, and reboot. So you cannot shut down the Redshift cluster. Finally, there are the reserved nodes. Redshift can be on-demand as well as reserved. If you want to save costs, and you are sure that you’ll be running Redshift throughout the year, go ahead and purchase the reserved nodes, and that will actually save you quite a lot of money. So these are some of the important points. We’ll just revise them from the PPT as well. Data in Redshift can be restored with the help of Redshift snapshots; both automatic and manual snapshot options are available, and we’ve already looked into this. The next important pointer is that Redshift nodes are continuously backed up to S3; in the event of failure, the data is restored and new nodes are automatically launched. When I say “continuous,” it does not mean every minute: they are backed up every 8 hours or after every 5 GB of new data. The next important point is that Redshift restores the data from a snapshot by launching a new cluster and importing the data from the snapshot.
What I mean by this is: say I have a snapshot from two days ago, and I want that snapshot to be applied to my existing Redshift cluster. That is not possible. If I want to restore the data from the snapshot, I have to create a new Redshift cluster. The final and most important point is that a Redshift snapshot can be manually or automatically copied from one region to another. And do not forget that Redshift supports on-demand as well as reserved nodes.
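The two snapshot facts above can be sketched in boto3-flavoured Python (the helper names are mine; the API calls are shown only in comments so the snippet stays runnable offline):

```python
def cross_region_copy_params(cluster_id, dest_region):
    # For redshift_client.enable_snapshot_copy(...) on the source cluster:
    # automated snapshots then get copied to the destination region.
    return {"ClusterIdentifier": cluster_id,
            "DestinationRegion": dest_region,
            "RetentionPeriod": 7}  # days to keep the copied snapshots

def restore_params(new_cluster_id, snapshot_id):
    # For redshift_client.restore_from_cluster_snapshot(...): a restore
    # always launches a NEW cluster from the snapshot; you cannot apply
    # a snapshot onto an existing cluster.
    return {"ClusterIdentifier": new_cluster_id,
            "SnapshotIdentifier": snapshot_id}

print(restore_params("kplabs-dwh-restored", "kplabs-manual"))
```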
69. Overview of Elastic File System (EFS)
Hey everyone, and welcome back. In today’s video, we’ll be discussing the Elastic File System. Now, before we go ahead and understand AWS EFS, let’s look into some of the challenges. A lot of times, there are certain applications that require access to shared file storage: there are two servers, and they both need access to a shared storage location where the data is stored. In order to achieve that type of solution, organisations generally make use of a NAS (network-attached storage). You can consider that to be one single hard disc that is connected to multiple servers at a time, and all of the servers are able to read from or write to that common hard disc drive. That is a very high-level description of a NAS. Now, there are certain challenges, specifically for such file systems. The first one is high availability.
The second consideration is security, and the third one is scalability. Let’s say that you have a NAS of 500 GB. Suddenly, you see that the amount of data is increasing, and you want to increase it to, say, one TB or even two TB. That is a challenge; it takes time to reconfigure things, so that is scalability. The second is security: this covers who will be able to access your NAS device, and you do have certain configuration options available there. And the third one is high availability. Now, all of these challenges can be solved with the help of the Elastic File System, which is similar to network-attached storage. In a nutshell, EFS provides a simple, scalable, elastic file system for Linux workloads, for use with AWS cloud services and on-premises resources. The second point really makes it quite interesting: EFS is built to scale on demand, to petabytes, without disrupting applications, growing and shrinking automatically as you add and remove files.
So, for example, let’s say after one week you will be putting in 1 TB of data, but in the first five days, you might only have 200 or 300 MB of data. You don’t really need to provision an EFS with 1 TB upfront; you can put 200 or 300 MB of data initially, and then add more as needed. EFS grows automatically, and it shrinks automatically as you add and remove files: it expands when new files are added and contracts when files are removed. And again, it is designed to provide massively parallel shared access to thousands of EC2 instances, which is a great capability. Before we go ahead and discuss the architecture, let me give you a quick demo of EFS and how exactly it looks. I’m in my EFS console, and you can see that I have an EFS running here. This is what EFS looks like. Now, it has mount targets; we’ll be understanding these in greater detail shortly. But what I wanted to show you for this demo is that there are two EC2 instances available. Since EFS can be mounted across multiple EC2 instances, this is something that I wanted to show you. So there are two EC2 instances, and I have mounted this EFS volume to both of them. Let’s look into what exactly it looks like. Currently I am logged into one of the EC2 instances, and if I quickly do a df -h, you will see the EFS file system. The used space is showing as 50 MB, and it is mounted under the efs-mount-point directory.
So, if you quickly navigate to the efs-mount-point directory and do an ls -lh, you’ll notice that there is only one file. Let’s do one thing: let’s quickly do a touch test.txt over here, and if you do a quick ls -lh, the test.txt is present over here. So this is one EC2 instance. If you look at its IP, it is 172.31.45.237, and if you compare it here in the console, it would be the second EC2 instance. So now let’s go ahead and log in to the first EC2 instance, where the EFS is also mounted. I’ll quickly log in to the first EC2 instance here and do a sudo su. And if I do a df -h here, you’ll see that the EFS is mounted here as well. Now, if you go to the efs-mount-point directory and do an ls -lh, you will see the test.txt that we had created earlier; it is present. So I hope, at a high-level overview, you have begun to understand what the Elastic File System is all about.
Now, again, I don’t really have to worry much about how much data I can put within this file system. Let’s say you have an EBS volume of 50 GB: you will not be able to add more than 50 GB of data to that EBS volume, and if you want to add more, you will have to resize the EBS volume. With EFS, you don’t really have to worry about that. If you want to add 1 TB, you can do that; if you want to add ten TB tomorrow, you will be able to do so. You do not really have to worry about the scalability aspect. And if you remove that ten TB again the day after, the EFS volume will shrink back automatically. So this is the high-level overview of EFS. I hope you understand what EFS is from this overview. In the next video, we’ll understand the architecture of EFS at a high level, we’ll create our first EFS file system, and we’ll look into how we can mount it inside an EC2 instance. With this, we’ll conclude this video. I hope this video has been informative for you, and I look forward to seeing you in the next video.
70. AWS EFS – Creating and Mounting EFS
Hey everyone, and welcome back. In the earlier video, we had an overview of what AWS EFS is all about. In today’s video, we’ll take a high-level look at the EFS architecture and perform an EFS-based practical. Now, whenever you are configuring EFS, you will have the concept of a mount target. A mount target is essentially a resource that allows your EC2 instance to access your EFS file system. Basically, if your EC2 instance wants to access the EFS file system from the VPC, it needs to go through the mount target. So while configuring EFS, we have to configure a mount target in each availability zone. The mount target is tied to an availability zone, and each of the mount targets has its own IP address, but in the back end, you have a common DNS name that is associated with your EFS file system. So this is one important thing that you need to understand.
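Since every mount target resolves through the file system's common DNS name, the actual mount on an instance is a single NFSv4.1 command. Here is a small Python helper that builds it (my own sketch; the file-system ID below is a placeholder, and the mount options are the ones commonly recommended in the AWS EFS docs):

```python
def efs_mount_command(file_system_id, region, mount_dir="/mnt/efs"):
    # EFS exposes one DNS name of the form fs-xxxx.efs.<region>.amazonaws.com;
    # inside a VPC it resolves to the mount target in the instance's own AZ.
    dns = f"{file_system_id}.efs.{region}.amazonaws.com"
    opts = "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2"
    return f"sudo mount -t nfs4 -o {opts} {dns}:/ {mount_dir}"

print(efs_mount_command("fs-12345678", "ap-southeast-1"))
```

Running the same command on each EC2 instance is what produced the shared test.txt behaviour shown in the earlier demo.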
So with this, let’s go ahead and create our EFS file system. So I’m in my AWS management console, and from here we’ll go to Services, and you can type EFS. So this is how the EFS console looks. Let’s go ahead and create our first file system. Now we need to configure the VPC, and we’ll use the default VPC here for testing. And now here it says that you create mount targets. Now, we already discussed that instances connect to the file system via the mount target. So you will have to create the mount targets. And it is recommended that you create a mount target in each availability zone where the EC2 instances that will be connecting to the file system are running. All right? So for our testing purposes, I’ll just select the mount targets for all the availability zones. One thing to keep in mind is that a mount target is also associated with a security group.
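The same mount-target step can also be done from the AWS CLI. This is a hedged sketch only: the file system, subnet, and security group IDs are placeholders, and the command is built as a string here rather than executed, since it needs real AWS credentials and resources.

```shell
# Hypothetical sketch: one mount target per availability zone, created
# with the AWS CLI instead of the console. All IDs are placeholders.
MT_CMD='aws efs create-mount-target \
  --file-system-id fs-0123abcd \
  --subnet-id subnet-0aaa1111 \
  --security-groups sg-0bbb2222'
echo "$MT_CMD"
```

You would run one such command per availability zone, each time passing a subnet from that zone; the security group attached here is what controls which instances can reach the mount target.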
So this is one important thing that you need to remember. Once you have done that, you can go ahead and press Next. The following options are the throughput mode and the performance mode. Now, for the throughput mode, you have both Bursting and Provisioned. This is quite important because, let’s say you have an EFS file system: the throughput of that EFS file system depends on how much data is stored within it. So an EFS file system with 100 MB of data and an EFS file system with 100 GB of data will have different throughput. All right? So that is one important part to remember. Let’s say you have a brand new EFS file system and you suddenly start copying a large amount of data into it. In that case, you might have issues with throughput. So if you are copying huge amounts of data to, let’s say, a brand new EFS file system, and you do not want to face throughput issues, you can set the throughput mode to Provisioned, and you can specify the provisioned throughput in numbers over here. We’ll be using Bursting here for testing purposes. You can also enable encryption.
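For reference, the Provisioned throughput choice described above can also be expressed at creation time via the AWS CLI. Again a sketch only, with a placeholder creation token, and the command is echoed rather than run because it would create a billable resource:

```shell
# Hypothetical sketch: creating a file system in Provisioned throughput
# mode. With --throughput-mode provisioned you must also specify the
# throughput in MiB/s; with bursting (the demo's choice) you omit it.
CREATE_CMD='aws efs create-file-system \
  --creation-token my-efs-demo \
  --throughput-mode provisioned \
  --provisioned-throughput-in-mibps 100'
echo "$CREATE_CMD"
```

The design trade-off is cost versus predictability: Bursting is free of extra charges but scales with stored data, while Provisioned guarantees a fixed throughput regardless of how empty the file system is.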
We’ll leave it as is. Let’s click on “Next.” It will give you a review screen, and we are all good with this. Let’s go ahead and create the file system. Great, so our file system is now created, and you can see the status is now available, and now we can basically make use of NFS to mount the file system. So let’s do one thing: let’s launch two EC2 instances. I’ll launch a brand new EC2 instance, and I’ll configure the number of instances as two. We’ll go ahead and review it, and let me click on “Launch.” I’ll select the key pair because we will have to log in, and let’s click on “Launch Instances.” Great, so the two EC2 instances are now launching. Let’s call the first one “Demo One” and the second “Demo Two.” Let’s wait for a moment till both of these instances are available so that we can go ahead and mount the EFS inside the EC2 instances. Great, so our EC2 instances are running. Let’s try and log into the first EC2 instance.
So we are logged in to the EC2 instance. Now, before we can mount the EFS within the EC2 instance, we need to make sure that certain packages are installed, one of them being the NFS client. So what I have done is document all the commands that will be performed throughout today’s demo. The first thing that we need to make sure of is that the nfs-utils package is present. So let me do a yum install of nfs-utils here. And if you’re using Amazon Linux, the nfs-utils package is present by default. The second thing is that you have to create a directory where you will be mounting your EFS. So I’ll just create a new directory; you can mount it at any directory you like. All right, so this is the part you can configure according to what you need. And the third command is the actual mount command. So if you look at it, it is of type nfs4, the NFS version specified is 4.1, and you must specify the mount target DNS name. You’ll be able to get the DNS name from the EFS console. You can see the DNS name here, so you copy this DNS name and use it in the mount command.
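The three steps above can be sketched as follows. The DNS name is a hypothetical placeholder (copy the real one from your EFS console), and the actual mount lines are kept as comments because they need root privileges and network access to a live mount target; the testable part only assembles the command.

```shell
# Hypothetical DNS name -- replace with the one from your EFS console.
EFS_DNS="fs-0123abcd.efs.us-east-1.amazonaws.com"
MOUNT_POINT="/efs"

# NFS v4.1 options commonly used when mounting EFS:
NFS_OPTS="nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"

# The actual demo commands (run on the instance, as root):
#   sudo yum install -y nfs-utils        # already present on Amazon Linux
#   sudo mkdir -p "$MOUNT_POINT"
#   sudo mount -t nfs4 -o "$NFS_OPTS" "$EFS_DNS:/" "$MOUNT_POINT"
echo "mount -t nfs4 -o $NFS_OPTS $EFS_DNS:/ $MOUNT_POINT"
```

Note that the root of the file system is referenced as `$EFS_DNS:/`; you can also mount a subdirectory of the file system by changing the path after the colon.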
Along with that, you need to make sure that the mount targets have a security group that allows access from the EC2 instances. So let’s take one of the IP addresses and go to the network interfaces. Let’s put the IP address here. And if you view the inbound rules, you see that it is only allowing inbound traffic from a specific security group. So let’s go ahead and edit this. Within the inbound rules, there are two things that you can configure: you can either put a subnet range, say 172.31.0.0/16, or you can put the security group of the EC2 instances that will be accessing this specific mount target. So, for the sake of simplicity, I’ll just enter 172.31.0.0/16 and press the save button. Now, before you go ahead and access it, make sure that the mount target state is not “creating,” because otherwise you will not be able to access it. Let’s quickly refresh here, and all the mount target states are now available. Great. So coming back to the text document, I’ll copy this specific mount command and enter it in the EC2 instance. I’ll press Enter, and if you quickly verify with echo $?, you get an exit status of 0, which means the command has been executed successfully. And if you quickly do a df, you will see that this is your EFS file system. So let’s quickly go here. I’ll go to the EFS mount point, and let’s create a quick test.txt. All right. Along with that, I just wanted to show you a few more things. Let’s quickly do a dd: I’ll set the input file to /dev/zero, the output file to file.txt, a block size of 1 MB, and a count of, let’s say, 100.
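The dd test just described can be reproduced as below. In the video it is run inside the EFS mount point; this sketch writes to /tmp instead so it runs anywhere, and the exact file name is illustrative.

```shell
# Write 100 blocks of 1 MiB of zeros -> a file of exactly 100 MiB.
dd if=/dev/zero of=/tmp/file.txt bs=1M count=100 2>/dev/null

# Verify the size: 100 * 1024 * 1024 = 104857600 bytes.
wc -c < /tmp/file.txt
```

A block size of 1M with a count of 100 is what yields roughly 100 MB; with bs=1 and count=100 you would only get a 100-byte file.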
All right? So that would create a file of around 100 MB. So, if you list the files, you now have file.txt of 100 MB alongside the small test.txt. Great. So this is only one EC2 instance. Now let’s log into the second EC2 instance as well. So I’ll go to Demo Two and copy the IP address here. Great, so we are logged in here. Now, we already know that the nfs-utils package is available by default in Amazon Linux, so we don’t need to do a yum install. The only thing that we’ll do is create the directory, and once the directory is created, we’ll copy the mount command and paste it over here. Great. So once it is mounted, you can go to the EFS mount point, and now you should see that there are two files available: one is file.txt of size 100 MB, and the second is test.txt.
Now, if you quickly do a df here, you will see that the usage still shows as zero, even though we already have a 100 MB file. The same thing happens in the console. Let me quickly refresh. And here you can see the metered size: it only shows about 6 KB, while it should ideally show around 100 MB. Now, the reason it is not showing is that it takes a certain amount of time for the metered size to get updated. So it’s not like if you suddenly put a 1 GB or 10 GB file within your EFS, it will immediately update; the new size will only be reflected after the next metering interval.