56. Eventual Consistency in Distributed Systems
Hi everyone, and welcome back to the Knowledge Portal video series. Today we are going to talk about one of the important topics as far as distributed systems are concerned, which is eventual consistency. Now, in general, consistency is one of the most important aspects of data storage, particularly in the case of distributed data storage. What consistency says is that whenever you save or change a piece of data, that changed data should be visible to every other entity requesting it. So the golden rule says that after a transaction is completed, the changes made by that transaction should be visible immediately to all the participants. To put it through a simple example: if a change is committed at the first millisecond, that change should be visible to everyone querying at the second millisecond. So let's understand this with a whiteboard. Allow me to make it full screen, because this is a very important subject. Let's take a simple example of a storage system, something like a file-hosting website that stores the user's files. This is an application that is running on the frontend, and let's assume that this is a storage device. Now, every time the user writes or saves data (I'll draw a happy user here), the application writes that data to the back-end storage, and the user gets a success message. So this is a very simple type of architecture.
So, if this is a single transaction, and a user writes some data to the storage device, we can guarantee that the data was received by the storage device and that the application returned a success message to the user. This results in what I would call ideal consistency. Ideal consistency is easy to achieve if you have a single, simple storage device. But if your architecture is distributed, then ideal consistency is very difficult to achieve. So let's take a more complex example, something like a production environment where you need proper replication. The problem here is that if this single storage device goes down for some reason, all the data will be lost. We really don't want a design where the data has a risk of being lost; we need more reliability as well. So, in order to make this architecture more reliable, let's revisit the simple example from earlier. Say this is the application running, and now instead of one storage device, we have three storage devices over here, okay? And in between we have a queue that we designed; I'll label it Q. So, whenever the application receives some data, say data 10, from the user (assume this is a text document or a JPEG file), the application will write that data to the queue. We now have three storage devices, and all three will pull the data from the queue, and eventually the data will be written. Now, one thing that a developer can guarantee is that the data received by the application will go to this particular queue. That is the first aspect. The second aspect is that whatever data is present in this queue will be written to all of these storage devices.
Now, there will be some kind of lag: some amount of time it takes for the data to be stored in the queue and for the data from the queue to be synced to all the devices. Let's assume the whole operation takes 1 second: 1 second for the data from the user to be stored in the queue and written from the queue to all the storage devices. Now, let's say this user sent this particular data at the 10th millisecond, and there is a second user who is querying this particular data. What generally happens is that as soon as the application sends the data to the queue, it returns a success message to the user saying that the data is saved. But internally, it takes around 1 second to fully save the data. Now the second user queries the data at, say, 800 milliseconds. As you can see, he is querying before the full second has passed, so he may not receive the result, because the data will take another 200 milliseconds to sync over here. And this is what is called "eventual consistency": the data will take some time to become consistent across all the storage devices. So, coming back to the presentation: with ideal consistency you might be storing some kind of data at an S3 endpoint, and internally there may be a large number of storage locations that replicate the same data.
So, when you store this data in the S3 bucket, S3 will internally save the data to multiple storage locations. In general, ideal consistency is not possible, or comes with a lot of challenges, once you have a more scalable, distributed system. Eventual consistency, then, says that there will be a time lag between the time data is committed and the time the data is visible to all participants. Now, AWS provides eventual consistency for overwrites and deletes in all regions. This is very important: remember that when you do an overwrite or a delete, there is eventual consistency. So, if we delete an object from the S3 bucket and an application immediately requests that object, there is a chance that the object can still be retrieved. Again: if we send a delete operation for a particular object on S3, and while the delete operation is still running some user makes a GET request for the same object, the chances are that the data might be returned to the user, because, as you can see at the fourth location, the data has not been deleted yet. So this is what is called "eventual consistency", and it is a trade-off that all distributed systems have to make. This is the basic idea of the eventual consistency model, and I hope this has been informative for you.
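The write-acknowledge-then-replicate behaviour described above can be sketched in a few lines of Python. This is a toy model, not any real storage API: the queue-to-replica sync is simulated with a background thread and a made-up sync delay.

```python
import threading
import time

class EventuallyConsistentStore:
    """Toy model: a write is acknowledged immediately, but the
    replicas only receive the data after a sync delay."""

    def __init__(self, replicas=3, sync_delay=0.2):
        self.replicas = [dict() for _ in range(replicas)]
        self.sync_delay = sync_delay

    def write(self, key, value):
        def sync():
            time.sleep(self.sync_delay)   # time the data spends in the queue
            for replica in self.replicas:
                replica[key] = value      # eventually written everywhere
        threading.Thread(target=sync, daemon=True).start()
        return "success"                  # the user sees success right away

    def read(self, key, replica=0):
        # A reader may hit a replica that has not synced yet.
        return self.replicas[replica].get(key)

store = EventuallyConsistentStore()
store.write("photo.jpg", "data-10")
early = store.read("photo.jpg")   # read before the sync completes: stale (None)
time.sleep(0.5)
late = store.read("photo.jpg")    # read after the lag: the data is there
```

The early read plays the role of the user querying at 800 milliseconds: the write has been acknowledged, but the replica has not caught up yet.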
57. Improving performance for EBS with RAID
Hey, everyone, and welcome back to the storage section of the Knowledge Portal video series. In today's lecture, we are going to speak about EBS and RAID. Now, RAID is quite an important topic for us to remember, and in exams you might get a lot of questions related to the integration of EBS with RAID to solve various use cases. So let's go ahead and understand more about this specific topic. In order to understand RAID, let's take a simple scenario where we have two hard disk drives, each with a speed of 100 MB per second. So you have the first hard disk drive and the second hard disk drive, and each of them has a speed of 100 MB per second. Now, there is a new requirement where a new application needs a hard disk drive with a speed of 200 MB per second. And the question is: what to do? We already have these two hard disk drives, but neither of them alone can handle 200 MB per second. So can we somehow use both of these devices together, or do we need to procure one more hard drive that supports 200 MB per second of bandwidth? The best approach is to use the power of the two hard disk drives together.
We can achieve the 200 MB per second requirement if we combine the two hard drives: since each of them supports 100 MB per second, combined they can reach up to 200 MB per second. This combination is generally achieved through various technologies, and one of them is RAID. RAID stands for Redundant Array of Inexpensive Disks, and it is a data storage technology that combines multiple physical storage devices into a single logical device for purposes of redundancy, performance, or both. This sentence is quite important: RAID combines multiple storage devices into a single logical device. So, if we use RAID, we can combine these two storage devices to form a single logical device that can have a throughput of 200 MB per second. Now, there are various RAID configurations available; the most common ones are RAID 0, RAID 1, and RAID 5. Depending upon the configuration that you use, the way in which data striping or data mirroring happens is very different. So let's go ahead and understand more about RAID 0 and RAID 1.
As far as exams are concerned, RAID 0 and RAID 1 are quite important. In RAID 0, data is striped across multiple disks. So you have block A1 on disk 1, then the next block, A2, stored on disk 2; then B1 on disk 1, B2 on disk 2, and so on. Anytime new data arrives, the first block is stored on disk 1 and the second block on disk 2. RAID 0 requires a minimum of two disks. Now, since we are using two disks, we can expect excellent performance because the blocks are striped: if disk 1 supports 100 MB per second and disk 2 supports 100 MB per second, then through RAID 0 we can achieve 200 MB per second, because the data blocks are spread across multiple disks. RAID 0 is very fast, but it comes with a drawback: if one of the disks fails (let's assume disk 2 fails), then all of the data is lost. So it is recommended to avoid using RAID 0 for critical systems. That is RAID 0. Now, talking about RAID 1: in RAID 1, data is mirrored across multiple disks. So you have A1 on disk 1 and a copy of A1 on disk 2, B1 on disk 1 and a copy of B1 on disk 2.
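The striping and mirroring just described can be sketched in plain Python. This is a toy model of block placement, not a real RAID driver:

```python
def raid0_write(blocks, num_disks=2):
    """RAID 0: stripe blocks round-robin, so block A1 lands on disk 1,
    A2 on disk 2, B1 back on disk 1, and so on."""
    disks = [[] for _ in range(num_disks)]
    for i, block in enumerate(blocks):
        disks[i % num_disks].append(block)
    return disks

def raid1_write(blocks, num_disks=2):
    """RAID 1: mirror every block onto every disk."""
    return [list(blocks) for _ in range(num_disks)]

striped = raid0_write(["A1", "A2", "B1", "B2"])
mirrored = raid1_write(["A1", "B1"])
# striped  -> disk 1 holds ["A1", "B1"], disk 2 holds ["A2", "B2"]
# mirrored -> both disks hold ["A1", "B1"]
```

Losing one disk in the striped layout loses half the blocks, which is why RAID 0 is fast but fragile; in the mirrored layout the surviving disk still has everything.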
So you have a copy of the data across multiple disks. Even if, say, disk 1 fails, you still have the same data on disk 2, so your production can carry on without being hampered by data loss. RAID 1 gives excellent redundancy because the blocks are mirrored, whereas RAID 0 gives excellent performance because it combines the performance of both disks. So RAID 0 should be used when I/O performance is more important than fault tolerance, and RAID 1 should be used when fault tolerance is more important than I/O performance. Now, RAID can be used along with EBS volumes as well. And we know that EBS volumes are internally replicated by Amazon behind the scenes. So even if we use RAID 0 with EBS volumes, the durability is quite good, because Amazon replicates the data behind the scenes for us. Last but not least, we need to understand some important pointers for exams. Let's assume we have two 500 GiB EBS volumes, each with 4,000 provisioned IOPS. So we have two 500 GiB volumes, and each volume has 4,000 IOPS. Now let's see what happens when we use RAID 0.
If you use RAID 0 across both of these EBS volumes, then you get 1,000 GiB in total with 8,000 IOPS. As you can see, both the capacity and the performance double with RAID 0. However, when we use RAID 1, we actually get only 4,000 IOPS, because in RAID 1 the data is replicated to provide redundancy. With RAID 0, you get the best performance. So, if you require a large number of IOPS, RAID 0 is something you should consider. Now, before we conclude this lecture, let me show you a few things that are quite important for exams. When you go ahead and create a volume, let me select provisioned IOPS. Let's assume that I have a size of 10 GB, so ten gigabytes of data storage. Now, I have a requirement where I need a good amount of IOPS. A ten-gigabyte volume with provisioned IOPS can provide a maximum of 500 IOPS, so let me enter 500.
So, you see, 500 is allowed. But if I put 501, I get an error message saying that the maximum ratio should be 50:1. So 50 multiplied by ten gigabytes is the maximum IOPS that I can get for this storage device. And a provisioned IOPS volume's maximum is 20,000 IOPS. Now, let's assume that you have an application that needs 40,000 IOPS. How can you achieve that with AWS? Since AWS is not giving you more than 20,000 IOPS per volume, the only thing you can do is create multiple volumes with 20,000 IOPS each. So you can create two volumes, each with 20,000 IOPS, in a RAID 0 configuration. In a RAID 0 configuration with two volumes of 20,000 IOPS each, you reach the 40,000 IOPS requirement of your application. In your exams, you will get certain scenarios where they'll ask you how you can achieve this kind of performance with EBS. So this is all there is to say about EBS and RAID. I hope you understood the basics of how you can achieve higher performance with EBS when combined with RAID.
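The IOPS arithmetic from this demo can be captured in a short sketch. The 50:1 ratio and the 20,000-IOPS cap are the figures quoted in this lecture; AWS has changed these limits over time, so treat them as illustrative.

```python
def max_volume_iops(size_gib, ratio=50, per_volume_cap=20000):
    """Maximum provisioned IOPS for one volume: size times the
    allowed ratio, capped at the per-volume limit."""
    return min(size_gib * ratio, per_volume_cap)

def raid0_iops(per_volume_iops, num_volumes):
    """RAID 0 stripes I/O across volumes, so the IOPS add up."""
    return per_volume_iops * num_volumes

small = max_volume_iops(10)       # a 10 GiB volume allows at most 500 IOPS
big = max_volume_iops(500)        # 500 * 50 = 25,000, capped at 20,000
combined = raid0_iops(20000, 2)   # two max-IOPS volumes in RAID 0 -> 40,000
```

This is exactly the exam scenario: one volume tops out at the cap, so the only way to 40,000 IOPS is striping two capped volumes together.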
58. Instance Store Volumes
Hey everyone, and welcome back to the Knowledge Portal video series. Today we'll be speaking about the instance store. So let's begin. The AWS instance store, by definition, provides temporary block storage volumes for EC2. Now, what do I mean by that? In a nutshell, an instance store is a temporary storage device where we can put our data for the time being. If you're familiar with Linux, it's similar to the /tmp file system: if you store data in /tmp and then stop and start an instance, or generally reboot the operating system, the data within that file system is lost. That analogy is very similar to instance storage. What really happens with an instance store is that the storage device, the hard disk drive, is directly part of the server that is hosting the virtual machine.
Now, in the previous lecture, we discussed the virtualization environment and looked into the architecture of a cloud provider, where you have servers; on top of the servers you have some kind of virtualization technology, be it Xen, KVM, VMware vSphere, Hyper-V, etc., depending upon the provider you are using; and on top of the hypervisor you have the virtual machines. Let's assume that each of these virtual machines belongs to an individual customer, so there are four virtual machines and thus four customers. Now, in this scenario, the VMs are using the storage device of the particular host server. The problem with this type of architecture is: what happens if the host server fails, or the storage device on this server fails? In that case, the storage of all four VMs fails along with it, and this is actually very dangerous.
What happens if there is some critical database that a customer is running? You cannot rely on this kind of setup; it is not a very optimal scenario. So in the ideal, real-world case, cloud service providers will never use this host storage for durable data. Instead, storage is a network cluster, and that cluster is mounted into each individual VM. We'll be discussing this in the relevant section, but just to give you an overview: in the ideal case, the local storage device of the server is never used; storage from some kind of technology, like network-attached storage (NAS), is mounted on the relevant server. Anyway, that is not our focus for now. Going to the second point: instance store storage is located on disks that are physically attached to the host computer. I hope you got the meaning of that. And the third point is that the size of the instance store varies depending on the instance type. You will understand this when we actually do the practical. So let me switch back to the AWS console and click on Launch Instance. If you go to the community AMIs and scroll down, you'll see two root device types: one is EBS, and the other is instance store. EBS is like permanent storage; the instance store functions like temporary storage. Each of them comes with its own advantages. Let's select instance store for now, and for our use case, let me click on the first one, which is AMI 08368. So I'll use this AMI, and I'll click on Launch. And if you notice, it does not allow you to select all the instance types; there are only specific instance types that you can use.
One thing to keep in mind over here: the m1.small instance type comes with one virtual CPU, 1.7 GB of RAM, and 160 GB of instance storage. So if I just go up a bit, you see: instance storage, 160 GB. If you go to m1.medium, the instance store is 410 GB. Now, one of the advantages here is that this storage is completely free. The storage part of the instance store is included in the instance price; you do not have to pay for it separately. This is one of the reasons why instance stores are frequently preferred. So if you select an m1.medium instance, you actually have a 410 GB storage device that comes free with that particular instance. In our case, I'll use m1.medium. Over here, I'm going to configure the instance details; I'll use the default VPC for now. I click on Add Storage, and if you see, there are no extra storage devices attached over here. If you directly click on Review and Launch, the instance store will be attached automatically. However, we'll click on Add New Volume, and if you see where the instance store is attached, you cannot really modify any of these parameters. This relates to the third point, which is that the size of the instance store varies depending on your instance type. We have already discussed that m1.small has 160 GB; if you go with another instance type, it has a different storage device attached to it. So you cannot really change the size of the instance store for a particular instance type. I'll click on Review and Launch, and let this launch; I'll name this instance "store". Now, one thing to notice is that if you go down to the root device type, you see instance store over here. In the previous instances that we launched, the root device type was EBS. EBS is like permanent storage, a proper hard disk drive, whereas if the root device is instance store, it means temporary storage. Now, while it launches, let's complete the second slide that we have, which will give us a further understanding of the instance store. The data in an instance store is lost in the following situations, and this is very important. We've already said that instance storage is temporary storage, so in what scenarios will the data be lost? First, if the underlying disk drive fails: we have already discussed that the instance store uses the underlying storage of the host server, so if that storage fails, the instance store is lost. The second is if the instance stops: if an instance-store-backed instance stops, it is terminated and the instance store is lost. The third is if the instance terminates. The next point to mention is that the cost of EC2 instances includes the instance store.
So they are quite cost-effective: we saw that if we just launch an m1.small instance, we get 160 GB of instance storage without paying anything extra. Let's discuss the last two points after we complete the practical; it will then become clear what an instance store is. I'll just copy this public IP. Let me first check whether I can connect. Okay, I am able to connect. Let me do an ssh ec2-user@ the IP; I need to specify the key. Okay, so I'm logged in. Now let me do a sudo su -, so we are now root. One way to recognise an instance store is directly within the device listing itself. If you perform a df -h, you will see something like ephemeral0. "Ephemeral" essentially means temporary. So, if a device is mounted, for example a 147 GB hard disk drive mounted on /media/ephemeral0, this indicates that this specific device is a temporary storage device. This is very important to understand. The second way is to go to the console and check the root device type, which can be either EBS or instance store. Now, let's do some use-case testing. I'll go to root, create a folder, and name it backup. Inside backup, I'll create a text file, say kplabs.txt. Nano is not there, so let me just do an echo: "This is the instance store lecture", okay? I'll save it into kplabs.txt. Then if I do a cat: so what we have essentially done is create a folder, and inside the folder we have created a text file called kplabs.txt, and inside the text file we have written a simple one-line sentence. Now, it is very important to understand that the data within the instance store will be lost if the instance stops. If the instance merely restarts, the data will not be lost.
So let's verify this, because it is a very important point. Now, I'll show you one interesting thing. If you try to click on Instance State, you see that you cannot really stop it here; you can either reboot or terminate. If you stop this particular EC2 instance, it will automatically terminate. Now let me go ahead and reboot this particular instance; I'll click on Reboot. This is just to see whether the things mentioned in the documentation, and the things we are currently studying, hold true. Let's wait. And here is one very interesting story, I'll tell you. Just a few months ago, we were planning to do maintenance on a server: a database server that needed some kind of maintenance, for which we had to halt the instance. Before stopping the instance, we made sure that we had everything ready; we made a checklist of the patches and everything to be applied, and all that was needed was to shut down for a few minutes, perform the activity, and start the instance back up. Now, at the last moment, just before we were planning to shut down, one of the system administrators verified and found that the root device type was instance store. That was a very important observation, because if we had shut down that instance, it would have been a nightmare. We'll look into what I mean by this. So it is very important that before you shut down an instance, you check that it is not instance-store-backed. Now I'll just try to connect back to the server again. Okay, now we are connected. If you want to check whether the data is still there: I go to /backup, and you can see that kplabs.txt is still present. So a reboot does not affect the instance store data on EC2. Now, for those who are very curious about what will happen if we shut down: AWS does not allow us to stop it directly from the console here.
So for those crazy guys who want to experiment, we can directly shut down the server itself. So I’ll run the halt command over here, and let’s see what happens.
Okay, so I have run the halt command, which means shut down. Let me just refresh. Wait a minute: the status has changed to initializing, and now the instance is shutting down. Let's see what really happens after it shuts down. Let's wait; if it's still shutting down, then I think patience is a virtue. And there you see: it has been terminated. So as soon as we shut down an instance which is backed by the instance store, it gets terminated, and this is very important to understand. So make sure that whenever you stop an instance, the instance's root device type is EBS and not instance store. This is the basic information about the instance store. Again, if you want to do the practical yourself, you can go ahead and do that; however, these instance types may not be free tier, and you may have to pay if you want to do the practical related to this aspect. Anyway, this is the basic information about the instance store. I hope the points that we discussed have been understood. And lastly, if you are planning to use the instance store, make sure you take a backup of the data stored on the device to a central location like S3.
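The safety check from the maintenance story above can be automated before any planned shutdown. A minimal sketch, assuming instance records shaped like the entries boto3's describe_instances returns (the RootDeviceType field and its values are real; the instance IDs here are made up):

```python
def safe_to_stop(instance):
    """Only EBS-backed instances survive a stop; stopping an
    instance-store-backed instance terminates it."""
    return instance.get("RootDeviceType") == "ebs"

# Hypothetical fleet data, in the shape AWS reports it.
fleet = [
    {"InstanceId": "i-0aaa111", "RootDeviceType": "ebs"},
    {"InstanceId": "i-0bbb222", "RootDeviceType": "instance-store"},
]
stoppable = [i["InstanceId"] for i in fleet if safe_to_stop(i)]
# only the EBS-backed instance is safe to stop for maintenance
```

Running a check like this against the live API output before a maintenance window is exactly what saved the database server in the story.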
59. Implementing NAT Gateway design for Higher Performance
Hey everyone, and welcome back to the Knowledge Portal video series. Today we'll talk more about the NAT gateway's performance, which is another important topic on which you'll see questions in your exams. Now, before we actually start, I hope you understand the difference between a NAT instance and a NAT gateway. This is something that we covered quite early in the associate-level course.
When we talk about a NAT instance, it is basically an EC2 instance that we launch that has NAT-related functionality, and after we launch that instance from the AMI, we disable the source/destination check. So that's the NAT instance. Then AWS came up with the NAT gateway, which is more of a managed service from their end. Let me just show you: if you go to the VPC console, you have a NAT gateway option. If you click on NAT Gateways, you can go ahead and create your own NAT gateway. One of the advantages of the NAT gateway is that it is fully managed and highly available, so we don't have to be concerned about a single instance failing and disrupting all our outbound traffic. Now, when it comes to the NAT gateway, there are certain performance aspects that we need to remember as solutions architects. The first is that a NAT gateway can handle bursts of up to 10 Gbps of bandwidth.
Remember the word "burst", because it does not provide a steady 10 Gbps of bandwidth; 10 Gbps is the maximum burst bandwidth that it can support. The second important point to remember is that if all the instances within the private subnet together need less than 10 Gbps of traffic, that is fine, because the gateway supports that burst. If they need more than that, the network will become congested. Let's assume you have 200 or even 300 EC2 instances within your private subnet, and during your peak production hours all of them together go beyond 10 Gbps. Then your website or your application will become very slow, because the network is the bottleneck: even though your EC2 instances themselves are very fast, if the network is a bottleneck, ultimately things become much slower. This is the reason why, as solutions architects, we are expected to know how to deal with this scenario.
In order to have more bandwidth, the recommended design is to split the instances across multiple subnets and attach a different NAT gateway to each of those subnets. This is one of the recommended approaches to solving this kind of issue. Let's look at how that works. Here is a normal NAT gateway-based architecture where you have two private subnets, and both private subnets have the same NAT gateway attached in their route tables. This NAT gateway supports bursts of up to 10 Gbps. So even if there are 500 to 1,000 instances across these subnets, all of those instances share a maximum of 10 Gbps of network bandwidth. Since this network bandwidth can be a bottleneck, to improve the design we can take the approach of multiple NAT gateways: we create one NAT gateway and connect it to the first private subnet, and we create one more NAT gateway and attach it to private subnet two.
Now, whatever instances we launch, we have to make sure that we split those instances across those private subnets. In this design, we have a 10 Gbps burst for the first NAT gateway and a 10 Gbps burst for the second NAT gateway. If you need more, create one more private subnet and attach one more NAT gateway to that third private subnet, and you can have up to 30 Gbps of burst across all the subnets. This is the thing we should remember. Now, how do you do that? Let's try it out. We'll create a NAT gateway, and as you can see, it asks for a subnet when we create the NAT gateway. We have three subnets within the kplabs-new VPC. So let's do one thing: this NAT gateway will be connected to subnet one. I'll create a new Elastic IP, and I'll click on Create NAT Gateway. That's the first NAT gateway. Now, let's create one more NAT gateway; this time I'll connect it to the second subnet, kplabs-2b, create a new Elastic IP, and create the gateway. And again, a third NAT gateway for the third subnet of kplabs-new: create a new Elastic IP and attach it over here. Perfect.
So what we have over here, let me just show you, is three private subnets in this specific VPC, and for each of the subnets we are attaching a NAT gateway. Subnet 2a gets a burst of 10 Gbps, subnet 2b gets a burst of 10 Gbps, and subnet 2c gets a burst of 10 Gbps. So in total there is up to 30 Gbps of burst performance across all of these subnets. This is something that we should remember. Now, in exams, you might also get questions related to NAT instances and how we can make them more highly available. The approach is the same: you should have multiple NAT instances. These are EC2 instances: the first NAT instance will be in subnet one, which is in availability zone 1, and the second will be in availability zone 2. Now, if one NAT instance goes down due to an availability-zone failure, you always have the second one in the second availability zone, which can be used to route the traffic. This is something we should keep in mind when it comes to NAT gateway high availability and overall NAT gateway performance.
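The scale-out design above (one NAT gateway per private subnet, instances spread across the subnets) can be sketched as follows. The subnet names and the 10 Gbps burst figure come from this lecture; the planning helpers themselves are hypothetical, not an AWS API:

```python
def aggregate_burst_gbps(num_subnets, burst_per_nat_gbps=10):
    """One NAT gateway per subnet, so the total burst scales
    linearly with the number of private subnets."""
    return num_subnets * burst_per_nat_gbps

def spread_instances(instance_ids, subnets):
    """Round-robin instances across subnets so no single NAT
    gateway becomes the bottleneck."""
    return {iid: subnets[n % len(subnets)]
            for n, iid in enumerate(instance_ids)}

total = aggregate_burst_gbps(3)   # three subnets, 10 Gbps each
placement = spread_instances(
    ["i-1", "i-2", "i-3", "i-4"],
    ["kplabs-2a", "kplabs-2b"],
)
```

With three subnets the aggregate burst is 30 Gbps, matching the whiteboard calculation; the placement helper just makes the "split the instances" advice concrete.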
60. Understanding Memcached Engine
Hey everyone, and welcome back to the Knowledge Portal video series. Continuing our journey with the caching subsystem, today we'll be talking about a very important technology in caching, which is called Memcached.
Now, in the earlier lecture, we looked at the basics of what caching is all about. One important thing to remember over here is that caching is not just limited to the HTTP protocol; it is actually used in a lot of places, including databases, the operating system, and even hardware. When you buy a CPU, it usually comes with caches such as the L1 cache, L2 cache, and L3 cache. So even in hardware, the importance of caching is well understood, and this is the reason why we decided to talk about caching in much greater detail. With this, let's go ahead and talk about Memcached. Memcached is a general-purpose distributed memory caching system. The word "memory" is very, very important to remember here, because the data and objects that Memcached stores are stored in memory.
Now, typically a caching system can store data in one of two places: the first is memory, and the second is a hard drive. The retrieval speed varies greatly depending on the underlying storage technology that you use; we'll be looking into that on the next slide. So, Memcached is often used to speed up dynamic, database-driven websites by caching data and objects in RAM. In order to understand Memcached, we have a nice little animation. Let's look into it. On the left-hand side, you have your client, which you can consider your application, and on the right side, you have a database containing a specific object or piece of data. Whenever the application wants to retrieve this data, it has to send a SQL query, for example a SELECT * FROM some table. So in the first step, the client or application sends a query to the database; the database processes that query, fetches the data from the underlying storage, which can be a hard disk drive or a solid-state drive, and gives it back to the client.
So far, this appears to be a happy situation. Now what happens if, after five minutes, there is one more client who sends the same query to the database? The database has to perform the same operation and send the same object back to the other client. Now, when you talk about websites like LinkedIn, Twitter, and Facebook, there are certain articles or certain tweets that millions of users will read. And if millions of users read the same tweet, that basically means that the application will have to send the query to the database, retrieve the data, and send it back to the client a million times. So, there are two problems over here. One is that the database is quite slow. So if this object is stored on the underlying hardware, it will take some time to retrieve the object. And the second is that the load on the database will increase tremendously. So, in order to speed things up, caching technology is used. So, what happens is that there is a middleware introduced, and once the application retrieves the data, it will store this data in this caching subsystem. Consider this to be Memcached.
Now, next time when the application wants to retrieve the same data, instead of sending the query to the database, it will query the cache system and retrieve the data from the cache. Because this object is now stored in memory, retrieval time is extremely fast when compared to the database. So let's compare how fast a hard disk drive or a solid-state drive is to a RAM disk. So, if you look into the sequential reads in the figure, a hard disk drive gives you around 112, while a solid-state drive gives you around 477. So there is a significant difference between a hard disk drive and a solid-state drive. When it comes to RAM disks, the number is 5766. So you see, the difference in the numbers between a hard disk drive, a solid-state drive, and a RAM disk is tremendous.
And it is for this reason that Memcached actually stores data or objects in memory to speed things up. And because of that, the retrieval time is extremely short. So you will find that your website loads very, very fast when you use some kind of memory-based caching system. So, now that we understand the basics of Memcached, let's look into how we can integrate it with our application. So, there are three steps to integrate Memcached with our application. First, whenever the application needs to retrieve data, it attempts to fetch that data from Memcached. Second, if the data is not present or is not found, the application fetches the data from the database through a query.
So the application will send this query to the database. Third, once it retrieves the object from the database, it will store that data in the Memcached caching subsystem. So next time when the application tries to retrieve this data, it will get it from Memcached instead of the database. So this is about the theoretical part. Let's try the practical aspect as well. So what I'll do is get my Ubuntu operating system up and running. So let's go ahead and install Memcached. Perfect. So Memcached is installed; I'll quickly verify if it has started. You see, memcached is not running, so I'll start the memcached service. Perfect. So, if you run a status on memcached, you'll see that it's up and running. Now let's do a quick ps aux on memcached, and you will find that memcached is running on port 11211. So this is the port where it is running. So I'll do a quick telnet to 127.0.0.1 on port 11211.
So now I'm connected to the memcached service. So if I run a quick stats command, it will basically show you the statistics related to the memcached service. Now there are two important counters that we have to look into. The first is get_hits, and the second is get_misses. So what exactly are these? Let's understand. So whenever the application successfully retrieves data from Memcached, the get_hits counter gets updated. However, if the application tries to retrieve data from Memcached and Memcached does not have that data stored, then the get_misses counter gets updated. So, in an ideal situation, get_hits should be the higher number. So, as a high-level overview, the higher that number, the faster your website will be. Anyway, this is about the basics. Now there are a few important things that I wanted to show you. So there is a simple document I have written, and I'll be posting it in our forum. So you can use this as a reference to see how exactly it would really work.
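The three integration steps and the get_hits / get_misses counters we just discussed can be sketched together in a few lines. This is a minimal stand-in, not a real Memcached client: the dicts play the roles of Memcached and the database, and the names `fetch` and `query_database` are hypothetical helpers for illustration.

```python
# Sketch of the cache-aside read path, with hit/miss counters like the
# ones memcached reports in its stats output. The dicts below stand in
# for Memcached and for the backing database.

cache = {}
database = {"user:1": "Alice"}   # pretend backing database
stats = {"get_hits": 0, "get_misses": 0}

def query_database(key):
    # In a real setup this would be an SQL query against the database.
    return database.get(key)

def fetch(key):
    # Step 1: try the cache first.
    if key in cache:
        stats["get_hits"] += 1
        return cache[key]
    # Step 2: on a miss, fall back to the database.
    stats["get_misses"] += 1
    value = query_database(key)
    # Step 3: populate the cache so the next read is served from memory.
    if value is not None:
        cache[key] = value
    return value

fetch("user:1")   # first read: a miss, goes to the database
fetch("user:1")   # second read: a hit, served from memory
print(stats)      # {'get_hits': 1, 'get_misses': 1}
```

Notice that the first read is always a miss; this lazy-loading behaviour is why a freshly started cache is "cold" and the hit ratio climbs only as traffic warms it up.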
So in the first practical, what we'll do is set some data in Memcached on the server with the help of the set command. So I'll say set kplabs, followed by the flags, the expiration timer, and the number of bytes in the data, and then the associated value, which is memcached. Perfect. So basically, what we are doing is storing a key-value pair. So kplabs is the key, and memcached is the value that is associated with it. So if I do a get on kplabs, what you'll find is that you get back the value, which is memcached. Now that we have retrieved this value from Memcached, the get_hits counter should be updated. So, if I run a quick stats and scroll up, you can see that get_hits has been increased by one. So let's run the same query again. So I do a get on kplabs and run stats once more. The get_hits counter has now been increased by one again, bringing the total to two. So let's see if we can get some get_misses as well. So let's ask for something that the cache does not have. So I'll do a get on kplabs1.
Now, you see, I did not get any response. That means Memcached does not have any value associated with this key. So ideally, this is a miss. So, if I run stats now, you'll notice that get_misses has been updated by one. So this is how it really works. Now, apart from storing a value and retrieving it, memcached also supports a lot of other operations, including the increment and decrement operations. So typically, in an application where voting is required, Memcached's increment and decrement operations come in handy. So let's look at how it goes. So I'll do a set votes, followed by the flags, the expiration timer, and the number of bytes, and I'll store the value 10. So now this value is stored. So when I do a get on votes, I'll get the value of 10 that is associated with it. Now, if I want to increment it, I'll use incr votes 5. And now it has incremented our value of ten by five. So you can even use get votes to verify. You find that the value is 15.
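The set, get, and incr commands we just ran over telnet can be modelled as a tiny in-process toy. This is a sketch only, assuming the same key names as the demo; it is not a real client, but it mirrors the behaviour we observed, including the expiration timer.

```python
import time

# Toy in-process model of the memcached commands used above:
# set <key> <flags> <exptime> <bytes>, get <key>, incr <key> <delta>.
store = {}  # key -> (value, expires_at or None)

def set_key(key, value, exptime=0):
    # exptime of 0 means the entry never expires.
    expires_at = time.time() + exptime if exptime else None
    store[key] = (value, expires_at)

def get_key(key):
    entry = store.get(key)
    if entry is None:
        return None  # a miss: this is what bumps get_misses
    value, expires_at = entry
    if expires_at is not None and time.time() >= expires_at:
        del store[key]  # expired entries behave like misses
        return None
    return value  # a hit: this is what bumps get_hits

def incr(key, delta):
    # Memcached's incr treats the stored value as an integer.
    value = int(get_key(key)) + delta
    set_key(key, str(value))
    return value

set_key("kplabs", "memcached", exptime=3600)
print(get_key("kplabs"))    # memcached
print(get_key("kplabs1"))   # None (a miss, like our telnet demo)
set_key("votes", "10")
print(incr("votes", 5))     # 15
```

One detail worth knowing: in real Memcached, values are stored as bytes, so incr only works when the stored bytes parse as a number, which is why we stored "10" rather than arbitrary text.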
So you can increment or decrement the data that is stored in Memcached. So there are a lot of other functions that are present in Memcached. So this is the high-level overview of the Memcached-based service. Now, let's look into some of the important points that we must remember. First, Memcached is a simple volatile cache server. Now, since the data is stored in memory, remember that whenever the server restarts, the entire data set will be lost. So that is very important. Typically, in a situation where Memcached is used, certain applications are used to dump data from memory to the hard disk drive, so that if the server restarts, the data can be read back into memory. So, a very important thing to remember is that Memcached stores data in memory; if the server restarts, all your data is lost. Perfect. The second important point is that it enables us to store a simple key-value pair with a value of up to 1 MB. So we stored a value for kplabs. So kplabs was the key and memcached was the value, and the second time, the key was votes and the value was ten. So it is a simple key-value based storage. The third point, which we already discussed, is that it is an in-memory caching solution. As a result, if the server is restarted, all data in Memcached is lost. The next critical point is that Memcached is multithreaded. So if you have multiple cores, which is what modern CPUs have, it can use those cores in parallel, and things will become much, much faster. And last, since Memcached is a distributed system, it is quite easy to scale it horizontally. So these are some of the important points that you must remember. Now, we have already discussed that there are various other technologies that are present. Memcached and Redis are two of the most famous technologies used as memory-based caching solutions, which are typically integrated with databases.
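The horizontal scaling point deserves a quick sketch. Memcached servers do not talk to each other; it is the client that decides which server holds a key, typically by hashing the key. The server addresses below are hypothetical, and this uses a simple modulo scheme for clarity; real clients usually prefer consistent hashing so that adding or removing a server remaps fewer keys.

```python
import hashlib

# Sketch of client-side key distribution across a pool of memcached
# servers. The addresses are made up for illustration.
servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]

def server_for(key):
    # Hash the key and pick a server by modulo. Every application
    # instance running the same logic agrees on where each key lives.
    digest = hashlib.md5(key.encode()).digest()
    index = int.from_bytes(digest, "big") % len(servers)
    return servers[index]

# The same key always maps to the same server.
print(server_for("kplabs"))
print(server_for("votes"))
```

Because there is no coordination between servers, scaling out is just a matter of adding an address to the client's server list, which is what makes Memcached so easy to grow horizontally.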
So this is it about Memcached; in the upcoming lecture, we'll be