Amazon AWS DevOps Engineer Professional – Monitoring and Logging (Domain 3) Part 3
August 28, 2023

7. CloudWatch Metrics – Overview

So now let’s go into CloudWatch, and we’re going to cover CloudWatch in depth over these next few lectures. We’ll look at CloudWatch Metrics first. CloudWatch Metrics, on the left-hand side here, contains the metrics from all the AWS services that track metrics. As you can see at the bottom, we have the AWS namespaces, for example API Gateway, ELB, CodeBuild, EC2, EBS, DynamoDB and so on. All these services have some way of tracking their metrics into CloudWatch Metrics. And so if we look at Firehose, for example, because we just used it, we can look at the Firehose latency, and here we go, we get a graph of the Firehose latency; we can look at the number of requests made to Firehose, and so on.

So we can get information about all these different services, and one of those, obviously, is going to be EC2. For EC2, we can get metrics by Auto Scaling group, by AMI ID, per-instance metrics, aggregated by instance type, or across all instances. If we look at per-instance metrics, we can look, for example, at the CPU utilization for a single instance and see how it changes over time. So that’s definitely one thing you can do, and you definitely need to know how the metrics work for the most important services.

So for EC2, we have CPU utilization, we have network in and network out, we have information around CPU credit usage and balance for T2 instances such as the t2.micro, we get the status checks, and so on. This is very important for you to know. And as you can see here, we get a data point every minute. So let’s go into EC2 so we can learn about this. We go to EC2, then to Instances, and we’ll take one instance that’s active, for example this one. If we go to Monitoring here, we get an aggregation of all the monitoring, and as it looks in here, we have detailed monitoring.

So this is something that was already enabled by the ECS cluster. And what is detailed monitoring? Well, if you click on this graph, you can see that we get a data point every single minute, okay? That is what detailed means: the metrics come in every minute, but detailed monitoring is something you have to pay for. Let me close this. For another instance, for example our Jenkins server, we have basic monitoring, and basic monitoring means that we get metrics every five minutes. If we wanted detailed monitoring instead, we click Enable Detailed Monitoring, and it says: okay, you are now going to get your metrics at a one-minute frequency, but you are going to incur charges for this. Do you want that? Yes, enable. And here we go, it’s been enabled, and metrics will now be collected every minute. You can also toggle this from the CLI, as sketched below.
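For reference, here is a minimal sketch of toggling detailed monitoring from the CLI; the instance ID is a placeholder:

    # Enable detailed (1-minute) monitoring on an instance -- this incurs charges
    aws ec2 monitor-instances --instance-ids i-1234567890abcdef0

    # Switch back to basic (5-minute) monitoring
    aws ec2 unmonitor-instances --instance-ids i-1234567890abcdef0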

So you definitely have to know the difference between basic monitoring and detailed monitoring for EC2. Now, going back into CloudWatch, we also get some history. For example, I can look at three days of history and see the CPU usage across three days, or I can look at one week, and it’s been consistently growing, apparently. So you might ask, how far back can you go? You can go all the way back to 15 months. So 15 months is the maximum data retention you get in CloudWatch.

Okay? But as you go further back in time, CloudWatch is not going to remember every single metric at one-minute or five-minute granularity for 15 months. What you need to know is that CloudWatch aggregates data over time. If I scroll down into these metrics concepts, you can see that data points with a period of less than 60 seconds are available for 3 hours, data points with a 60-second period are available for 15 days, five-minute data points are available for 63 days, and finally you get one data point per hour for data between 63 days and 15 months old. So as you go further back in time, the metric is not less accurate, but it has less granularity than it used to have.

And the reason is to save on cost. Okay? So hopefully this makes sense, and it is something you have to remember, because we’ll see that, for example, Elasticsearch could keep full retention of your data over time, and there you can control how long your metrics are retained and at which granularity. But with CloudWatch metrics, you need to know that your metrics are only available for up to 15 months, and if you go far back in time, you will only get one data point per hour, for example. Okay?
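As a rough sketch, assuming a placeholder instance ID and dates, this is how you could pull older CPU data at a one-hour period from the CLI; for data older than 63 days, one hour is the finest granularity you can get anyway:

    # Average CPUUtilization, one data point per hour, over an older one-week window
    aws cloudwatch get-metric-statistics \
      --namespace AWS/EC2 \
      --metric-name CPUUtilization \
      --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
      --start-time 2023-05-01T00:00:00Z \
      --end-time 2023-05-08T00:00:00Z \
      --period 3600 \
      --statistics Average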

8. CloudWatch Metrics – Metrics to Know

So as a DevOps engineer, you need to know a few metrics. You don’t need to remember every metric for every service, but some of these metrics should come quite naturally to you, and you should understand why they’re being collected. So let’s start with EC2. I’m going to go to EC2, go to my instances in here, click on one of them, for example this one, and go to Monitoring. First of all, we have CPU utilization, and this can be really helpful, for example, for scaling an Auto Scaling group. Then disk read/write operations and bytes are only available if you have an instance store. Okay? So if that were an instance-store-backed EC2 instance, then we would get this information.

But if you have an EBS volume, we only get the disk information directly from the EBS service. We also get information on network in and network out, which is really helpful as well if you want to scale your ASG based on incoming network traffic. Then the status checks tell you how often the instance is failing its checks, and if a system check fails, that instance can be taken out of service, which could be bad. Okay? And finally, if you have a t2.micro or any T2 type of instance, then you get CPU credit usage and CPU credit balance over time, so you can see if you’ve burst too much into the performance.
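If you want to see exactly which EC2 metrics exist for a given instance, a minimal sketch (the instance ID is a placeholder):

    # List every metric CloudWatch has in the AWS/EC2 namespace for this instance
    aws cloudwatch list-metrics \
      --namespace AWS/EC2 \
      --dimensions Name=InstanceId,Value=i-1234567890abcdef0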

Something that’s missing in here, for sure, is RAM. You don’t have any information about the RAM, and you don’t have any information about the number of processes running on that instance. This is something we can bring in through the CloudWatch agent, for example, or using custom metrics, and we’ll see how to do this later on. But I want you to realize right now that RAM, for example, is not available natively.
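As a hedged preview of the custom-metrics lecture, here is one way you could push memory usage yourself from inside the instance; the namespace and metric name (Custom/EC2, MemoryUsedPercent) are made up for the example, and IMDSv2 may require a session token:

    # Compute the percentage of RAM in use from the output of free
    USED_PCT=$(free | awk '/^Mem:/ {printf "%.1f", $3*100/$2}')
    # Grab the instance ID from the instance metadata service (IMDSv1 style)
    INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    # Publish it as a custom metric
    aws cloudwatch put-metric-data \
      --namespace Custom/EC2 \
      --metric-name MemoryUsedPercent \
      --dimensions InstanceId=$INSTANCE_ID \
      --value "$USED_PCT" \
      --unit Percent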

Next, let’s go into EBS, since we’ve talked about disk I/O. This instance has an EBS volume attached to it: if I scroll down and go to the block devices in here, I can see this is an EBS volume, and going into this volume we get some monitoring as well. Here we get information around the read bandwidth and the write bandwidth, so how much data is written to and read from the volume. We get information around the read throughput and the write throughput; the queue length, which is how many operations are queued against the EBS volume (and a large queue is bad); the time spent idle, which is what percentage of the time the disk was doing nothing (and a high number is good, obviously); and the average read and write sizes, meaning: are my I/Os small or big, and so on. And then, because this is a gp2 type of volume, it can burst.

So what is my burst balance? You need to remember those; these are quite basic metrics as well. But one thing that’s obviously missing here is: how much space is left on my EBS volume? Because, again, this is something AWS cannot get for you, so it would need to be a custom metric. So here we get a lot of information about how many writes there are and how big they are and so on, but we don’t have any information about how much space is being used on the volume. Good to remember; a sketch of pushing that as a custom metric is below.
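A similar sketch for the missing free-space metric, again with made-up namespace and metric names and a placeholder instance ID:

    # Percentage of space used on the root filesystem (strip the trailing %)
    DISK_USED_PCT=$(df -P / | awk 'NR==2 {gsub("%",""); print $5}')
    aws cloudwatch put-metric-data \
      --namespace Custom/EC2 \
      --metric-name DiskUsedPercent \
      --dimensions InstanceId=i-1234567890abcdef0,Path=/ \
      --value "$DISK_USED_PCT" \
      --unit Percent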

Next, let’s go into Auto Scaling groups. I have one right now, but you can create your own if you want to get some monitoring available. You get information around the minimum group size, the maximum group size and the desired capacity, so these are ASG configuration values. Then how many instances are in service, as a count (right now I have one of them), how many instances are coming up (pending), how many instances are on standby (we’ll see what that means when we go into the ASG settings), how many instances are being terminated, and the number of instances in total. There is also group metrics collection, meaning collecting CPU, network and similar metrics for your Auto Scaling group: you would need to enable it, and you would get aggregate metrics, the same ones you get for EC2, but at the ASG level.
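Enabling group metrics collection can also be done from the CLI; a minimal sketch, assuming an Auto Scaling group named my-asg (a placeholder):

    # Enable the group metrics (GroupMinSize, GroupInServiceInstances, ...) at 1-minute
    # granularity, the only supported value; pass --metrics to enable only a subset
    aws autoscaling enable-metrics-collection \
      --auto-scaling-group-name my-asg \
      --granularity "1Minute"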

Okay, next, let’s talk about load balancers. Right now I have an Application Load Balancer being created, and if I go to Monitoring, I can see how long it took for the targets to respond, how many requests were made, and some information around the types of errors that happened: how many 5XX, 4XX and so on, so all the HTTP errors, plus the ELB-specific errors (ELB 5XX, ELB 4XX and so on). We also get information around target connection errors, the sum of rejected connections, how many TLS negotiation errors there were, and how many active connections there are. So this is directly an ELB metric in here, which is quite nice.

We know how many connections are made directly to the ELB, the new connection count, how many bytes are processed, and then finally the consumed load balancer capacity units (LCUs). If you don’t know this, LCUs are how AWS bills you for your load balancer: the more you use your load balancer, the more capacity units you consume, and that is how the pricing is determined.
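These ALB metrics live in the AWS/ApplicationELB namespace, keyed by the load balancer’s name suffix; a sketch, with a placeholder load balancer identifier and time window:

    # Sum of HTTP 5XX errors returned by the load balancer itself, in 5-minute buckets
    aws cloudwatch get-metric-statistics \
      --namespace AWS/ApplicationELB \
      --metric-name HTTPCode_ELB_5XX_Count \
      --dimensions Name=LoadBalancer,Value=app/my-alb/1234567890abcdef \
      --start-time 2023-08-27T00:00:00Z \
      --end-time 2023-08-28T00:00:00Z \
      --period 300 \
      --statistics Sum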

And then finally, I don’t have an RDS database available right now, but I did create one earlier in this course, so I can go into CloudWatch and look at the automatic dashboard for RDS. This is actually quite a cool feature: you can click on View automatic dashboard, and this will create a dashboard for you in CloudWatch. Although there is nothing to show right now, so it’s not that handy here; let’s go back and actually see the metrics. But this would work if you wanted to use, for example, EC2 and click on its automatic dashboard: hopefully now this shows me, yes, a lot of different panels around the important metrics and so on. So this is quite cool, but let’s go back into CloudWatch metrics so we can look at the RDS metrics. We go here to RDS, and we can look across all databases, and we get information around CPU utilization, the number of database connections, and the binlog disk usage.

So we can see how much disk space is being used in our RDS database, how much freeable memory there is, the network throughput, the read IOPS, read latency and read throughput, the swap usage, and the same for writes: write IOPS, write latency and write throughput. And so, as we can see, we get a lot more information than for EC2, because RDS is a managed service, and AWS knows, for example, how to evaluate how much space is left on your disk, because it manages the RDS database for you. Okay, so these are the five most important services you need to know the metrics of.
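For example, the free storage the lecture mentions is the FreeStorageSpace metric in the AWS/RDS namespace; a sketch with a placeholder database identifier and dates:

    # Minimum free storage (in bytes) over the last day, in 5-minute buckets
    aws cloudwatch get-metric-statistics \
      --namespace AWS/RDS \
      --metric-name FreeStorageSpace \
      --dimensions Name=DBInstanceIdentifier,Value=my-database \
      --start-time 2023-08-27T00:00:00Z \
      --end-time 2023-08-28T00:00:00Z \
      --period 300 \
      --statistics Minimum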

But I do encourage you to go into other services, for example DynamoDB, CodeBuild or Elastic Beanstalk, and look at the metrics available, just so you get a better idea of what CloudWatch tracks. The exam won’t ask you to name every metric a service has, but by knowing whether a metric exists for a service, you’ll be able to tell whether you need to create a custom metric or whether it’s already there and you can use it, for example for an ASG scaling policy. Okay, so I think I’ve shown you the most important ones, but again, as a DevOps engineer, it’s always good to know a few of them. So please play around, click through the services and see what’s available to you. All right, that’s it. I will see you in the next lecture.

9. CloudWatch Metrics – Custom Metrics

So now let’s talk about custom metrics. Custom metrics are really helpful when you want to publish a metric that CloudWatch doesn’t have, for example RAM for your EC2 instances, or disk space available on your EBS volumes, and so on. You have a choice between two kinds of resolution for custom metrics: either standard resolution, which means you get one-minute granularity for your custom metric, or high resolution, which means the data can have a granularity of down to one second. Metrics produced by AWS services are standard resolution by default. Okay, so let’s go ahead and try to publish a metric, and for this I have a CLI command we can just run; we’ll have a look at it in a second.

So this is aws cloudwatch put-metric-data, and then you have to define the metric name, for example FunnyMetric; the namespace, for example Custom; the value of that metric; and the dimensions, which are the attributes you want, for example the instance ID and instance type and so on, but you can have whatever you want in there, up to ten dimensions; plus the profile I’m using for the CLI call and the region I am in. So let’s go ahead and press Enter, and the put-metric-data call has succeeded. So now if I go back into CloudWatch, look at my metrics and refresh this page, hopefully I will get a custom namespace appearing.
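The exact values aren’t shown on screen, so here is a hedged reconstruction of roughly what that command looks like; the value, instance ID, profile and region are placeholders:

    aws cloudwatch put-metric-data \
      --metric-name FunnyMetric \
      --namespace Custom \
      --value 42 \
      --dimensions InstanceId=i-1234567890abcdef0,InstanceType=t2.micro \
      --profile my-profile \
      --region eu-west-1
    # For a high-resolution custom metric (1-second granularity), you could add: --storage-resolution 1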

So let’s go back to All metrics, and yes, we have a custom namespace named Custom, and it has one metric with InstanceId and InstanceType as dimensions, and here is my FunnyMetric. Let me remove this other one; here we go, so here is my FunnyMetric. Right now we don’t see anything because the data point isn’t appearing yet, but it should appear in a little bit of time. So I could go ahead and keep publishing different values, for example I can say the value is now 1000, and here we go, we just pushed another data point, and so on. The metric should appear in here in a little bit of time, so I’ll wait until this happens, and while this is happening, let’s talk about other things.

So you can use dimensions, and you can have up to ten dimensions on one custom metric; that’s what we did with InstanceId and InstanceType. Then you can publish a single data point at a specific timestamp: for example, you could say, I want to publish a data point for this timestamp and value, instead of not specifying a timestamp. And you can also publish statistic sets directly, so you’re able to say: I already know what the aggregate is going to be over one minute, so I’m just going to publish the sum, the minimum, the maximum and the sample count. That way you don’t need to publish every individual value into CloudWatch; you can just publish aggregated statistic sets.
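A sketch of both ideas, an explicit timestamp and a pre-aggregated statistic set, with placeholder numbers:

    # Publish a statistic set (already aggregated over one minute) at an explicit timestamp
    aws cloudwatch put-metric-data \
      --metric-name FunnyMetric \
      --namespace Custom \
      --timestamp 2023-08-28T12:00:00Z \
      --statistic-values SampleCount=60,Sum=6000,Minimum=50,Maximum=150 \
      --dimensions InstanceId=i-1234567890abcdef0,InstanceType=t2.micro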

So let’s go back to CloudWatch and refresh, and here we go, we have one data point appearing in here. This is our first published custom CloudWatch metric, and we’re good to go. So we could definitely use the SDK or the CLI to publish custom metrics, but as we’ll see, for EC2 instances there is the CloudWatch Logs agent, or the CloudWatch agent overall, called the unified agent, that is able to push some metrics for us, and we’ll see how to use it in a future hands-on. What you should remember from this lecture is that we were able to create custom metrics in custom namespaces and use the CLI to push those metrics. All right, well, that’s it. I will see you in the next lecture.
