Amazon AWS DevOps Engineer Professional – Monitoring and Logging (Domain 3) Part 4

August 28, 2023

10. CloudWatch Metrics – Exports

Okay, so what if we wanted to export these metrics out of CloudWatch? There is no native feature for this, and it is something the DevOps exam could ask you about: we would like to export all the CloudWatch metrics somewhere, for example into S3 or into Elasticsearch. How would we do this? Well, let’s look at a metric. For example, let’s go to EC2, look at the per-instance metrics, and look at the CPU utilization of this instance. So let me click on it. Here we go. And let me go to my graphed metrics, making sure I have the right one. Here we go. Okay. And I’m going to choose a period of 3 hours.

Okay, so let’s say we want to export this entire graph to S3. How would you do this? There is an API call for it, CloudWatch GetMetricStatistics, which allows you to specify a namespace; a metric name, for example CPUUtilization; the dimensions you want, for example InstanceId and the ID of the instance; and the statistics you want to retrieve, for example Maximum. That corresponds to the statistic in here, so we can choose Maximum. Excellent. And then there is the start time.

So the start time is when we want to export from, and the end time is when we want to export to. You need to make sure that this time range is not too big, otherwise you’ll get an error saying that you requested too many data points. Then there is the period, the profile you want to use for the CLI call, and the region. So let’s go ahead and test this one. By running this command in here, as you can see, we get a JSON back that contains a lot of timestamps and a lot of values. So we see that at this timestamp the maximum was this value and the unit was percent. And so what we could do is pipe this output into S3 or Elasticsearch, for example.
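The same call can be sketched in Python with boto3. This is a minimal sketch, not the command run in the lecture: the helper names `build_request` and `fetch_datapoints` are made up for illustration, the 5-minute period is an assumption, and the range check is only an approximation of the service’s limit of 1,440 data points per GetMetricStatistics call.

```python
from datetime import datetime, timedelta, timezone

# GetMetricStatistics returns at most 1,440 data points per call.
MAX_DATAPOINTS_PER_CALL = 1440


def build_request(instance_id, start, end, period_seconds=300, statistic="Maximum"):
    """Build the keyword arguments for CloudWatch GetMetricStatistics (hypothetical helper)."""
    # Approximate check: one data point per period across the requested range.
    datapoints = int((end - start).total_seconds() // period_seconds)
    if datapoints > MAX_DATAPOINTS_PER_CALL:
        raise ValueError(
            f"{datapoints} data points requested; shorten the range or raise the period"
        )
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistics": [statistic],
        "StartTime": start,
        "EndTime": end,
        "Period": period_seconds,
    }


def fetch_datapoints(cloudwatch, request):
    """Call the API and return the data points sorted by timestamp."""
    response = cloudwatch.get_metric_statistics(**request)
    return sorted(response["Datapoints"], key=lambda point: point["Timestamp"])
```

With credentials configured, `fetch_datapoints(boto3.client("cloudwatch"), build_request(instance_id, end - timedelta(hours=3), end))` would return a list of dictionaries with `Timestamp`, `Maximum` and `Unit` keys, which is the same shape as the JSON printed by the CLI.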

But obviously we can automate this. For example, as a real DevOps engineer, we would do something like this. We would go to CloudWatch Events in here and create a schedule. So I’ll go to Rules and create a rule, and the schedule will be, for example, every 1 hour, and it will invoke a Lambda function. So we would go into Lambda and create a function, call it, for example, lambda-export-cloudwatch-to-s3, and this would be a Lambda function we would have to code. I’m not going to code it, I’ll just give you an example, and we’ll create a role.

This is excellent. Create this function. So now we have this function called lambda-export-cloudwatch-to-s3 that would be doing the same API call, GetMetricStatistics, and writing the output into S3, for example, or sending it to Elasticsearch. And so in here, I would go ahead and choose my Lambda function. It’s not here, so let’s refresh this page and go back into creating a schedule. Every 1 hour, the target is going to be a Lambda function, lambda-export-cloudwatch-to-s3. Then Configure details, and I’ll describe it as regular exports from CloudWatch metrics to S3. And here we go.
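The console steps above can be approximated with boto3. This is a sketch under assumptions: the rule name and target ID are placeholders, and `events` and `lambda_client` are expected to be `boto3.client("events")` and `boto3.client("lambda")`. The console grants the invoke permission for you, while through the API it has to be added explicitly.

```python
def schedule_hourly_export(events, lambda_client, function_arn,
                           rule_name="export-cloudwatch-metrics-hourly"):
    """Create an hourly schedule rule that targets the export Lambda function."""
    rule = events.put_rule(
        Name=rule_name,  # placeholder name, not the one used in the lecture
        ScheduleExpression="rate(1 hour)",
        State="ENABLED",
        Description="Regular exports from CloudWatch metrics to S3",
    )
    # Allow the rule to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    # Attach the function as the rule's only target.
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "export-lambda", "Arn": function_arn}],
    )
    return rule["RuleArn"]
```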

We have a rule now, and this rule will be invoked every hour, and so the Lambda function I have created will be invoked. Obviously we still need to code it, but that would be a way of exporting CloudWatch metrics into an S3 bucket, for example. It’s just something I want to show you, because this API call can come up at the exam. And overall, as a DevOps engineer, you should always think about how to automate these tasks. The best way to do this for now is to use CloudWatch Events chained with a Lambda function. That Lambda function will do an API call into CloudWatch metrics to retrieve the metric, and then another API call into S3 to send the data there. So that’s it for this little automation lecture. I hope you liked it, and I will see you in the next lecture.
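The lecture leaves the function body uncoded. One possible shape for it is sketched below. Everything specific here is an assumption made for illustration: the `EXPORT_BUCKET` and `INSTANCE_ID` environment variables, the S3 key layout, the 5-minute period, and the choice to store the data points as a JSON array.

```python
import json
import os
from datetime import datetime, timedelta, timezone


def export_last_hour(cloudwatch, s3, bucket, instance_id, now=None):
    """Read one hour of CPUUtilization maxima and write them to S3 as JSON."""
    now = now or datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        Statistics=["Maximum"],
        StartTime=now - timedelta(hours=1),
        EndTime=now,
        Period=300,
    )
    rows = [
        {
            "timestamp": point["Timestamp"].isoformat(),
            "maximum": point["Maximum"],
            "unit": point["Unit"],
        }
        for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    ]
    # Hypothetical key layout: one object per instance per hour.
    key = f"cloudwatch/CPUUtilization/{instance_id}/{now:%Y/%m/%d/%H}.json"
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(rows).encode("utf-8"))
    return key


def lambda_handler(event, context):
    """Entry point invoked by the hourly schedule rule."""
    import boto3  # available in the Lambda Python runtime

    key = export_last_hour(
        boto3.client("cloudwatch"),
        boto3.client("s3"),
        os.environ["EXPORT_BUCKET"],  # assumed environment variable
        os.environ["INSTANCE_ID"],    # assumed environment variable
    )
    return {"exported": key}
```

Passing the clients in as arguments keeps the export logic testable without AWS access. The function’s execution role would need `cloudwatch:GetMetricStatistics` and `s3:PutObject` permissions.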

11. CloudWatch Alarms – Overview

The next thing we want to look at is CloudWatch alarms. CloudWatch alarms are extremely important because, as a DevOps engineer, they allow you to automate your infrastructure based on the behavior of your services or applications and on the metrics they publish to AWS. So the idea here is that you have metrics per service, for example for EC2, DynamoDB and all these services right here, and you can have alarms on any kind of metric you want. For example, in here I already have five alarms. Three of them are in the OK state. One of them is in the INSUFFICIENT_DATA state, which means that it does not have enough data to figure out its real state.

And finally, one is in the ALARM state, which means that the threshold is being breached and therefore something may happen. Okay, so this is quite important, because here we are able to create our own alarm, and we should select the metric. Now, any metric in CloudWatch is valid for a CloudWatch alarm, even custom metrics. So if I wanted to choose the custom metric that I used before, I could definitely use that one. Or I could select, for example, one coming from the Application ELB: go to the per-ALB metrics, take ConsumedLCUs, and create an alarm based on that.

And for example, if the LCU consumption is too high, then do something. But let’s use a metric that has some kind of graph, so we have something good to look at. So let’s go to EC2, look at the per-instance metrics, and look again at the CPU utilization of this instance. Okay, we’ll select this metric, and so now we have a graph. We need to select only one metric. As you can see here, you cannot combine multiple metrics into one alarm. This is not something that CloudWatch alarms can do, and this could be a trick question at the exam. Okay, so we have the AWS/EC2 namespace. The metric name is CPUUtilization.

The instance ID is one dimension of that metric, and we’re good. The instance name is right here. And we could look at a statistic, for example the average or the maximum, and get a different graph, and so on. So we’ll keep it as maximum. Then the period: it could be 1 minute if it’s a detailed metric, or 5 minutes if it’s a basic metric for EC2, and so on. So that looks good. Then we need to say, okay, we have this metric, but what will trigger the alarm on this metric? So, if you scroll down, for the threshold type we could use Static and use a static value as a threshold, or Anomaly detection, which is kind of new and uses a band as a threshold, meaning, okay, if you’re outside the band, then you’re in the ALARM state.

But if you’re within the band, you’re good. Or we can say greater than the band, or lower than the band. So it gives us some quite interesting types of conditions. But for now we’ll keep it simple and use Static. So we’ll say, okay, we want this CPU utilization to be greater than, for example, 5%, okay? And it must be a number. So here it drew a red line to show us where our threshold is. We can see that the points over this red line are the ones that would trigger our alarm, and the points under the line are the ones where the alarm is in the OK state. So this is good, and you could provide additional configuration.

So we could say how many data points within the evaluation window should be breaching for the alarm to go into the ALARM state. Here, one out of one is fine. And then we could say, if we’re missing data, how do we treat it? We can treat it as missing, as not breaching the threshold, as ignored (maintain the alarm state), or as breaching the threshold. So we have different ways of customizing this alarm. We’ll keep everything as default and click on Next. Okay, the alarm actions are going to be super important. So here we’ve defined how the alarm is going to be triggered and what it’s going to look for. But now, what happens if it’s in the ALARM state? If it’s in the ALARM state, we could, for example, select an SNS topic and send a notification to that SNS topic.
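To make the “M out of N data points” and missing-data settings described above concrete, here is a deliberately simplified model of how an alarm state could be derived from a window of recent data points. It is an illustration, not CloudWatch’s actual evaluation algorithm, which has additional rules. The function name is made up, the comparison is hard-coded to “greater than”, and the `treat_missing` values mirror the API’s `TreatMissingData` options.

```python
def evaluate_alarm(window, threshold, datapoints_to_alarm=1,
                   treat_missing="missing", previous_state="INSUFFICIENT_DATA"):
    """Return 'OK', 'ALARM' or 'INSUFFICIENT_DATA' for a window of data points.

    `window` holds the most recent values, with None for a missing data point.
    Simplified model: a data point breaches when it is greater than the threshold.
    """
    if treat_missing == "breaching":
        flags = [True if value is None else value > threshold for value in window]
    elif treat_missing == "notBreaching":
        flags = [False if value is None else value > threshold for value in window]
    elif treat_missing in ("missing", "ignore"):
        # Missing points are dropped from the evaluation.
        flags = [value > threshold for value in window if value is not None]
    else:
        raise ValueError(f"unknown treat_missing value: {treat_missing}")

    if not flags:
        # Nothing to evaluate: "ignore" keeps the current state.
        return previous_state if treat_missing == "ignore" else "INSUFFICIENT_DATA"
    return "ALARM" if sum(flags) >= datapoints_to_alarm else "OK"
```

With the settings from the lecture (threshold 5, one out of one), `evaluate_alarm([3.0], 5.0)` gives `"OK"` and `evaluate_alarm([7.2], 5.0)` gives `"ALARM"`.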

So we could go ahead and create a topic for this, call it Default_CloudWatch_Alarms_Topic. As soon as the alarm is in the ALARM state, that topic will receive a notification. You could also add an email if you wanted to have that. So we’ll just create this, and we’ll have stefan@example.com, and create this topic, and we have created a topic. Now we could add another notification. So here we have one notification for In alarm, but we could define another notification for, again, In alarm, or for the OK state, or for the Insufficient data state. And again you would select an SNS topic.

So as you can see here, the only notification target for our CloudWatch alarm is going to be an SNS topic. The other thing we can do instead of sending a notification is to have an Auto Scaling action. So here we are able to add an Auto Scaling action and say, okay, when this alarm is breached, you should autoscale EC2 or ECS. We need to specify an ECS service or an Auto Scaling group, and then the action to take. So this would be a way to configure auto scaling. We’ll look at auto scaling in the last section of this course, so for now we won’t touch it. And finally, super important, an EC2 action.

What is an EC2 action? Well, it would say, for example, if the alarm is in the ALARM state, then you should stop this instance, or terminate this instance, or reboot this instance, or recover this instance. So these EC2 actions are only available when you select a metric on an EC2 instance, obviously. But it would be quite nice to be able to automate, for example, that if the CPU is at 99% for over an hour, maybe the instance is in a bad state, and therefore you want to reboot it, or terminate it if you have an ASG behind that instance. Who knows, right? But it is quite interesting to know that we can do different EC2 actions.
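Through the API, an EC2 action is just another entry in the alarm’s `AlarmActions` list, written as an `arn:aws:automate:<region>:ec2:<action>` ARN. The sketch below builds the parameters for the “CPU at 99% for an hour, then reboot” example. The helper names, the alarm name, the Average statistic and the twelve 5-minute periods are illustrative assumptions, not settings shown in the lecture.

```python
EC2_ALARM_ACTIONS = ("stop", "terminate", "reboot", "recover")


def ec2_action_arn(region, action):
    """Build the ARN that tells a CloudWatch alarm to act on the EC2 instance."""
    if action not in EC2_ALARM_ACTIONS:
        raise ValueError(f"unsupported EC2 alarm action: {action}")
    return f"arn:aws:automate:{region}:ec2:{action}"


def reboot_on_pegged_cpu(instance_id, region):
    """Parameters for put_metric_alarm: reboot after one hour at 99% CPU or more."""
    return {
        "AlarmName": f"{instance_id} CPU pegged for one hour",  # placeholder name
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # 5-minute periods
        "EvaluationPeriods": 12,  # 12 x 5 minutes = 1 hour
        "DatapointsToAlarm": 12,
        "Threshold": 99.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [ec2_action_arn(region, "reboot")],
    }
```

The resulting dictionary would be passed as `boto3.client("cloudwatch").put_metric_alarm(**params)`.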

So here I will remove it, but these are some good capabilities to see as well. So I’ll click on Next, and then I’ll call it CPU utilization over 5%. We could enter a description, but I’m just going to skip it. And here we can review everything, and when we’re happy we just create the alarm. And so we’ve successfully created our alarm, and now our alarm is working. So we click on it, and we can look at the graph, and we can look at the details of our alarm and the actions that it will take in case something goes off. So there will be a notification if it goes into the ALARM state. Okay, we can also look at the history of that alarm and see what happened over time.
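For reference, roughly the same alarm could be created through the API instead of the console. This is a sketch under assumptions: the function name is made up, `cloudwatch` and `sns` are expected to be boto3 clients, and the 5-minute period and the topic name are guesses at what the console used.

```python
def create_cpu_alarm(cloudwatch, sns, instance_id, email, threshold=5.0):
    """Create an SNS topic with an email subscription and a CPU alarm that notifies it."""
    topic_arn = sns.create_topic(Name="Default_CloudWatch_Alarms_Topic")["TopicArn"]
    # The recipient has to confirm the subscription by email before delivery starts.
    sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)

    cloudwatch.put_metric_alarm(
        AlarmName=f"CPU utilization over {threshold:g}%",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        Statistic="Maximum",
        Period=300,
        EvaluationPeriods=1,   # "one out of one" data points
        DatapointsToAlarm=1,
        Threshold=threshold,
        ComparisonOperator="GreaterThanThreshold",
        TreatMissingData="missing",
        AlarmActions=[topic_arn],  # notify only when entering the ALARM state
    )
    return topic_arn
```

Notifications for the OK or INSUFFICIENT_DATA states would go into the `OKActions` and `InsufficientDataActions` parameters of the same call.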

So what I’m going to do is just wait a little bit and actually get some data out of it. So let me wait for a few seconds. And now we can see that this alarm is in the OK state. It’s saying OK because our CPU utilization is low, below the threshold, and if we scroll down we can look at the history, which says, okay, the alarm was updated from INSUFFICIENT_DATA to OK. So one thing I want you to realize is that if we go to CloudWatch Events, we will not be able to create a rule for this alarm. It’s a bit surprising, but this is something that you should know.

If we look at CloudWatch as the source of an event, CloudWatch alarms is not one of the options. So we have CloudWatch Events and CloudWatch Logs, but CloudWatch alarms is not a specific event source, and therefore you need to remember the only actions you can take out of your CloudWatch alarm. So I’ll go and click on Next. The first action you can take is to send a notification to an SNS topic, and from that topic maybe you can forward it to an SQS queue, or send it to a Lambda function that will send it anywhere you want, or send an email, and so on.

And then you have Auto Scaling actions, to perform scaling actions on your Auto Scaling group and your ECS services, and EC2 actions, again, to stop, reboot, recover or terminate your EC2 instances. Okay, so these are the only three things that you can do out of a CloudWatch alarm, and there’s no way around it using CloudWatch Events. I’m showing you something you can’t do, but it’s good to know what you can’t do in the cloud as well. Right, so that’s it for this lecture. I hope you liked it, and I will see you in the next lecture.
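The “SNS topic to Lambda function” path mentioned above could look like the sketch below. When SNS invokes a Lambda function, the alarm notification arrives as a JSON string in `Records[].Sns.Message`. The helper name and what the handler does with the result (it only prints) are illustrative assumptions.

```python
import json


def parse_alarm_notifications(event):
    """Extract the alarm name and state change from an SNS-delivered alarm event."""
    summaries = []
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        summaries.append({
            "alarm": message["AlarmName"],
            "old_state": message["OldStateValue"],
            "new_state": message["NewStateValue"],
            "reason": message["NewStateReason"],
        })
    return summaries


def lambda_handler(event, context):
    """Entry point for a Lambda function subscribed to the alarm's SNS topic."""
    summaries = parse_alarm_notifications(event)
    for summary in summaries:
        # Replace the print with whatever forwarding is needed (chat, ticket, ...).
        print(f"{summary['alarm']}: {summary['old_state']} -> {summary['new_state']}")
    return {"processed": len(summaries)}
```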

12. CloudWatch Alarms – Billing Alarms

So finally, there is a very specific type of alarm that you can set, and these are billing alarms, okay? You will not see those unless you switch regions and go to US East (N. Virginia). So by switching to us-east-1, I will show you, there’s a new metric that appears, and we can create different alarms as well. So if I go here, there are billing alarms, and this is something that is only available in Northern Virginia, so we can create an alarm based on the billing we have. So, for example, I say, okay, let’s look at this metric. This is how much I’ve spent over time on this AWS account. So I’m at $6.

And so I can say, okay, I’m looking for a static threshold, and I say more than $10. And Next, and I will send myself a notification, and I will create a new topic for this. Excellent. And I’ll again enter my email, okay, and then Create topic. And here we go with an SNS topic. Click on Next, and then name it test billing alarm, and click on Next. And here we go. So this alarm will trigger if I spend more than $10 in a six-hour period. Okay, so this is excellent. Sorry, more than eight. I put eight here, so more than $8 overall in my month. Excellent. I’ll create this alarm, and sorry for having confused you. And here we go. I have created my first billing alarm.

The reason why we can create a billing alarm in this region only is that if you go to Metrics, you will actually find that there are billing metrics in here. So we could click on View automatic dashboard, and that will show us all the metrics that we have available. So we have estimated total charges, and let’s select something like three days so we can see a graph. So three days, these are my total charges. And we can also see them by service. So for API Gateway, for CloudWatch, for DynamoDB, for ECS. I mean, I’m not going to read all the services to you, but you get the idea, right? So, for example, say I wanted to create an alarm directly on KMS. I would go into my alarms and I would create an alarm.

I would select a metric, choose Billing and By Service, and I would choose, for example, my KMS service, which is somewhere around here. Here we go, KMS. And I’ll select this metric and say, okay, I don’t want to spend more than $4 a month on KMS. If I do, just send me an email and I’ll look at it. Then I’ll send it to an existing topic. Here we go. And click on Next. And I’ll call it KMS billing alarm. And so this alarm is specific to KMS only. So the reason I’m showing you this is that these metrics are only available in that specific region, because this is where all the aggregated billing data is for AWS.
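The two billing alarms can be sketched as `put_metric_alarm` parameters. Billing data lives in the `AWS/Billing` namespace under the `EstimatedCharges` metric, with a `Currency` dimension and an optional `ServiceName` dimension. The helper name is made up, and the `ServiceName` value for KMS used in the usage note (`awskms`) is an assumption that should be checked against the By Service list in the console.

```python
def billing_alarm_request(name, threshold_usd, topic_arn, service_name=None):
    """Build put_metric_alarm parameters for a total or per-service billing alarm."""
    dimensions = [{"Name": "Currency", "Value": "USD"}]
    if service_name is not None:
        # Restrict the alarm to one service's estimated charges.
        dimensions.append({"Name": "ServiceName", "Value": service_name})
    return {
        "AlarmName": name,
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": dimensions,
        "Statistic": "Maximum",
        "Period": 21600,  # six hours, matching the period shown in the console
        "EvaluationPeriods": 1,
        "Threshold": float(threshold_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],
    }
```

The client has to be created in the billing region, for example `boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**billing_alarm_request("KMS billing alarm", 4, topic_arn, service_name="awskms"))`.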

Okay? And finally, if we go to Metrics, we can also look at Usage, and Usage is going to give us some information around PutMetricData call counts, GetMetricData call counts, and so on. So these are all the usage metrics for the APIs of CloudWatch itself. Okay, so it’s good to know that all the billing metrics are here. And you can create an alarm at the global level, so for all your costs, or at the service level as well, using the metric corresponding to the service you want to track. So that’s it for this lecture. I hope you liked it, and I will see you in the next lecture.
