Amazon AWS DevOps Engineer Professional – Monitoring and Logging (Domain 3) Part 9
August 31, 2023

23. CloudWatch Dashboards – Overview

So now let’s talk about CloudWatch dashboards. You have the possibility to create dashboards, and I’ll call this one "demo dashboard". You do get billed for each dashboard that you create, but I think the first one is free. In a dashboard, you are able to add widgets, whether they be lines, stacked areas, numbers, text, or query results from CloudWatch Logs Insights. So, for example, for a line widget, we can configure it, and then we need to choose some kind of metric. Let’s go to EC2, get a per-instance metric, and pick our CPU utilization again. And here we go, we get our first widget, and we can resize it anytime we want, rename it and customize it. When we’re ready, we just save the dashboard. Okay, so this is a pretty easy dashboard, and you can save it and export it.
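If you prefer to build the same thing programmatically, here is a minimal sketch using boto3's put_dashboard call. The dashboard name and instance ID are placeholders, and the widget JSON follows the dashboard body structure that AWS documents:

```python
import json
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")

# A single line widget plotting CPUUtilization for one EC2 instance.
# "i-0123456789abcdef0" is a placeholder instance ID.
dashboard_body = {
    "widgets": [
        {
            "type": "metric",
            "x": 0, "y": 0, "width": 12, "height": 6,
            "properties": {
                "view": "timeSeries",
                "stacked": False,
                "metrics": [
                    ["AWS/EC2", "CPUUtilization", "InstanceId", "i-0123456789abcdef0"]
                ],
                "region": "eu-west-1",
                "title": "CPU utilization",
            },
        }
    ]
}

cloudwatch.put_dashboard(
    DashboardName="demo-dashboard",
    DashboardBody=json.dumps(dashboard_body),
)
```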

Dashboards are also multi-region. So if I change regions and go to another one, you will see that the graph remains here: I’m still able to see the CPU utilization from Ireland even though I’m in Ohio, and I’ll show you this in a second. If I go, for example, to Northern Virginia, I can add another widget there, either a line or just a number, which is how much I’ve paid so far: the total estimated charge in USD. I’ll create this widget, and here I’m able to get this estimated charge widget. To make it work, I’ll change the interval to be 12 hours. So I can see now I’ve paid $6 so far, and here is my CPU utilization graph. And as you can see, these come from different regions: this one comes from Ireland and this one comes from Northern Virginia.
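In the dashboard body, each widget can carry its own region property, which is what makes a single dashboard multi-region. Here is a sketch of the two widgets above as a dashboard body you could pass to put_dashboard like before; the instance ID is a placeholder, and note that billing metrics live in us-east-1:

```python
# Two widgets in one dashboard, each pinned to a different region.
multi_region_widgets = {
    "widgets": [
        {   # CPU utilization from Ireland (eu-west-1)
            "type": "metric",
            "properties": {
                "metrics": [["AWS/EC2", "CPUUtilization",
                             "InstanceId", "i-0123456789abcdef0"]],
                "region": "eu-west-1",
                "view": "timeSeries",
            },
        },
        {   # Estimated charges; billing metrics are only published in us-east-1
            "type": "metric",
            "properties": {
                "metrics": [["AWS/Billing", "EstimatedCharges", "Currency", "USD"]],
                "region": "us-east-1",
                "view": "singleValue",
                "period": 43200,  # 12 hours, matching the interval in the demo
            },
        },
    ]
}
```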

So this is quite nice: you can have a multi-region CloudWatch dashboard. And something you need to know about CloudWatch dashboards: if you go to the CloudWatch dashboards documentation, there is this word "correlation". This is something you have to remember. It is just the way AWS describes it, and I think it is quite confusing, but CloudWatch dashboards do give you the ability to correlate data within CloudWatch. And by correlate, they mean seeing it all together in one dashboard, and that’s about it. But this is something you have to remember, because I remember the exam asking me how to correlate data, and CloudWatch dashboards was the answer. So remember that word. And then finally, there are sample CloudWatch dashboards you can look at.

Let’s go back to our region, which is Ireland, and then I will go into my metrics. In here you are able, for example for EC2, to view an automatic dashboard which shows you a bunch of metrics already created for you. This is what’s called an AWS service dashboard that we can use right away, which is quite nice and shows the capability of CloudWatch dashboards overall. And we can see how we can create graphs, how we can have a list of all the EC2 instances and so on. Okay, so that’s it for this section. Very, very short. But remember, dashboards can be multi-region, and they can be used to correlate data within AWS in a graphical manner. All right, that’s it. I will see you in the next lecture.

24. X-Ray – Overview

So let’s now learn about AWS X-Ray. X-Ray is a service that collects data about requests that your application serves, and provides tools to view, filter and gain insights into that data to identify issues and opportunities for optimization. The idea is that you have a distributed application and it will make a lot of API calls or interact with a lot of other microservices, resources, databases, web APIs and so on. And you want to have a general view of how things went: what your performance was, whether there was any error and where it came from, and so on. So this is what X-Ray does for you. X-Ray can be instrumented using the X-Ray daemon, which is something that would run on an EC2 machine for example, together with the SDK to send data to that daemon, or the SDK can send data directly to the X-Ray API.
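As a rough illustration of what instrumentation looks like, here is a minimal sketch using the aws-xray-sdk for Python. It assumes a local X-Ray daemon listening on its default UDP port 2000, and the service name, table name and item are placeholders; patch_all() wraps supported libraries such as boto3 so their calls show up as subsegments:

```python
import boto3
from aws_xray_sdk.core import xray_recorder, patch_all

# Point the recorder at the local X-Ray daemon (default: 127.0.0.1:2000)
xray_recorder.configure(service="demo-app", daemon_address="127.0.0.1:2000")

# Patch supported libraries (boto3, requests, ...) so their calls are traced
patch_all()

dynamodb = boto3.client("dynamodb")

# Everything inside the segment is recorded and shipped via the daemon
with xray_recorder.in_segment("signup-handler") as segment:
    segment.put_annotation("source", "demo")  # annotations are indexed for filtering
    dynamodb.put_item(
        TableName="signups",  # placeholder table name
        Item={"email": {"S": "user@example.com"}},
    )
```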

What you get out of it, through the web browser (and we’ll see this in a second), is a general idea of the service mesh between your applications: how they relate to one another, the error rates and so on. So for this, let’s go into X-Ray and we’ll just launch the sample application, because from the exam perspective, you just need to know at a high level what X-Ray is and how it works: we can do debugging, tracing and service maps. So let’s get started. We’ll launch a sample application and click to launch it. This will launch a CloudFormation template. We’ll click on Next, and for the subnet you enter any subnet within your account.

For the VPC, you enter any VPC you want, then you click on Next and create the stack, and you tick the box because it has to create IAM roles for it to work. Now we just need to wait for the application to be started, so I’ll just pause the video. The events are done now and our stack has been created. And if I go to the outputs, we can see there is an Elastic Beanstalk environment URL. This is where our application has been created, and we get access to the AWS X-Ray sample application. If you’re curious about the CloudFormation, you can go into Resources and look at all the different things that have been created as part of this stack. But for now, let’s just go in here: we have our sample X-Ray application.

So what I’m going to do is generate some sign-ups, and for this you click on the Start button. If you wanted to generate some manual sign-ups, you would just click on this "Sign up today" button, but for now I’ll just click on Start, and this will automatically send requests every 6 seconds and generate some dummy sign-ups into a DynamoDB table. Then, once you’ve generated a few requests, say around ten, I’m going to go to the AWS X-Ray console, scroll down, click on Done, and then click on Service Map. And it’s starting to compute the service map for me. So what is the service map? Well, this is a way for AWS to map what is happening within my infrastructure.

This is just the early stages, so you can see the client is me, and I’m talking to my EC2 instance, which is this application right here. As I wait, this service map gets refreshed; I’m refreshing the map on the top right-hand side. And if my EC2 instance is talking to other services, then they start being mapped here as well. So, as you can see, this EC2 instance is also talking to this URL, which is the metadata service from AWS. It’s also talking to SNS, probably to publish some notifications and send out maybe email verifications and so on. And here we have a DynamoDB table that the EC2 instance is writing to whenever a sign-up is created.

And so, as we can see in here, whenever there is a little bit of orange, that means there are errors; whenever it’s green, that means it’s okay. For now, it seems that there is a little bit of error on my EC2 instance, but no errors on the DynamoDB table, SNS or the metadata service. Let me refresh the service map again. And yes, everything looks good. So if we go in here and look at the errors, I’m able to get some information about the traces. I can click on Error and click on View Traces, and this will take me directly to the error itself. And as you can see, my web browser has tried to request favicon.ico (this little icon here), but it does not exist, and therefore it got a 404.

So this is the error that occurred. As we can see from the service map, we’re able to get a lot of information around how all the services relate to one another. So instead of an error trace, let’s take, for example, an OK trace and click on View Traces: we get all the requests that were okay. One of those, for example, is this POST, and it was for a sign-up. It took 83 milliseconds to respond. And if I click on it, I’m able to see a breakdown of everything that happened: it went from the front end, from the EC2 instance, onto DynamoDB for 40 milliseconds with a PutItem API call, and then into the SNS topic, and that was another 40 milliseconds.

If I expand this DynamoDB call, I can look at all the details within this DynamoDB API call: the resources that were involved, the operation (which was a PutItem), the annotations and metadata if there are any, and exceptions if there is an error. And same for the SNS topic: I’m able to see how long it took and what the resource was. It was a Publish on this topic ARN, and again, no annotations, metadata or exceptions. So this is really, really helpful when you have a service map and you get all these things, because you get a really good high-level overview of what’s going on. And as we can see here, for example, there was an error in my DynamoDB table, so I can click on it, filter for errors, and click on View Traces.

And this one did not succeed for some reason, so let’s click on this trace and learn why. The front end talked to DynamoDB and this DynamoDB API call failed. If we go to Exceptions, it says "the conditional request failed", and here we have the entire stack trace, so we can understand exactly why this specific API call failed. This failure is intended, because it’s a sample application, and demonstrating failures is obviously very, very important. So that gives you a better idea of how X-Ray works: it allows you to drill down and filter for traces, look at all the traces made within your account, and understand the ones that generated errors so you can fix them.

And if there’s an error, you can understand: does it come from this service? Does it come from SNS, from the DynamoDB table, or from my EC2 instance? Then you can also look at the durations. Maybe this API call took a little bit too long, so I can go in here, zoom, and view the traces only for the response times that were over 100 milliseconds and understand: okay, in here, it looks like a lot of things were done, and that is why it went over 100 milliseconds. It was talking to DynamoDB, to the metadata service, to SNS, and the metadata service again, and so on. So really, really helpful to debug things; you can also do this kind of filtering programmatically, as in the sketch below.
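Here is a minimal sketch of the same "slower than 100 ms" filter using boto3's get_trace_summaries and an X-Ray filter expression; the time window is a placeholder:

```python
from datetime import datetime, timedelta

import boto3

xray = boto3.client("xray")

# Look at the last 10 minutes of traces
end = datetime.utcnow()
start = end - timedelta(minutes=10)

# Filter expression: only traces slower than 100 ms
paginator = xray.get_paginator("get_trace_summaries")
for page in paginator.paginate(
    StartTime=start,
    EndTime=end,
    FilterExpression="responsetime > 0.1",
):
    for summary in page["TraceSummaries"]:
        print(summary["Id"], summary.get("ResponseTime"))
```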

And anytime in the exam you see traces, debugging, distributed applications and so on, X-Ray is going to be a great candidate. It’s different from CloudWatch in that it doesn’t publish any metrics and it doesn’t publish any logs. It’s really about distributed tracing, getting application stack traces, and understanding how each request flows through your entire service map. Okay, well, that’s it for this lecture. When you’re done, you can go to CloudFormation and completely delete the stack: you go to Delete, and this will remove all the resources that were created for you. Okay, well, that’s it. I will see you in the next lecture.

25. X-Ray – DevOps automation

So here is a post from the AWS DevOps blog that I really like, about using CloudWatch and SNS to notify when X-Ray detects elevated levels of latency, errors and faults in your application. As we can see, X-Ray is for distributed applications, and here they propose a sample app overview and architecture to detect whether any errors are detected within X-Ray. So how does it work? Well, there is a CloudWatch Event here that is time-based, so every, whatever, five minutes, it will trigger a Lambda function, and this Lambda function will call the GetServiceGraph API in X-Ray, and the response will be returned.

So this is the GetServiceGraph call: it asks for this graph and checks whether there were any errors within it, because X-Ray doesn’t publish this graph by default; you need to use an API to extract that graph from X-Ray. Then, if the Lambda function sees something that looks wrong, for example throttling errors, 4xx errors, or faults like 5xx, it will trigger a CloudWatch Events rule. This is how they would program it, and this event rule would, for example, publish to an SNS topic, and the topic would send a message as an SMS. And the event rule is definitely able to do multiple things at once.

You can have multiple targets for each CloudWatch Events rule, so it could also trigger an alarm and put that alarm into the ALARM state, and maybe, if this alarm state happens too many times, it will send a message to another SNS topic which will maybe send an email. This is not something we’ll implement, but think about all the automations and how the whole story comes together as a DevOps engineer. Here we have CloudWatch Events; we have AWS Lambda doing API calls to another service and using another CloudWatch Events rule that has multiple destinations, being SNS and a CloudWatch alarm, which in turn triggers an SMS and an SNS topic notification. A rough sketch of the Lambda function at the heart of this is below.
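This is not the code from the blog post, just a simplified, hypothetical sketch of what such a Lambda function could look like: the topic ARN is a placeholder, the error check is deliberately naive, and a real implementation would put a CloudWatch Events rule or alarm between the detection and the notification, as the post describes:

```python
import os
from datetime import datetime, timedelta

import boto3

xray = boto3.client("xray")
sns = boto3.client("sns")

# Placeholder topic ARN, e.g. set as a Lambda environment variable
TOPIC_ARN = os.environ.get(
    "TOPIC_ARN", "arn:aws:sns:us-east-1:123456789012:xray-alerts"
)


def handler(event, context):
    """Scheduled every 5 minutes; checks the X-Ray service graph for errors/faults."""
    end = datetime.utcnow()
    start = end - timedelta(minutes=5)

    graph = xray.get_service_graph(StartTime=start, EndTime=end)

    problems = []
    for service in graph.get("Services", []):
        stats = service.get("SummaryStatistics", {})
        errors = stats.get("ErrorStatistics", {}).get("TotalCount", 0)  # 4xx
        faults = stats.get("FaultStatistics", {}).get("TotalCount", 0)  # 5xx
        if errors or faults:
            problems.append(f"{service.get('Name')}: {errors} errors, {faults} faults")

    if problems:
        sns.publish(
            TopicArn=TOPIC_ARN,
            Subject="X-Ray detected elevated errors/faults",
            Message="\n".join(problems),
        )
```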

So all these things come together, and as a DevOps engineer, this is the kind of thing the exam will test you on: it will ask you for the best solution, the most effective one, the simplest one, the one that scales the most, and so on. You need to be able to visualize these kinds of solution architectures and understand how all the pieces fit together, so you can come up with your own and understand at the exam which one best fits the requirements. Okay, so just a little parenthesis here, but hopefully this kind of architecture makes sense to you at this stage of the course, and I hope you understand now what the exam will test you on. So I hope you liked this lecture, and we’ll see you in the next one.

26. Amazon ES – ElasticSearch + Logstash + Kibana

So now let’s talk about Amazon Elasticsearch. At the exam, Amazon Elasticsearch may be called Amazon ES, so watch out for that: if you see Amazon ES, that means Elasticsearch. Amazon Elasticsearch is a managed version of Elasticsearch, and Elasticsearch is an open-source project. It needs to run on servers, so it’s not a serverless offering; there’s no such thing as a serverless Elasticsearch. The use cases are log analytics, real-time application monitoring, security analytics, full-text search, clickstream analytics and indexing. Now, going into the exam, we don’t need to know that much about Elasticsearch, but you need to know when it should be used.

And usually, anytime there is a search functionality that needs to be implemented, or a dashboarding functionality that needs to be custom, Elasticsearch is a great candidate. Elasticsearch in AWS actually consists of three products, together called the ELK stack, which stands for Elasticsearch, Logstash and Kibana; ELK is how it’s referred to in the industry. Elasticsearch itself provides the search and indexing capability, and you must specify instance types, Multi-AZ and so on; a minimal provisioning sketch follows below.
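Since you provision the domain's instances yourself, here is a minimal sketch of creating a domain with boto3; the domain name, version, instance type and sizes are all illustrative, and a real domain would also need access policies:

```python
import boto3

es = boto3.client("es")

# Create a small Elasticsearch domain; all values here are placeholders
response = es.create_elasticsearch_domain(
    DomainName="demo-domain",
    ElasticsearchVersion="7.10",
    ElasticsearchClusterConfig={
        "InstanceType": "t3.small.elasticsearch",
        "InstanceCount": 2,
        "ZoneAwarenessEnabled": True,  # spread nodes across AZs (Multi-AZ)
    },
    EBSOptions={
        "EBSEnabled": True,
        "VolumeType": "gp2",
        "VolumeSize": 10,  # GiB per node
    },
)

print(response["DomainStatus"]["ARN"])
```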

Kibana, next, provides you a real-time dashboarding capability on top of the data that sits within Elasticsearch, and it would be an alternative to CloudWatch dashboards because it gives you more advanced capabilities that you can customize. CloudWatch dashboards, I would say, are quite minimal and easy to use, but minimal in capability; Kibana is a lot more powerful. So if you manage to store your metrics within Elasticsearch, then Kibana will provide you a lot of different dashboarding capabilities. The last one is Logstash. Logstash is a log ingestion mechanism, and it’s an alternative to CloudWatch Logs. With it, instead of using the CloudWatch agent to send logs to CloudWatch, we would use the Logstash agent to send logs to Elasticsearch, and then we would visualize these logs using Kibana, for example.

And why would we use Logstash instead of CloudWatch Logs? Well, maybe because we want more retention or more choice on granularity and so on. So that’s all you need to know at a high level for Elasticsearch, but I still want to give you a few patterns of how Elasticsearch will be used in the exam. The first one is for DynamoDB. When we have a DynamoDB table, there are APIs to retrieve and put items that are quite easy to use, but we need to know the item key in advance. If we want to search through the items in a DynamoDB table, the only operation we can use is a Scan, and a Scan is really inefficient because it has to go through the entire table.

So it is quite common to want a search mechanism on top of DynamoDB. The pattern is: through the integration of DynamoDB Streams, data is sent to a Lambda function, and that Lambda function (which we have to create ourselves, obviously) sends the data to Amazon Elasticsearch. Then we can build an API on top of Elasticsearch to search for items, return, for example, the item IDs, and then use these item IDs to retrieve the data itself from the DynamoDB table. So this is quite a common pattern, and this is how we take data from DynamoDB and index it all the way into Elasticsearch, using DynamoDB Streams and Lambda functions; a rough sketch of such a function is below.
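Here is a minimal, hypothetical sketch of that Lambda function. The endpoint, index name and key attribute are placeholders, and for brevity it sends unsigned HTTP requests with the requests package; a production version would sign requests with SigV4 (for example via the requests-aws4auth package) and rely on the function's IAM role:

```python
import json

import requests  # assumes the 'requests' package is bundled with the function

# Placeholder Elasticsearch domain endpoint
ES_ENDPOINT = "https://search-demo-domain-abc123.eu-west-1.es.amazonaws.com"


def handler(event, context):
    """Triggered by a DynamoDB stream; indexes new/updated items into Elasticsearch."""
    for record in event["Records"]:
        if record["eventName"] in ("INSERT", "MODIFY"):
            # NewImage holds the item in DynamoDB's attribute-value format
            new_image = record["dynamodb"]["NewImage"]
            item_id = new_image["id"]["S"]  # assumes a string key named "id"

            # Index the raw image under the item's ID
            requests.put(
                f"{ES_ENDPOINT}/items/_doc/{item_id}",
                data=json.dumps(new_image),
                headers={"Content-Type": "application/json"},
            )
        elif record["eventName"] == "REMOVE":
            item_id = record["dynamodb"]["Keys"]["id"]["S"]
            requests.delete(f"{ES_ENDPOINT}/items/_doc/{item_id}")
```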

Now, how do we get CloudWatch Logs into Elasticsearch? Well, the first way is to use what we’ve seen before, which is a subscription filter. We have CloudWatch Logs and we create a subscription filter for AWS Lambda. It’s a function managed by AWS, and in real time, that function will send data to Amazon ES. This is something we can do straight from the console, and we’ll do it in the hands-on in the next lecture. But another way, which is going to be near real time, is to use, again, a subscription filter, but this time sending the data into Kinesis Data Firehose, and we configure Firehose so that it sends data near real time (when the buffer is full or enough time has elapsed) directly into Elasticsearch; a sketch of wiring this up follows below.
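For reference, here's a minimal sketch of the Firehose variant with boto3: a subscription filter pointing a log group at a pre-existing delivery stream. The log group name, stream ARN and role ARN are placeholders, and the role must allow CloudWatch Logs to put records into Firehose:

```python
import boto3

logs = boto3.client("logs")

# Placeholder ARNs: the Firehose stream is configured to deliver to Amazon ES,
# and the role lets CloudWatch Logs write into Firehose
logs.put_subscription_filter(
    logGroupName="/aws/my-app",
    filterName="to-firehose",
    filterPattern="",  # empty pattern forwards every log event
    destinationArn="arn:aws:firehose:eu-west-1:123456789012:deliverystream/logs-to-es",
    roleArn="arn:aws:iam::123456789012:role/CWLtoFirehoseRole",
)
```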

So, as you can see, these are two ways to send data from CloudWatch Logs into Elasticsearch, and both use a subscription filter. In the first case, the subscription filter’s receiving endpoint is a Lambda function, which gives you a real-time capability to insert data into ES but is probably a bit more expensive. The second one has Kinesis Data Firehose as the endpoint, and it’s going to be near real time, but probably a bit cheaper. So again, look at the requirements and understand what they mean, but I think this is a great illustration of how you can use subscription filters with CloudWatch Logs paired up with Amazon Elasticsearch. So that’s it for the architectural and product overview. Let’s go into the next lecture to create that first flow at the top of the screen.
