1. Construct KQL statements for Microsoft Sentinel
Everyone and welcome back to my course, Security Operations Analyst SC 200. We are beginning a new section, and in this section we are going to focus on the Crystal Query Language queries for Microsoft Sentinel. Now, these are the lessons that we’ll have in this section. In the first one, we’ll learn how to construct KQL statements for Microsoft Sentinel, then how to analyse query results, how to build multitable queries in KQL, and how to work with string data using KQL queries. So let’s get into the first lesson and talk about what KQL is and how to construct KQL basic statements for Microsoft Sentinel. As a result, the Kousa query language was born.
KQL is a query language used to perform analysis on data to create analytics, workbooks, or perform hunting in Microsoft Sentinel. Now, understanding the basic KQL statement structure provides, let’s say, the foundation to build more complex statements. So let’s see what a KQL query is and how it’s structured. A KQL query is basically a read-only request to process data and return results. Now, the request is stated in plain text using a data flow model designed to make the syntax easy to read, write, and automate. The query uses Schema entities organised in hierarchies similar to SQL databases, tables, and columns. Now, the query consists of a sequence of query statements, and at least one of them is a tabular expression statement that provides or produces data arranged in a table-like mashing of columns and rows.
The query’s tabular expression statements produce the results of the query. The tabular expression The Tabular Expression Statement syntax has a tabular data flow from one tabular query operator to another, starting with the data source and then flowing through a set of data transformation operations bound together by the use of the pipeline delimiter, just like in PowerShell. For those of you who are unfamiliar with the pipeline delimiter, this is it. Now, just as an example, let’s take the following crystal query that has a single statement, which is a tabler expression statement. And this is the query. So, a security event with an event ID of 466 and a summary count of ten. These are all limited, as you can see, by pipelines, right? So let’s analyse and see what this query does. So the statement starts with “security event,” right? The Event ID column value then filters the data rows for that particular table and then summarises the result by creating a new column for the Count by account.
And next in the prepare phase, the results are limited to ten rows. So this is the table from which we get the security event data. This is where we actually filter the data by adding a condition. So basically, what this says is to look in the security event table where the event ID is four, six, or seven, and then summarise the results because we want to count by the column account, right? And we want to limit the results to ten results only. This is basically what this statement does. Now, this is very important. It is critical to understand how the results flow through the pipeline. Right, everything on the left side of the pipe is processed before being passed on to the right side of the pipe. Okay? So, with that said, let’s actually get into the Crystal query language demo environment, because we can use it to build our queries, construct our queries, and play with KQL because we have some sample data generated now to access the demo environment. And you’ll have the link available in the downloadable resources for this lesson. Just remember that you have to be logged into your subscription, your Azure subscription, be it a trial subscription or a normal subscription, right? So let me open up a new tab here, and this is the actual link to the demo environment. So it’s https AK ms Lademo.But don’t worry, you’ll have it available in your downloadable resources for this particular lesson. When you access this demo environment, the query window is divided into three primary sections. The first one is in the left area, and this is a reference list of all the tables in the environment. And here is the area, right? So, in essence, you have all the sections, as the schemalet would say, of the tables that are available in the environment.
As you can see, if we go to Security and Audit, we have these tables: protection status, security alert, table security, baseline, table, security detection, security event, and all of these tables are available for us. to query here in the Query Editor, right? Because this is the second-most important part, and this is the query editor here. You actually type your queries—or paste them—however it suits you. Then there’s the bottom area over here, right, which is the query results area. Here you are going to be presented with the query results based on the query that you type in the Query Editor. Now, before running a query, you can adjust the time range to scope the data from over here. So you can query the data for the last 24 hours, 48 hours, three days, or seven days, or you can set a custom time range by defining the start date and an end date for your query. Right? Now, to change the results columns displayed, you can select the columns box and choose the required columns. But again, we don’t have any results here, so I’ll show you as soon as we type our first query here. Now, let’s talk about the basic operators in KQL. The first one is the let statement, right? And let’s make some statements, and I’m going to type them here so you can see how we go. Solat Statement. Let statements basically bind names to expressions, and they are used for several use cases, which we’ll go through in a moment. For the rest of the scope where the Let statement appears, the name basically refers to its bound value. Lap statements.
They improve modularity and reuse since they allow you to break a potentially complex expression into multiple parts. Each part is bound to a name through a Lap statement, and together they compose the whole. The Lap statement also allows for the creation of user-defined functions and views. The views are just expressions whose results look like a new table. So let’s go through some examples here of how you would use the Lat statements. First and foremost, the lat statements enable the creation of variables that can be used in subsequent statements. Now, please allow me 1 second to copy and paste a query over here. Here we go. Here we go. So let me remove this space, and as you can see at the start here, in this example we typed in “let time offset equal seven days” and “let discard event equal 4688.” So basically, in the time offset and discard events, the variables that we’ve typed here and assigned these values using the Let statement are created and used as a part of the security event where clause, right? because this is actually where our query starts. So we’re telling you to look in the security event table where the time generated is time offset times two and time offset. And remember the time offset; we’ve declared it as a variable with a Let statement and given it a value of seven days. So basically, what this says is that the time generated is seven days times two ago. That means 14 days and seven days ago, right? And the other condition, the event ID, is not equal because this is the operator equals and you have an explanation point in front of it, indicating that it does not equal.
So the event ID does not equal “discarded event,” and the discarded event is, of course, 4688, right? So let’s run this query to see the results. And of course, this function basically takes the current date and time and subtracts the value provided in the parentheses, right? So that was 14 days ago. So it will subtract 14 days from today’s date and time. Now, here are the results. As you can see from the results, you can expand them to see every row in the results. And of course, this is what I was talking about, the columns. So you can change this to select and display whatever columns you want in the results. You can select them all, deselect them, or select only the ones that you are interested in, right? Okay. Now let’s go into our next use case for the late statement. and this is to declare dynamic tables or lists. So again, it lets us create dynamic tables or lists. And I am going to just copy and paste another query. Here we go. Allow me to remove this space once more. Now, don’t worry; you will have all of these questions available in the downloadable resources for this lesson, so you can play around with them.
So now let’s talk about suspicious accounts, right? So we name our variable “suspicious accounts,” or else our dynamic table “suspicious accounts,” and this equals to a more complex query. So we create a table where the account is Administrator and Anti Authority System, and then we query the SecurityEvents table where the account is in the Suspicious Actions table that we’ve created over here, right? And again, the Suspicious Action table contains two accounts: the Administrator account and the Anti-Authority System account. And then we query the security events to see if the account is in either of these two accounts, because we only have two in this table. But you can imagine we could have basically created a dynamic table with more and more accounts in our case, but you can use any type of data that’s available to you. So if I run this query, bear in mind that not all the queries here will return results because this is just a demo environment and there is sample data in the environment, but there isn’t sample data for all of the tables available here in the environment. Now this one returned results, and as you can see, we are getting results for our query over here. Now, let’s talk about yet another operator that’s highly used in KQL, and that is the search operator. And let me type it here so you have it available as well: the search operator. And the search operator provides a multitable, multicolumn search experience.
Although the statement is easy to use, it is inefficient compared with the where operator, right, when you specify a condition when you look into a specific table. Now, even with this, the use of search to find data when unsure which table or column to filter is very handy. Now, first of all, let’s put a very simple query here. So we’ll search, er, have it search for this specific string, er, across all of these tables that are available in our environment. So if I just click on this, it will probably take a little while because it is querying all of the tables in the environment, right, and then I will show you how to refine your search a little bit more where you can use specific. Here we go, the query has been exceeded because, once again, this is not how to use the search operator because you don’t want to search across all the tables in the environment because it will either take a long time or result in an error like this? So to further refine our search, we can do this. So search in Security Events, Security Alerts, and any table that starts with “aright” and search for the term “err.” And let me run this query again to see if we get results this time. If not, we could also limit the tables by removing the A and wildcard. By the way, “star” means a wild card in KQL.
And we would have searched for “err” only in the security event and security alert tables. Now let’s get on and talk about another operator that is probably the most used one in KQL statements. and that is the where operator. So the where operator—and let me just delete this—the wear operator So, basically, the operator filters a table to find the subset of rows that satisfy a predicate or a condition. So let’s see some examples. Let’s start with the first one, which is a security event. And then we use the where operator. The time generated is indicated in parenthesis by “go one day.” So what this basically says is, “Look in the security event and show me all the events that have been generated since yesterday.” Right? So let me run this table, this query, sorry, and we’ll see our results.
And then we’ll use another statement. Here’s another query that should make more sense and help you better understand the where operator. Right, so here are our results. I’ve now overpasted the next one. So for the next one, I’m not sure why it isn’t copying as it’s supposed to be.So let me just delete it and type it here. So, let’s see what this one does. This is how it appears in the security event table. And we want to get the events that have been generated since an hour ago. and the event ID is 464. This is the login event ID, right? So we want all the login events that happened in the last hour. This is what this query basically does. So we run this again, and I’m not sure why it’s returning an error. Here we go. If you get this occasionally, keep in mind that when you type your queries and leave spaces and stuff like that, make sure to select the entire query before hitting run. Otherwise, you might get an error like the one I just got a few moments ago. Right? Okay, so these are all the log-on events within the last hour.
And let me just paste yet another query and take a look at it. And here we go. So this one looks in the security event tables for all the events that were generated an hour ago, where the eventID is 4:6:4, a logon event. And the account type is “user.” Right? So if we run this query, we will get all the login events for all the user accounts within the last hour. Okay, we’ve finished with the wear operator. Let’s talk about the extend operator. All right, let me just delete this extend operator. With the extend operator, you can basically create calculated columns and append new columns to the result set. So let me just take an example and show you. I’m not sure if we will get any results for this query because I’m not sure we have any security alerts. As you can see, we are looking at the security alert table, right?
And I’m not sure if we have any alerts in the security alert table. Of course, no alert was discovered. But let’s take our previous query, for example, to show you a good way to understand the extend operator, right? Okay, and let’s see how this goes. So you can see that we have results in regards to the computer account. But if we just expand one of these results over here, we see that we have all of these statements over here. And basically, what we can do with the extend operator is actually drill into any of these rows or attributes, whatever you’d like to call them. And we can basically rename the column itself with the extended operator. So if I would have just added here “extend” and let’s say I wanted this subject username to be called in another way, let’s say “targeted account,” right? Account. So, extending targeted accounts equals subject username, right? Let me just copy this subject username, and we’ll put this in. Let’s say further that I only want to project the targeted account. So this is a project-targeted account, this one.Okay, and let me run the query again, and you’ll see that we’ll only get a column here with our targeted account operator, right? So basically, let me copy and paste yet another query.
This is a more complex example, but this is probably like a real-world example. So let me explain the query here. This is a Microsoft Sentinel rule, correct? So the essential concepts we will review here are the following: First, we define a time frame variable with the latoperator. Basically, it enables maintenance and, let’s say, reusability, right? Okay, now, I haven’t copied the entire query here; apologies. So you can see the length here. Okay? So first of all, we specify the time frame of day one (D). And then the second, the dynamic list creation, is stored in the domain list variable, right? So we’re interested in a dynamic list of domains, and we store this in a domainlist variable again using the lat operator, right? Then we use extend, and here we go. Just let me highlight this. So we use this extend operator to basically create two new columns called HTTP status code and domain, right? So basically, this is what the extend operator does. It creates new columns based on other columns and rows, right? And fourthly, the result sets will basically progress through the pipeline statements and will allow us to use the newly created column filters. This one has an HTTP status code as well as a domain. As you can see, we are actually querying again using the where operator. So we’re putting a condition in place over here.
We are looking for the HTTP status code to be 200 and the domain to contain a dot, right? And never mind about this; these are just some regex expressions, right? So this is a detection rule. So I hope you understand what the extend operator does. It basically creates new values, or new columns to be more specific, based on the results of other columns or other rows in other columns. Okay, I will run this. However, I highly doubt that we’ll get any results, of course, because we don’t have any CS log messages in our environment over here. Now let’s take a look at the order operator. And let me just delete all of these. And now we’re going to look at the order operator. So what it does is very simple: it sorts the rows of the input table by one or more columns, right? So the operator can utilise any column or multiple columns by using a comma separator. Each column can be sorted ascendingly or descendingly. Here we go. And this is a combined query where it uses the extend operator and the order operator.
So we’re looking at security alerts for events that occurred seven days ago until now. And we are creating a new column with the extend operator called severity order. And here we are seeing alert, severity: if it’s three, it’s high; if it’s two, it’s medium; if it’s one, it’s low; and if it’s information, if it’s zero, it’s informational. And then we want to order by this new column that we have created, right? Descending severity order and let’s see if we get any results. No, we don’t, but we’ll construct another query just to show you. So, let’s say we have a security event because we know we have them here, and we’ll see how much time is generated, right? In fact, I’m not going to put in a condition because we already set this to the last 24 hours. So I’m going to say that event ID equals 4624, right? And if we run this, we can see that we will indeed get results. And let’s just wait for the results to come in, and then I am going to show you how we can use the order operator here in this simple query. So let’s say we want to order by any of these columns, right? So let’s take a column, for example. Let’s order by IP port, right? So we are going to yet again put in another pipeline here and let’s say order by IP port, okay? And then we’ll get our results by IP port. I haven’t specified if we want to order ascending or descending. As a result, it will order ascending by default. Okay? So let’s wait for the results, and then we will talk about our last operator here in the lesson. So as you can see, we’ve got results. Let me just go to the IP port over here again.
Okay, so it’s in descending order. Okay, we’d specify ascending here if we wanted it, right? and we would get the results the other way around. Okay, now let’s talk about our last operator, which is the project operators. And I’m saying operators because there are more. It’s a simple project. You have them project away, which selects which column from the input to exclude from the output. The project is then yours to keep. It selects which columns from the input you want to keep in the output and projects a rename. It also projects a reorder. It sets the column order in the resulting output. Don’t worry; you’ll find a table with all of these descriptions in the downloadable resources for this lesson. But let’s see what the project operator actually does. The project operators basically control what columns to include, remove, or rename in the result set of a statement. So there are multiple types of project operators, which I’ve just described. And the first one is a simple project. Again, it selects the columns to include, rename, drop, or insert new computed columns in the results. So now let’s take this query as an example. security event, project, computer, and account. You can imagine what this does. Basically, it will take the security events from the last 24 hours because we have them set here, and it will project only the computer column and the account column.
So let’s run this and see the results. Now, the projectaway operator—basically, this one lets you select which columns from the input you want to exclude from the output, right? So if I just comment this out, let me just comment this out, and we will set the security events for the last 24 hours. Let’s wait for the results, which should come any day now, okay? So we have all these columns over here. So let’s say we want to exclude the time account and the account type from the output, right? So we would use them as the project operator. Okay, let me just do it like this. Project away, right? And we want these columns to be excluded, right? So we’ll leave the time generated, and we’ll exclude the account, the time generated, and the account type over here. Okay? And let’s run the query again. This time we will get exactly the same results, but we won’t have those columns in our results because we used the project way operator in our query. Here we go. So as you can see, it starts from the computer column, and all the other ones are the same way. You can use the project repository.
So you’d only specify the columns you want to keep in the output, the project rename, and the columns you want to rename, or the project rewarder, and the order in which you want these columns to appear. So, for example, you would say project. Let me just quickly project reorder, okay, and let’s say account channel. Today’s typing is off, so channel, okay? And level. Okay, let me put in a comma here as well, and let’s run the query again and wait for the results just to show you what this project rewarder does. And then, after this, we will conclude our lesson. Right? So again, we have account channel, account channel, and level, right? So Reward ordered the columns in the order that we’d specified in the query. Okay, this concludes our discussion for this particular lesson. I’ll see everyone in the next one, where we’ll talk about analysing the query results using KQL.
2. Analyze query results
Welcome back to my course. Microsoft Security Operations Analyst SC 200 Now, in this lesson, we will pick up where we left off in the previous lesson, and we are going to talk about using some operators and functions to analyse query results. So let’s get back into our demo environment over here and talk about our freedom. first operator here. And that is the summarise operator, right? So, for the summarise operator, let me just type it here. The summarise operator with its variations, let’s say, will create a new column with the calculated result for the specified fields. So if I take a statement, for example, let me just copy and paste it over here. Here we go. So if we take this first statement right, this will return one column.
That is a unique list of the activity column values from the security event table, right? So let me highlight this and run it, and as you can see, it returns a column with a unique list of values, right? Now if we take the second statement here, for example, this will return a count of security event rows where the event ID is 4688 and the count is grouped by process and computer because of the by clause, right? Because, as you can see, we have the process and computer count functions. So the result set will contain three columns: process, computer, and account in the last column, right? So let me highlight it and run this query as well. And as you can see, this is the process; this is the computer; and this is the count of the process and computer. So this process was observed on one computer 77 s process wNow let’s take the next function because we’ve already specified it, and this is the count function. So basically, an aggregate function column can be explicitly named by including a fieldname equals before the aggregate function.
The count function does exactly what it sounds like it does. Isn’t it true that the results are actually counted? So let me take this other query here as an example. And here we go. So this Krel query will basically return three columns, right? If we hadn’t specified this CNT equals, we wouldn’t have created a new column, and the default column would be count underscore. So, once again, this takes the security events generated an hour ago with the event ID 464 and summarises them by account type and computer name in a new column called CNT. Okay? So let’s run the query and see the results. Here we go. So this is the account type; this is the account itself, the computer. And 324 logon event IDs were discovered on this computer. Now let’s talk about yet another variation of the count function, and that is the decount function, right? So, what does the DeCount function do, and let me paste in an example query here, the DeCount function returns a count of unique values from the specified field, correct? So in our case here, it takes the security event and summarises and returns unique values of the IP addresses found in the security event. So let me just run this query again, and here we go.
We have found a number of 28 unique IP addresses within the entire security event table, and this is what DeCount actually does. Okay, now let’s go more into it. And when we’re talking about the summarise operator, which summarises our results, we can combine this with several built-in functions in KQL to further filter the output of our results. And the first functions that I want to talk about are argmax and arg. Let me just paste this one as well, so I won’t have to type it. If I can just copy the functions here, arg Max and arg Min, According to Arc Max and our arguments, these are to filter out the top and bottom rows. So let’s take the first one, the Arg Max function, and let me paste in an example query again. By the way, you will have all these queries available in the downloadable resources for this particular lesson. Now, let’s take a look at the following function: And first of all, let me just do this. So we need a computer name over here; solely limit the first ten results from the security event table so we can have a computer name. Now, I am going to paste the query again, but we will need to replace this computer name over here because I see that it exists in our demo environment. So, SQL 12, NA consolehotels.com, this will return the most recent row from the security event table over here for this specific computer, correct? And the star here, the wild card in the Argmax function, basically requests all columns from the row, right? So let me just highlight the query and run it. And of course, again, this is a demo environment, so we won’t have data for all of our queries.
So let me see if I can get some results by changing the computer name. No, but again, the important thing, as I mentioned, is to see what the query does, and I’ve just explained that to you. Now, in contrast, the Arg Min function, and let me paste a query again, basically takes the oldest security event for this particular computer, SQL 12, and will return it as a result in the result set. So let’s see if we have any data. Again, we don’t have data for this particular computer that we’ve talked about, but again, this is what it does. Let’s go back because, remember, we talked about this pipeline and this pipe character in the previous lesson? Let’s revisit it a little bit. So I am going to paste two queries here. Here we go. And let’s see, because this is very important to order the results, right? So the order in which the results pass through the pipe really matters. Let’s take the first query as an example. So in the first statement over here, let me run it as well, so we can get the results. Basically, here we will have the accounts for which the lost activity was a login, right? So it takes the security event, summarises it by ARQ max, by time generated, and by account where.
And then all the results flow through the pipe character, where they are filtered for an event ID. So it takes the last activities in the security event table and then filters them for logon events. Right? Now, let me emphasise the second statement: we basically have the same arguments, but we have rewarded them a little bit. So, because we take the security event table, this basically has the most recent login for accounts that have logged in from the security event table. Then we filter all the results by event ID to be a log-on event, and then we summarise by arc marks, by time generated, and by account. So this one basically takes the most recent logons, the most recent logons available in the security event table. Now, let’s look at the summarizeoperator’s more—let’s say—powerful capabilities for actually filtering our results. So this is the first one: makelist and make list. This function basically returns a dynamic JSON array of all the values of whatever expression you put in the group of the query, right? So if we take this query, for example, let me just paste it here. Here we go.
Okay, so this query will first filter the event ID from the security event table with the where operator here by logon events, right? And next, for each computer, the results will be a JSON array of accounts. But bear in mind that this JSON array will include duplicate accounts, right? Because the make list will make a list again of the specific IDs or filters that we’ve put in front of it, Where can I find the event ID for 64? So let me select this query and run it, and as you can see, it makes a JSON array of all the accounts that have logged on to this particular computer, right? Okay, now we have the next one, which is the make set function. And what this does is return a dynamic JSON array just like the makelist one, but this one contains distinct values that we specify in the expression. So, using the same KREL query, but this time instead of making a list, we specify making a set over here. Again, we will filter by event ID 464 of the log-on event. And next for each computer In the result, we will have a JSON array of unique accounts. So if I run this query, let me select the entire query. So if I run this again, you can see that only these unique accounts have logon events for this particular computer. Let us now discuss the final operator on our list, the render operator. Basically, we use this to generate a visualisation of our query results. Now, the supported visualisations for the render operator are an area chart, a bar chart, a column chart, a pie chart, a scattered chart, and a time chart.
So basically, these are all charts that we need to create using the render operator in our queries. So let me take this example: a security event that we are summarising and counting by account, and we want to render the results as a bar chart rather than a list like we have here. So I’m going to run this query, and here we go. You have a bar chart of all the accounts in the security event table. Now, we can also use the summarise operator to create time series. And let me just paste in this query over here. And from the beginning, I want to talk about the bin function because we find it here in our query.
So the bin function rounds values down to an integral multiple of the given bin size. As a result, it is frequently used in conjunction with summarised by. So, if you have a scattered set of values, using the bin function, the values are grouped into a smaller set of specific values, combining the generated time series and, for example, piping it to a render time chart will provide a visualisation of a time series event. So what this does is look at the security event; it summarises all the events, and we are counting by the time generated for the last day, right? So it binds the time generated for the last day. So it won’t take all of the time-generated values from the security event table. And then we are rendering it as a time chart so we can have a visualization. So let me just run this And here we go. This is our time chart visualization. Now, with that being said, guys, this concludes our lesson. I am going to see everyone at the next one, where we’ll discuss building multitable statements in KQL. Until then, of course, I hope this has been informative for you, and I thank you.
3. Build Multi-Table queries in KQL
Hello everyone, and welcome back to my course, Microsoft Security Operations Analyst SC 200. Now in this lesson we are going to discuss building multitable queries using KQL. First of all, often a result set from a KQL statement needs to be combined or joined with another result set. Well, you can basically use the union operator to combine two result sets together, or you can use the join operator to combine two rows based on a key value. And that is what we’ll talk about in this lesson.
So let’s get into our demo environment over here and talk about the first one. That is the union representative. And let me just type it here so you can see it. Okay. So the union operator, which takes two or more tables and returns the rows of all of them, In this case, in the case of the union operator, understanding how results are passed and influenced by the pipe character is critical. So I will give here three queries as an example, and we’ll discuss each of them and what they return. Okay, so the first query here is “security event,” “Union,” “security alert.” Now this query will return all rows of the security event and all rows of the security alert tables. So let’s run it and see what we get.
It will probably take a while because there are lots of records and rows in both tables for the security event and the security alert. And here we go. As you can see, there are 30,000 or more records in each row. All the rows in the security event table and all the rows in the security alert table Now let’s see what the second query does. So the second query will basically return one row and one column, which is the count of all rows of security events and all rows of the security alert tables, right? Because we are using a security event, we are making a union with the security alert, and then we are summing up the count. Basically, we are counting. So let me run this query as well. And here we go. This is the total count of rows from the security event and the security alert table. 400 and 447 thousand 935. Okay. And now let’s take our third query over here and let me expand and select it. Now, this third query will basically return all rows of the security event table and one row of the security alert table. Now the row for the security alert will have the count of the security alert rows, as you can see. So we’re using security events to effectively unionize. We are doing a union on the security alert and summarising by count, right? And then we will project the count column. So let me run this, and of course we will get it; let me see. Okay, here we go. There is one single row in the security alert table.
Okay, now one last thing I want to mention is that the union operator also supports wildcards to union multiple tables. If I take one query as an example, So let me just delete these ones. So what does this do? Basically, this Krel query will create accounts for the rows in all the tables that have names starting with security, right? And it will summarise the results by the type of row. So let me run this one, and you will see that we have 59 security baseline rows in the security baseline table. We have 11,594 rows in the security baseline table, one row in the security alert table, one row in the security detection table, and 448,094 rows in the security event table. Now let’s take a look at the next operator that we mentioned at the beginning of the lesson, and that is the join operator. Well, the Join operator merges two rows of two tables to form a new table by matching the specified columns’ values from each table. Now the syntax of the join operator looks something like this: So we specify the left table, and then we specify the join operator. We can then specify the joint parameters, right? and you will see this in the next query.
And then we specify the right table. And lastly, we specify the attributes we would like to join on those values that we would like to join on the two tables, right? And for this, let’s take this query example and let me paste it here. Here we go. So what this query does is, as you can see in the first table specified, the left table is the security event, right? Then, basically, the table after the join keyword is the right table, and this is the joint keyword here. And, once again, the security event is the correct table. So what we’re basically doing in this query is using the security event table, looking for logonevents, and summarising by creating a new column called logon count that equals account by event ID, right? the 464, the logon event, and the account, right? And then we are projecting two columns—the logon count and the account. Then we’ll join this, and we’ll go over the inner join here. What kind of join, what flavour of join? We’ll talk about that in a moment. We are attending the security event with the event ID 4634; this is the event logo, and we are summarising with the count logo. We are creating a new column called “Logofcount,” which equals the count of the event ID, which is the logo of the event and the account. And we are projecting the logo onto the account in this case.
Then all of these are added to the account attribute. So as you can imagine, if we run this query, we will get the number of log-on events by account and the number of log-off events by account from within the security event table. So let me just run this and take a look at the results here. Here we go. So this account has 7,470 log-on events and two log-off events. This one has 162 logon events and 8331 logoff events. And this is how you use “join.” But let us return to the joining flavors. And for this, I will get back to our slides for a moment here. And basically, when you join two tables together, you use the join flavours to determine the joining behavior. Now, it is essential to understand the impact of the records on the left and right sides based on the join flavor. Now, this graphic here shows which records will be kept if there is or isn’t a matching record in the other data set. For example, the inner join right will only show records from the left side if there is a matching record on the right side. The right side will also require a left-side record for the inner joint over here. And let me just quickly bring up my pencil. And as you can see, this is the inner-join flavor. And again, it will only show records from the left if there is a match over here. Right? Okay, now let me bring up the next slide and the last one from our lesson.
And here I have put together an ateable based on the joint flavor. And what output records will it return? So I will just go through the first one. As a result, the join type will be left anti or left antisemite. These are anti- and anti-Semitic on the left. These are the joint flavors. Okay. And this basically returns all records from the left side that do not match any of the records in the table on the right side. So pause the video, go through the table, and see all the joint flavours and their output records. And this concludes the discussion for this particular lesson. I will see everyone in the next one, where we’ll talk about a more, let’s say, advanced topic, and that is working with string data in KQL statements or KQL queries. Until then, I hope this has been informative for you, and I thank you for viewing.
4. Work with string data using KQL
And welcome back to my course, Microsoft Security Operations Analyst SC 200. Now in this last lesson from this particular section, we are going to talk about working with string data using KQL queries. Often, fields in a table store structured and unstructured string data. So you can write KQL queries to extract and manipulate data stored in these types of fields. A typical scenario here would be a key-value pair stored in a field, and you would need to query the specific value of a key, right? So let’s dive into it and talk about extracting data from unstructured string fields. Security log data is often contained in unstructured string fields and requires parsing to extract the data. Now there are multiple ways of pulling information from the string fields in KQL, but the two primary operators used are extract and parse. So let’s talk about the first one here.
And let me just bring up my pencil. The extract operator, right, now gets a match for a regular expression from a text string. So you have the option to convert the extracted substring to the indicated type and take this query into consideration. Basically, we can see that the extracted arguments are the following: So first of all, we specify Raja’s regular expression, right? And this is the one over here. And then we first need to basically specify a capture group, right? And the capture group is comprised of the following: So you have a positive integral constant indicating the extraction group, correct? And this is our constant. Then we have the zero that stands for the entire match, the one that’s the value matched by the first set of parentheses over here, the first set of parentheses in the regular expression, and 24 more subsequent parentheses, right? Now you have a text to look for. Here we go. And you have a type of “literal” at the end. And this is the one, and this is an optional type of literal, like, for example, a type of long. And if this is provided, the extracted substring will be converted to the type that you provide in the type literal. Now again, if the regex finds a match, then it will basically return a substring matched against the indicated capture group, optionally converted to the type literal that we’ve specified.
Now, if there is no match or type conversion sale, the return will be null. Okay? Now let’s talk about the other one, and that is the parse operator over here. The parser basically evaluates a string expression and parses its value into one or more calculated columns. The computed columns will have nulls for unsuccessfully parsed strings. Now the syntax would be this: So you specify the input table, then you specify the parse operator, right? Then you specify what kind of parse you want. And this can be simple, which is the default. So a string constant is a regular string value, and the match is strict; all the string delimiters should appear in the parsed string, and all the extended columns must match the required types. Then there’s the regex type, where the string constant can be a regular expression and the match is strict once more. So all string delimiters that can be a regex in this mode should appear in the parsed string, and all extended columns must match the required types.
Then there’s the flags kind, which is used in a regex model like greedy multiline mode to match your line or be case-insensitive in the re-two flags. Now don’t worry, you have documentation about all ofthese and what this does with examples that youcan play around in the demo environment and thedownloadable resources for this particular lesson. And lastly, you have the relaxed kind for the parts, and basically, the string constant here is a regular string value, and the match is relaxed. So all string delimiters should appear in the parsed string, but extended columns may partially match the required types. The extended columns that didn’t match the required types will also get a value of null. Then you specify the expression, and here you specify basically an expression that evaluates the string, the column name that you want to parse, and this is, of course, the name of the column to assign a value to extracted from the string expression, and the column type. And basically, this is the scalar version value that indicates the type to use to convert the value to the default. Here is the string type: This is basically about the parse operation. Now let’s talk about extracting data from structured string data because we also have this option because string fields may also contain structured data like JSON format or key-value pairs.
And of course, KQL provides easy access to these values for further analysis. First of all, we have the dynamic fields right in a table, and within the log analytics table, there are field types defined as dynamic. And dynamic fields basically contain a key-value pair, such as this dynamic field, for example, from the Asia Activity table. What structure does this dynamic field called properties D, which you can see here, have? So it’s in JSON format, and you have event category, auto scale, event name, operation, ID, and event properties. Here we go: these are all the, let’s say, key-value pairs from within this property’s D field. Now to access the strings within a dynamic field like this, you use the dot notation. So the property D field from the Azure Active Directory table is of type “dynamic.” And in this example over here, here is how you can access this particular value from the event category, right? So you specify the field, the dynamic field, using dot notation, and then you specify the key value pair that you want to target, in our case, the auto scale value. Okay, now we also have the JSON format, and let me change the slide here we also have the JSON format.So KQL also provides functions to manipulate JSON stored in string fields.
Many logs submit data in JSON format, which basically requires you to know how to transform data into queryable fields, right? So the following list over here is a list of functions that can manipulate JSON data and is basically a list of functions that are related to the JSON format. So for example, parsing JSON or dynamic to dynamic interprets a string as a JSON value and returns the value as dynamic. So you can use either of these to refer to a field. Here we go, like with the dot notation that we’ve talked about or with this type of notation. So please take a moment and go through this table. And let’s take these examples over here. And first of all, I will explain the first one. So we take the security alert table, we take the extended properties field, and we convert it to dynamic, right? So we end up with a dynamic field, and then with the field converted to dynamic, we can use the extended properties action taken to target this value from within the extended property values, let’s say a string of values, right? So this is what this basically does.
Don’t worry; you’ll have documentation for everything, including step-by-step instructions, examples, and everything else. Because I’m going to be honest with you here in this section, we have just barely scratched the surface in regards to KQL because it is a very powerful query language, and there are lots and lots of things to learn if you want to learn a devastated KQL. And there are dedicated courses just for that. But this is what you need to know for the scope of this course and for the scope of the SC 200 certification. So that being said, this concludes the discussion for this particular lesson and the section as well. You will have a lab available at the end of the section to create queries using Microsoft Sentinel. KQL, right? I strongly recommend that you go and complete the lab. You will also have the review questions available to test your knowledge about the topics that we’ve discussed throughout this section. And I will see everyone in the next section, where we’ll talk about configuring the Microsoft Sentinel environment. So until then, I hope this has been informative for you, which I thank you for.