1. SIEM Dashboards (Introduction)
SIEM dashboards. Now, in the previous sections, I showed you a couple of different SIEM solutions and the related dashboards. In this section, we’re going to dive a little deeper into SIEMs and learn how we can analyse and query logs and the SIEM data within them. In this particular lesson, we’re going to start with another look at those SIEM dashboards.
Now, cybersecurity analysts are usually going to work in a SOC or a CSIRT, and they’re going to perform a lot of different functions. These functions include things like performing triage on alerts, escalating true positives for immediate response, and dismissing false positives. They’re also going to review security data sources and check that log collection and information feeds are functioning as they’re supposed to, to make sure all that data is getting into the SIEM. Additionally, they’re going to review cyber threat intelligence. This allows them to identify the priorities or potential consequences of various events occurring across their network and company. Another thing they’re going to work on is performing vulnerability scanning and vulnerability management. This way, they can understand their vulnerability posture within the organisation and what threats outside the organisation may try to attack them. Also, they’re going to identify opportunities for threat hunting as they’re going through their data.
They’re going to start seeing different opportunities for threat hunting based on cyber threat intelligence and the overall alert and incident statuses they identify as they watch which alerts are being triggered. They can also see what patterns there are, and based on that, they can find opportunities to go after those patterns by using threat hunting. Now, one of the big things you have to remember with security incidents is that they are identified and interpreted differently based on the overall threat level, and the overall threat level is different for every organisation. For example, if you work in the Department of Defense for the US Military, you might have top-secret information that has a much bigger threat base going towards it than, say, my small company. Alternatively, my small company may have some other threats to which we’re vulnerable that they’re not vulnerable to.
And so, depending on these types of things, your overall threat level is going to change, and that is going to have to be identified and interpreted based on the events you’re seeing and the threats to your organisation. Now, another great example of this is the OpenSSL vulnerability that happened a couple of years ago, known as Heartbleed. This was a zero-day vulnerability that affected everybody’s websites if they used OpenSSL to do their encryption. Now, if you’re an e-commerce site, you’re going to have a really high threat level with that, because that means your e-commerce is vulnerable. If you were just running a site for your personal amusement, well, you might not have been as worried about it, because you weren’t transmitting sensitive information back and forth to the server over an SSL connection. And so, whether or not you were vulnerable to this, and whether or not you had a business case that relied on SSL, all of these things are going to go into your analysis and help determine how you’re going to interpret that security incident and what you should do about it. Now, the reason we’re talking about all of this is because when you build out your dashboard, you’re going to build it to present information. When we talk about a SIEM dashboard, this is a console that presents selected information in an easily digestible format, such as by using visualizations.
Now, when we talk about a visualization, this is a widget that can show you different records or metrics in a visual format, such as a graph or a table. For example, on your dashboard, you can have all sorts of different kinds of visualizations. Here’s an example from an Elastic Stack dashboard. Now here, you can see it running on top of Security Onion, and we have all sorts of different data that is relevant to this particular use case. On the bottom right, we have a pie chart. Inside that pie chart, we can see the relative balance of the different classifications without seeing the overall level. So, for example, I see there’s a lot of red there. In fact, it’s over 75% red. And as I look up at the legend, I can match that red to one of the two classifications shown, and that tells me which severity level is where we’re seeing most of our activity. Next, we also have things like line graphs. Line graphs will show the level over a specific time period. So, at the top, you can see the number of log counts over time as we went through the day. Now, pie charts and line graphs aren’t the only ways that we can display information.
For instance, if I go over to my Splunk dashboard, I can see many different formats. Here, for example, I have a bar graph, and this is going to compare the levels between different classifications. You can see here the critical, high, medium, and low levels of urgency and the number of counts for each. This shows me that my medium urgency is the most common. In addition to that, I might have gauges. Now, these gauges can be done in lots of different ways. I’ve seen a lot of them that looked like speedometers, or, in the case of Splunk, they like to use these trending graphs with the up and down arrows and a big number. These gauges show you a level that has defined limits associated with it. So, in this case, we have different things like access notables, endpoint notables, network notables, identity notables, audit notables, threat notables, and UEBA notables. And all of these have a number associated with them and a trend over time that can be very quickly looked at and observed. The final thing you’ll see on a lot of dashboards are tables. And tables are going to give you a lot more information. A lot of times, these will be the top or bottom events.
In this case, you can see these are the top ten events being displayed across our entire Splunk system, which is acting as our dashboard. Now, all of these dashboards have something in common: they display metrics. And it’s important for you to select the right metrics for your dashboard. This is critical because if you select the wrong information, you’re not really presenting what you want, and it’s not going to be useful to your analyst, your manager, or whoever is looking at this dashboard. Now, speaking of that, one of the big things we have to think about is metrics, which are also known as key performance indicators, or KPIs. A KPI is a quantifiable measure that’s used to evaluate the success of an organization, an employee, or another element in meeting objectives for performance. Now, this is a generic business definition, but when we start looking at it in terms of our systems, we have KPIs too. For instance, we have key performance indicators for the processor utilisation on your server. We have them based on the disk space that’s being used.
We may have them based on the bandwidth that’s being used. There are lots of different measures and metrics that we can create for all these different numbers. Now, we’re not going to go into a ton of detail on metrics here because, as analysts, it’s not our job to define those metrics. Instead, it’s the cybersecurity engineers’ and cybersecurity architects’ job to figure out what we should be measuring. That is something that is done at the management level, the executive level, and at the engineering and architecture levels. Our job as analysts is to be able to understand these numbers and use them in the real world. Now, if you’re interested in going beyond the CySA+ exam and learning more about measures and metrics, I have a course dedicated just to them, which is for my service management students, because metrics play such an important role in their world. But for the rest of this lesson, we are going to talk about measures and metrics, because it is important for you to understand the basics of them. Now, when we talk about measures and metrics, what kinds of things should we be measuring?
Well, we might want to measure the number of vulnerabilities. By measuring the number of vulnerabilities, we might know what type of service was affected and when these things were discovered and remediated over time. We also might want to capture the number of failed logins, because if we have failed logon attempts or unauthorised access attempts, that could be an indicator of someone trying to break into our network by doing password guessing, brute-force attacks, or something like that. We also might want to capture the number of vulnerable systems. When you do your vulnerability assessments across your network and you scan a host and figure out that it is missing a bunch of critical patches, what do you do with that information? Well, if you know the number of vulnerable systems, we can remediate those and figure out which systems are in compliance and which ones are out of compliance. Then we might want to also capture the number of security incidents. How many incidents do we have?
How many were reported in a given period of time? Maybe the last week, the last month, or the last year? Are we on an upward or a downward trend? Are things getting better or are they getting worse? If you don’t capture these numbers, you’ll never be able to know. Then we’ll also want to think about the average response time when we identify a security incident. How long does it take for us to fix that problem? How long does it take for us to reimage that machine? How long does it take us to restore that machine to a known good state and get that employee back to work? These are all things we want to capture. We also might want to capture the average time to resolve a ticket. Now, this might sound more like a service management issue, but it does affect the cybersecurity world too, because if we have a helpdesk ticket that’s not being resolved, a lot of times our users are going to find a workaround to get their job done. For instance, in one organisation I was in, we had a closed network and an open network. The open network connected to the internet; the closed network did not. And if we wanted to get information from the open network to the closed network, we had to put in a ticket with the help desk.
Then a technician would take that information, burn it to a CD, move it over to the closed network, and then allow us to use it. Now, if it took them three weeks to do that and I had a presentation to give in three hours, that could become a problem. And so your average time to resolve tickets does affect your security, so keep that in mind. Another metric we might look at is the number of outstanding issues. How many things are sitting in the queue that haven’t been done yet? Again, people will find a way to get the work done. So if you have an access request or a port-open request or something like that, and it sits in the queue for six months or twelve months, people will find a way to get their job done, and that may break your security. Another thing we might look at is the number of employees who are trained. For instance, does your organisation do annual security training? Do they learn about threats? Do they learn about vulnerabilities? Do they learn about how to have long, strong passwords? Do they learn about two-factor authentication? Do they learn about phishing attacks and what to look for? All of these things could be part of your training, and if you know the number of employees who are trained, you know whether you have a good security posture.
So tracking that might be useful as well. And finally, we might also want to capture the percentage of testing completed. When you build a new application, does it get put online immediately, or does it go through testing first? Well, it should go through testing first, and if it does, you should be tracking it as it goes through. This way, you know how many applications you have, how many have been tested, and how many have been released. These are just some metrics that you can think about, and there can be a lot more. This is not an exhaustive list; it’s just something to get you started and thinking about it. Now, one of the important things is that when you’re configuring your dashboard, you need to display the needed information based on that user’s role. Do I care about the number of employees trained, for example, if I’m an analyst? Probably not. But if I’m the manager who is in charge of training the entire staff to minimise our vulnerabilities, I probably do care. And so the great thing is, a lot of these dashboards use widgets to pull that information in from the SIEM. Now, by doing that, you can create different dashboards based on the employee who’s looking at them.
In my company, we have several different dashboards, and based on your position in the company, you have access to some, all, or none of those, depending on what you need. For instance, I own the company, and I have access to every single dashboard. That doesn’t mean I look at every one of them on a daily basis, though. For instance, I don’t look at the dashboard for our ISSM system, which is where we get all of our support tickets from our students, to see what the resolution times are. I look at that once a week to make sure my customer service manager is doing her job, but I don’t look at it every single day, because I have someone who does that. That’s her role. Now, on the other hand, I do look at our security dashboard every day with my tech team, because I’m a technical guy and I care about that information, and so does my tech team, which I lead. So in my role as the tech team lead, I’m going to be looking at that information. So you want to make sure you’re building your dashboard based on the user’s role and only bringing them the information they need to see. Otherwise, it’s useless information, and it’s just distracting to the user.
2. Scripting Tools (OBJ 3.1)
Scripting tools. In this lesson, we are going to talk about some of the basic scripting tools that you should know as a cybersecurity analyst. Now, issuing commands individually can be useful, especially when you’re doing one-time analysis. But using scripting allows you to set up recurring searches that can be repeated easily, and you can even automate them. So every day at midnight, this particular thing will happen. All of that can be done with scripting. Now, when I talk about a script, what is that? Well, a script is really just a list of commands that are executed by a certain program or scripting engine, and there are lots of different ones out there. For instance, you might write a script in Bash, PowerShell, Python, Ruby, or AWK.
We’re going to talk about a lot of these in this lesson. Now, before we dive too deep into scripting, I want to bring up a quick exam tip for you. An in-depth ability to conduct scripting is not needed for the CySA+ exam, but it is really, really useful in the real world. Now, if you do decide to go into the PenTest+ curriculum, that exam does expect you to read and understand four languages: Bash, PowerShell, Python, and Ruby. Again, you don’t have to be an expert, but you do need to be able to read a script and understand what it does. And you may be asked in a performance-based question to put a script together based on a basic concept or idea using different blocks that you’d move around the screen. But for the CySA+ exam, you don’t have to do that. And so we are going to focus very briefly on things like Bash and PowerShell, just so you get an idea of what these scripts look like. But we are not going to teach you how to write them or how to become an expert in scripting.
When you get to PenTest+, we will go through all four of those languages to teach you how to write some basic scripts in them. If you want to learn that, you can either enrol in the PenTest+ course or study on your own. But again, for the CySA+ exam, you will not be asked to create a script. You may get a multiple-choice question that shows you a script and asks you to read it and understand what it’s doing. That’s fair game. But again, if you can read English, you can pretty much read the scripts the way they’re going to give them to you on the exam. Now, the first language we want to talk about is Bash. Bash is a scripting language and command shell for Unix-like systems. So that’s going to be the default shell on Linux and macOS, and you can set it up as your shell on a Unix system too. Now, when we’re dealing with Bash, it supports lots of different elements, such as variables, loops, conditional statements, functions, and a lot more. When you create a script inside of Bash, you always start out with what we call the shebang, which is the hash sign and the exclamation point.
That’s followed by the path to Bash, /bin/bash. This line says that this is a script written in Bash. That’s all it’s telling the computer. Then we’re going to give it some information about what we want to do. In this case, I have a three-line script. First, echo "Pulling NetworkManager entries". The echo command prints this message to the screen. So, if I ran this within the command shell, you’d see "Pulling NetworkManager entries". Next, it will perform the command grep NetworkManager /var/log/syslog | cut -d" " -f1-5, and then it will do whatever other commands we want it to do. So really, the benefit of having an echo statement is to tell the user who just ran the script what’s going on, saying, “Hey, I’m working, I’m getting the entries for you, I’m doing something.”
Then it’s going to start pulling those entries, and in the background, it’s going to do this grep. It says grep NetworkManager /var/log/syslog, and that output will be piped into the cut command, cut -d" " -f1-5. So far, you should know what this says, because we just covered it in the last lesson. Then we have this greater-than sign, and the greater-than sign says to take the output of cut and, instead of passing it to another tool, put it into a log file. In this case, we’re going to log it into a text file called netman_log.txt. So we’ve grabbed something, formatted it using cut, and now we’re going to save it as a text file. Then we’re going to put a message on the screen for our user: echo "NetworkManager log file created". That’s it. And then it goes back to the command prompt. All of this amounts to three simple things: print something to the screen, do a search and save the results to a log file, and then print a message to the screen saying, “I created the file, and now you can go get it.” That’s all this Bash script does.
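Putting those three lines together, the script described above might look something like this. The syslog path and output filename are as described in the lesson; the space delimiter and the field range 1 to 5 are assumptions based on the narration:

```shell
#!/bin/bash
# Tell the user what the script is doing
echo "Pulling NetworkManager entries"

# Find NetworkManager lines in syslog, keep the first five space-delimited
# fields (roughly the timestamp, host, and process name), and save them
grep "NetworkManager" /var/log/syslog | cut -d" " -f1-5 > netman_log.txt

# Tell the user the output file is ready
echo "NetworkManager log file created"
```

Running this on a machine without /var/log/syslog would simply produce an empty log file, so adjust the path for your distribution.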
But it saves me a lot of time, because I can just run this file any time I want. Or you can set it up as a scheduled task, so maybe it does this once a day, and then I’d have all those different files building up over time. Next, let’s talk about PowerShell. Now, PowerShell is a scripting language and command shell for Windows systems. Traditionally, you’re not going to find it on Linux, Unix, or Mac, although newer cross-platform versions of PowerShell do exist. Now, PowerShell supports elements such as variables, loops, conditional statements, functions, and cmdlets, which use a verb-noun syntax. So, just like Bash, all that other stuff is the same. The only difference here is that we’re now going to be able to use cmdlets as well. Now, what is a cmdlet? Well, I’m going to show you that right now as we look at a basic PowerShell script. Here’s one: Write-Host. That is the verb-noun syntax. Right? Write-Host is just like echo; it instructs the computer to print this to the screen.
So we Write-Host "Retrieving login failures". Then comes Get-EventLog, which says, “I want to get information from this thing, the event log.” I want to get the five newest entries, I want to get them from the Security log, and I want anything that has the instance ID of 4625; that’s what I’m going to select. So essentially, this is a search command. Now, what is that search command doing for us? Well, it’s saying, “I want to go into the event log and check the five newest entries that meet these conditions: they’re inside the Security log, and they have the instance ID of 4625.” Now, 4625 is a logon failure code. So, who are the last five people who tried to log on and failed? That’s all I’m asking here. The next line selects the TimeWritten and Message properties, which are then piped out to a text file on the C: drive. So I’d like to write the time and message from the logs that I just got to this output file. And then the last thing I’m doing is writing to the screen again, right? Write-Host, telling the user the log file has been created. So essentially, we’re doing the exact same thing we were doing back in Bash, except in Bash we were looking for NetworkManager entries, and in this case, we’re looking for failed login entries. The next thing we need to talk about is the Windows Management Instrumentation Command-line, or WMIC. This is a program that’s used to review log files on a local or remote Windows machine.
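Reconstructed from that description, the PowerShell script might look something like this. The exact output filename is a guess, since the narration only says the results go to a text file on the C: drive:

```powershell
# Tell the user what the script is doing (the PowerShell equivalent of echo)
Write-Host "Retrieving login failures"

# Get the five newest Security log entries with Instance ID 4625 (logon failure),
# keep only the time and message, and save them to a text file
Get-EventLog -Newest 5 -LogName Security |
    Where-Object { $_.InstanceId -eq 4625 } |
    Select-Object TimeWritten, Message |
    Out-File "C:\logfail.txt"

# Tell the user the output file is ready
Write-Host "logfail.txt has been created"
```

Reading the Security log this way requires an elevated (administrator) PowerShell session.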
So if I’m sitting on my machine as an administrator, I can actually remotely go onto your machine and check your logs if you’re part of my domain. Now, this looks something like this: wmic, and then, in this case, ntevent. The ntevent alias, given a certain input, will return log entries that match your parameters. So what am I looking for? I’m looking at ntevent where the condition is that the log file equals Security and the event type equals five. Then I want to get the source name, the time generated, and the message. So essentially, this is doing the same thing I was doing before. I’m selecting all the Security event log entries whose events are type five, in this case an audit failure, meaning they couldn’t log in. Then I’m going to output the source, the time the event was generated, and a brief message about that event. This is really useful if you’re trying to find events based on specific details. And this is more of a one-time thing, but you can call WMIC from inside of PowerShell as well if you need to. Now, at the beginning of the lesson, I mentioned Python and Ruby.
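Assembled from that description, the WMIC query would look something like this (run from an elevated Windows command prompt; in the WMI event log classes, EventType 5 corresponds to an audit failure):

```
wmic ntevent where "LogFile='Security' AND EventType=5" get SourceName,TimeGenerated,Message
```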
Now, Python and Ruby are interpreted, high-level, general-purpose programming languages, and they are heavily used by cybersecurity analysts and penetration testers, because they’re easy to write things in and can do a great job of going through and searching files, dissecting things, and finding what you need. Now, again, going deep into Ruby or Python is way beyond the scope of the CySA+ exam. You should just know that if you see Ruby or Python, they are interpreted, high-level programming languages, and they could be something you use to search for things. Now, in addition to that, they can do all sorts of other things, because they are full programming languages. But for the CySA+ exam, just know these are high-level scripting languages, meaning they are interpreted line by line as you go through the scripts; they are not compiled. And since they’re not compiled, you’re not going to have a binary file associated with them, just a text file, so you can actually read all the source code very easily if you find one on a machine. The last thing I want to talk about is AWK. Now, AWK is a scripting engine that’s geared toward modifying and extracting data from files or data streams on Unix and Unix-like systems.
So on Linux and macOS, AWK is pretty easy to use, and you’re going to run it from the Bash shell. You just type in awk followed by a pattern and an action. For instance, in this case, I’m looking for anything that has the word “manager” in it; that’s what’s inside the slashes. I’m going to print those lines to the screen, and I’m going to search the file employee.txt. So, as awk goes through the employee.txt file, any time it finds a line that contains the word “manager,” it prints that line to the screen. So I might go through all of my user accounts, and anybody in the manager group gets displayed. That’s the idea here with AWK. Now, obviously, there’s a lot more to AWK than what I’m showing you here, but again, for the CySA+ exam, you really just need to know that AWK is a scripting engine and you can modify and extract data using it.
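That command looks like the following; the sample records written into employee.txt here are made up purely for illustration:

```shell
# Create a small sample employee file (hypothetical records for illustration)
printf 'jsmith manager sales\nbdoe analyst soc\nkadams manager hr\n' > employee.txt

# Print every line of the file that contains the word "manager"
awk '/manager/ {print}' employee.txt
```

With that sample data, only the jsmith and kadams lines are printed, since they are the ones containing “manager.”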
3. Analysis and Detection (OBJ 3.1)
Analysis and detection. So now that we’ve covered the overview and looked at the dashboards, which bring up those metrics, how do we do our analysis and our detection? When we talk about our SIEM, there is lots of different data being put into the SIEM, and the SIEM has to apply different rules to all those different inputs and outputs to match against them and create alerts for us. Those alerts are going to allow the analyst to start investigating. Well, an analyst needs to go through all of these different alerts and dismiss the false positives while responding to the true positives. Essentially, we’re going to have a lot of things that come up as positives, and that’s what we’re going to be looking at. And then we’re going to sort them out into false positives, meaning it wasn’t really a bad thing, or true positives, meaning it was a bad thing, and now we’re going to go respond to it.
There are many different ways to do this, and we’ll look at a couple of different ways that we do our analysis. We can have conditional analysis, heuristic analysis, behavioural analysis, or anomaly analysis. Let’s go through each of these in this lesson. First, conditional analysis. Now, conditional analysis is a simple form of correlation that is performed by a machine using signature detection and rules-based policies. Now, the great thing about using something like this is that it is very clear-cut. When you have an alert pop up on your screen as an analyst, usually it’s because there is some kind of signature or rule that conditional analysis used to generate that alert. And it’s usually in the form of “if this condition, then alert.”
So I might have something like “if X and Y happen, or Z happens, then create an alert.” Now, there is a problem with this approach of using signature-based or rules-based policies: all of these things have to have a rule created, and if you don’t have a rule created, you’re going to be blind to any kind of zero-day or previously unknown TTPs. Also, this type of conditional analysis creates a large number of false positives, because these rules are very basic and don’t understand the intricacies of human behavior. And, all too often, these rules can raise an alarm even when nothing is wrong. So this brings us to our second category, which is known as heuristic analysis. Heuristic analysis is a method that uses feature comparisons and likenesses rather than specific signature matching to identify whether the target of observation is really malicious. For example, if I said, “If you see a man and he has brown or white hair, he might be malicious,” that’s a pretty generic rule that would be a signature-based or conditional analysis rule.
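As a sketch of what such a rules-based policy might look like in practice, here is some hypothetical pseudocode, not tied to any particular SIEM product (4625 and 4624 are the Windows logon failure and logon success event IDs):

```
rule "brute-force-then-success":
    if   count(EventID == 4625, same SourceIP, within 10 minutes) >= 5
    and  EventID == 4624 from that same SourceIP in the same window
    then raise alert (severity = high)
```

If either condition has no matching rule, the SIEM simply never fires, which is exactly the blind spot described above.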
And you’re going to have a ton of false positives, because I’m looking for one bad guy who might have robbed the store, but now I’m flagging everybody who is a man and has brown or white hair. Well, I can instead use heuristics, and I can start matching on things that look like that, but not exactly that. So now I’m looking for a man who has brown hair, or maybe something between brown and white, because it wasn’t exactly either brown or white; it was somewhere in between. As we develop this type of heuristic, we can begin to notice things that resemble the pattern but aren’t exactly the same. Now, when you’re using heuristic analysis like this, you’re going to end up using machine learning to help you alert on behaviour that’s similar enough to a signature or rule but isn’t necessarily exact. The problem is that these bad guys are extremely intelligent.
And so, if they start realising that doing this exact pattern is going to flag a sensor based on a signature, they’re going to modify that pattern just a little bit to make it harder to find them. Well, heuristic analysis can find those variations by using machine learning. Machine learning is a component of AI that enables machines to develop strategies for solving a task, given a labelled dataset where features have been manually identified, but without further explicit instructions. Now, the reason we use machine learning is because there’s just so much data out there, and we don’t have enough analysts to go through every single alert and make all these determinations. So, by combining machine learning with heuristic analysis, we can have the machines learn over time what exactly is bad and what exactly is good. And they can do a better job of that than humans can, because they can process so much more information faster. Now, this will still end up producing a lot of false positives.
It can produce a lot of false negatives as well. But over time, your heuristics get much, much better when they have a well-trained dataset and can learn over time and do a better job for you. And this helps relieve a lot of that workload from our analysts. Now, the third type we have is what’s known as behavioural analysis. When we deal with behavioural analysis, we have a network monitoring system that detects changes in normal operating data sequences and identifies abnormal sequences. So what we’re really looking for with behavioural analysis is to generate an alert whenever anything deviates outside of a defined level of tolerance from a given baseline. Oftentimes, people will call this statistical or profile-based detection.
Essentially, we create a baseline, and anything that’s outside of that, we’re going to flag. So, for instance, here on the screen, you can see a basic baseline on a scattergram of a bunch of different events. And you see that one all the way in the far right corner; that’s the outlier. That one is outside the band of tolerance, and so we want to go investigate it. That’s the idea here with behavioural analysis. Now, behavioural analysis is going to generate a lot of false positives for you, so this is something you have to be aware of. It’s also going to create a lot of false negatives. Now, this is going to happen until your statistical model is adequately trained and tuned, just like with heuristic analysis.
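To make the idea of a band of tolerance concrete, here is a minimal sketch using awk (covered in the scripting lesson): it flags any value more than 1.5 standard deviations from the mean. The sample event counts and the 1.5-sigma threshold are arbitrary choices for illustration, not part of any real detection product:

```shell
# Hypothetical hourly event counts; the last one is far outside the baseline
printf '10\n12\n11\n13\n95\n' | awk '
  { v[NR] = $1; sum += $1 }                 # store each value and accumulate the total
  END {
    mean = sum / NR                         # baseline: the average count
    for (i = 1; i <= NR; i++) ss += (v[i] - mean) ^ 2
    sd = sqrt(ss / NR)                      # population standard deviation
    for (i = 1; i <= NR; i++)               # flag anything outside mean +/- 1.5*sd
      if (v[i] > mean + 1.5 * sd || v[i] < mean - 1.5 * sd)
        print "outlier:", v[i]
  }'
```

Here only the 95 falls outside the tolerance band, so that is the one event an analyst would go investigate.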
They do need to learn, and as they learn more over time, they are going to do a better job for you as you start figuring out exactly what your baseline is, what your tolerance level is, and what falls outside that baseline. Now, the fourth and final type we have is what’s known as anomaly analysis. An anomaly analysis is a network monitoring system that uses a baseline of acceptable outcomes or event patterns to identify events that fall outside an acceptable range. Now, I know this sounds a lot like behavioural analysis, but there is a difference, and I’m going to get to that in just a second. With anomaly analysis, we’re going to generate an alert for any event or outcome that doesn’t follow a set pattern or rule.
So let me give you a good example of this. Let’s say I’m sending data back and forth. Well, if I’m doing that, I’m probably doing it over the network, and I’m doing it using packets. Well, my engine might check the packet headers or the exchange of packets in a session, and then based on that, if it’s not conforming to the RFC standard that everybody conforms to, I’m going to generate an alert because it’s something that deviates from the strict compliance of that RFC. Now this is the idea of anomaly analysis. It allows you to say, “Hey, this doesn’t meet the standard; this looks weird.” And so if somebody sent something like a “ping of death,” that would get caught by an anomaly analysis because it’s not normal to get a “ping of death” packet because that goes outside of the RFC by using weird sizes inside the packet headers.
Now, what is the difference between anomaly analysis and behavioural analysis? Well, anomaly analysis uses prescribed patterns like an RFC or an industry standard, something that everybody should be following. Behavioural analysis, on the other hand, records expected patterns in relation to the specific device being monitored. So with anomaly analysis, we’re checking everything against a common standard. With behavioural analysis, we’re setting our own standards based on the observed patterns on that device. So if I create a baseline based on my web server, not just any web server, that’s behavioural analysis. That’s really the difference between behavioural analysis and anomaly analysis.
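To make that distinction concrete, here is a minimal Python sketch. The 65,535-byte cap comes from the IP standard (RFC 791); the tolerance multiplier, the sample history, and both function names are illustrative assumptions, not any real SIEM's API.

```python
# Contrasting anomaly analysis (fixed, prescribed standard) with
# behavioural analysis (baseline learned from this device's own history).
from statistics import mean, stdev

MAX_IP_PACKET_BYTES = 65_535  # ceiling defined by the IP standard (RFC 791)

def anomaly_check(packet_length):
    """Anomaly analysis: flag anything that violates the prescribed
    standard, e.g. a 'ping of death' reassembling to > 65,535 bytes."""
    return packet_length > MAX_IP_PACKET_BYTES

def behavioural_check(observed, history, k=3.0):
    """Behavioural analysis: flag anything outside a tolerance band
    learned from this specific device's own observed values."""
    mu, sigma = mean(history), stdev(history)
    return abs(observed - mu) > k * sigma

# A ping of death violates the standard no matter which host it hits:
print(anomaly_check(70_000))             # True -> alert

# 900 requests/min is only 'weird' relative to this server's own baseline:
baseline = [100, 110, 95, 105, 98, 102]
print(behavioural_check(900, baseline))  # True -> alert
```

Note how the anomaly check needs no history at all, while the behavioural check is meaningless without it; that is exactly the prescribed-standard versus learned-baseline split described above.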
4. Trend Analysis (OBJ 3.1)
Trend analysis. In this lesson, we are going to talk all about trend analysis. Now, trend analysis is the process of detecting patterns within a data set over time and using those patterns to make predictions about future events or better understand past events.
Now, the whole idea here with trend analysis is that we should understand what a baseline looks like, what our system normally looks like, and then what goes outside that trend. If that trend starts going up or down or goes way off to the side, that is something that we want to know because it might mean that either an attack is coming because it’s trending upward or something already happened and that’s why we see this big spike. Now, when we deal with trend analysis, this can enable us to review past events from a new perspective as well. For instance, if I look back over my logs for the last six months and I see there is a big spike from, say, three months ago, I can then go back three months and start investigating that. Now, by going back and looking at those events over this time period, we can see where those patterns are and identify what goes outside the pattern.
This is really important because it’s nearly impossible to identify a trend when you’re looking at a single logged event. For instance, back in elementary school, when you learned how to graph things on a piece of graph paper, you would put down a single dot, and you wouldn’t really know what it meant. But if you put a second dot, then a third dot and a fourth dot, you could start seeing where those trends go over time. And that trend is really important because it starts to tell you a lot more than any single event could on its own. Now, when we deal with trend analysis, there are three different kinds we can deal with: frequency-based trend analysis, volume-based trend analysis, and statistical deviation trend analysis. Let’s talk about each of those as we go through this lesson. The first is frequency-based analysis.
Now, when we do a frequency-based analysis, we establish a baseline for a given metric and then monitor the number of times that occurrence happens over a given period of time. For example, here on the screen, you can see the notable events over time and how the count goes up and down based on what the different events are. For instance, I can see there’s a spike on Access at around 7:00 p.m., and then again around seven or eight in the morning. Those are probably shift changes between two different analysts, with the new shift logging on as the old shift logs off. We can also see a couple of spikes around 9:00 a.m. On the right side of the chart, I see a red one and a yellow one. The red indicates an endpoint event, and shortly after that, we see an increase in the threat count as well. This shows us that the endpoints were logging something, and shortly after, threats started to rise too. So we have additional information that we can start putting together as part of our frequency-based analysis.
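A simple way to picture frequency-based analysis is to bucket events by hour and flag any hour whose count exceeds the baseline by some multiplier. This Python sketch assumes a hypothetical list of event hours and a 3x threshold, chosen purely for illustration:

```python
# Frequency-based analysis sketch: count occurrences per hour-of-day
# and flag hours that exceed a multiple of the baseline rate.
from collections import Counter

def flag_frequency_spikes(event_hours, baseline_per_hour, multiplier=3):
    """Return {hour: count} for hours exceeding multiplier * baseline."""
    counts = Counter(event_hours)
    return {hour: n for hour, n in counts.items()
            if n > multiplier * baseline_per_hour}

# Hour-of-day for each 'Access' event pulled from the SIEM (hypothetical):
events = [7] * 2 + [19] * 12 + [8] * 11 + [9] * 3

print(flag_frequency_spikes(events, baseline_per_hour=3))
# -> {19: 12, 8: 11}  (the 7 p.m. and 8 a.m. shift-change spikes)
```

The analyst's job is then to decide whether a flagged spike is benign, like a shift change, or the start of something worth investigating.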
The next one we want to talk about is volume-based analysis. Volume-based analysis measures a metric based on the size of something, such as how much disk space is being used or how large a log file is. For example, let’s take a look at our database server’s network utilization. Last week, it was 40 megabytes. This week, it was 800 megabytes. That is a 20-fold increase. Now, is that good or is that bad? Well, it could be either; we really don’t know yet. But it is something we should identify, because it’s an anomaly that goes outside of our trend based on our usual volume. If I had multiple weeks to look at, I could say it’s usually around 30 to 50 megabytes, and this week it jumped up to 800. That’s something I want to look into, because maybe I’m the victim of a data leak where someone is dumping my entire database over to their servers, and that would show up as a huge volume of network utilization. That’s where volume-based analysis can really come in handy.
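The database server example above can be sketched in a few lines of Python. The weekly figures and the 10x alert threshold are illustrative assumptions; a real deployment would tune that threshold to its own environment:

```python
# Volume-based analysis sketch: compare this week's network volume
# against the trailing average of previous weeks.

def volume_alert(history_mb, current_mb, threshold=10.0):
    """Alert when the current volume is more than `threshold` times
    the average of the prior observations."""
    avg = sum(history_mb) / len(history_mb)
    return current_mb > threshold * avg

prior_weeks = [30, 40, 50, 40]          # megabytes transferred per week
print(volume_alert(prior_weeks, 800))   # 800 MB vs ~40 MB average -> True
print(volume_alert(prior_weeks, 45))    # within the usual range   -> False
```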
The third type we have is statistical deviation analysis. Here, we use the concepts of mean and standard deviation to determine if a data point should be treated as suspicious. The mean is a mathematical term; it’s basically the average, the sum of all the values divided by the number of samples. So, for example, on this chart, you can see the blue line. That is the norm. We expect a one-to-one or two-to-one correlation as we go up the line. Now, if we have something that goes far outside of that, for example, the green dot showing at (4.5, 1), that’s something way outside the norm. Now, what does this graph represent? Well, nothing really. It’s just an example.
But it could represent lots of different things. For example, one of these data points could show us the relationship between standard users and privileged users, how many times they’re invoking a process, and how they’re running it. Having one user account but being logged into four and a half systems on average would be unusual; one user account logged into one system would be normal. It doesn’t really matter what the axes on the screen are; they’re just showing you the concept. When we see something that falls outside the standard mean, outside the average, that’s when we flag it. That’s called statistical deviation analysis. Now, one thing to keep in mind with trend analysis is that it’s very dependent on what metrics are used for your baseline and your measurements. If you’re not measuring the right things, you can miss bad things that are happening. And even if you are measuring the right things, if you don’t watch them over time, you can miss other things as well.
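Statistical deviation analysis boils down to a z-score style test: how many standard deviations away from the mean is this data point? Here is a minimal sketch; the sample data and the 2-sigma cutoff are illustrative assumptions:

```python
# Statistical deviation analysis sketch: treat a data point as
# suspicious when it sits more than n_sigma standard deviations
# from the mean of the observed samples.
from statistics import mean, stdev

def is_suspicious(value, samples, n_sigma=2.0):
    mu = mean(samples)      # the average: sum of values / number of samples
    sigma = stdev(samples)  # how spread out the samples are
    return abs(value - mu) > n_sigma * sigma

# Simultaneous systems logged into per user account (hypothetical):
logins_per_account = [1, 1, 2, 1, 1, 2, 1]

print(is_suspicious(4.5, logins_per_account))  # far above the mean -> True
print(is_suspicious(1, logins_per_account))    # normal             -> False
```

This mirrors the green-dot example: an account at 4.5 systems sits far outside the cluster of normal one-to-two-system accounts, so it gets flagged.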
So you have to make sure you pick exactly what you want to measure and, given the resources you have, which ones your analysts are actually going to look at and analyse, because you only have so many people and so much time. And there are lots of things you could measure. You could measure the number of alerts and the number of incidents and see how they trend over time. You might look at the time to respond and whether it’s going up or down. Are you getting better at doing responses? How about network or host metrics? You might be looking at network bandwidth, storage volume, log files, the number of logons, or the number of active ports. There are tons of different metrics you can choose from. You might also look at training and education, and we’ve talked about this before. How well informed is your staff in regards to cyber threats? Do they know what phishing emails look like? How many times have you run these programmes in a year? How many people have gone through them? How did they do on their graded assessments?
Have you done pen tests to see if they fall for these types of tricks? How about compliance? How well are your systems complying with your baseline? Are they all patched? Are they all scanned? Is everyone’s antivirus software up to date? How many of them are up? How many of them are down? All of this is data you could be looking at. And then finally, how about your external threat levels? What does the external threat level look like right now in the world? As you keep reading about different cyber threats that are out there, do they apply to you? Are they all going after Windows systems but you’re using Linux, or are you using Linux and they’re going after Linux systems? Are you in an industry right now where everybody seems to be attacking?
For instance, maybe you work for a credit card processor or a bank, and you see that this industry has been targeted week after week, month after month, year after year. Or maybe you work for a software company and you realise nobody’s really going after that industry, so you’re not as worried. The external threat level does make a difference, and you have to keep that in mind, too. Now, one other thing I want to talk about with trend analysis is that it can help you figure out if you’re the victim of a sparse attack. What is a sparse attack? Well, sometimes attackers will use a sparse attack technique to bury their attacks within the network noise. For instance, one of the last organisations I worked at had about a million endpoints, a million different computers on our network. That is a lot of noise and a lot of data inside our SIEM. And so if somebody did one bad thing today and then didn’t do anything bad again for three months, it would be hard for us to see that. Whereas if they were doing a bunch of things today, we would probably catch them. So it’s a matter of how much they can do at once.
And so if attackers can take their time and be patient, they can do one thing today, one thing next week, and one thing three weeks from now, and bury themselves in the noise of the network. Now, the reason these sparse attacks work for an attacker is that a lot of times we end up tuning down the sensitivity of our systems because we get so many false positives. Take, for example, somebody trying to guess your password. If I log in today with one password and it doesn’t work, and I try again tomorrow and it doesn’t work, and I try again the next day and it doesn’t work, and I do that every single day, only one try each day, would you catch that? Most likely not, because most systems are tuned to only catch something like three failed login attempts within 30 minutes or within an hour. And so if I do one attempt per day for the next six months until I guess your password, I’ll probably get there eventually.
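You can see why this works by coding up the windowed detector described above. This sketch assumes a "three failures within 30 minutes" rule, with all times in minutes; it fires on a burst of guesses but never on one guess per day:

```python
# Sketch of why a sparse attack slips past a windowed threshold rule.

def windowed_detector(failure_times_min, max_failures=3, window_min=30):
    """Return True if at least `max_failures` failed logins fall
    inside any single `window_min`-minute window."""
    times = sorted(failure_times_min)
    for start in times:
        in_window = [t for t in times if start <= t < start + window_min]
        if len(in_window) >= max_failures:
            return True
    return False

burst  = [0, 5, 10]                        # three tries in ten minutes
sparse = [d * 24 * 60 for d in range(30)]  # one try per day for a month

print(windowed_detector(burst))   # True  -> caught
print(windowed_detector(sparse))  # False -> buried in the noise
```

Thirty guesses in a month is ten times the activity of the burst, yet the windowed rule never fires on it; only a longer-horizon trend view would surface it.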
That’s the idea when we talk about a sparse attack. Now, even though that sparse attack wouldn’t be caught based on my login attempts, if I finally did get your password and started downloading all your files, trend analysis may help you catch me. Because as I’m downloading all that data, your system, which normally downloads maybe 100 megabytes per day, is now downloading a gigabyte per day. That’s how a successful sparse attack can still be identified using trend analysis. Now, the last thing we need to mention in this discussion of trend analysis is narrative-based threat awareness and intelligence. This is a form of trend analysis that’s reported in long form to describe a common attack vector seen over time. For example, over time, we started seeing that attackers were using Internet Relay Chat (IRC) as a way to command and control their botnets.
Security researchers kept seeing this, analysed the trends, and wrote a report on it, known as a narrative-based threat assessment, that said, “Hey, Internet Relay Chat is being used for command and control. So if you don’t need IRC inside your organization, you should block it.” And so a lot of people blocked it at the firewall. Because of that, attackers saw that it was blocked, and they ended up switching to another mechanism: SSL tunnels over HTTP. That way, they can blend in with the rest of the network traffic. Again, this is just an example of how, over time, you can see lots of these little events, put them together, and say this is a trend: attackers are using this as a C2 (command and control) mechanism, therefore we should block it. Now that they have moved and changed, we have to move and change with them, and we need to be able to tell the difference between legitimate HTTP traffic and malicious command and control bots using it. And again, trend analysis can help us do that.
5. Rule and Query Writing (OBJ 3.1)
Rule and query writing. Now, as you’re going through all of your data, it comes to you in lots of different forms, and if you’re just looking through raw log data, it’s not going to be that useful to you. So instead, we write rules to do correlation, and we write queries to search the information and get the data that we need. To do this, we can use either correlation rules or search queries. Let’s talk about what each of these is. First, what is correlation? Well, correlation is the interpretation of the relationship between individual data points to diagnose incidents of significance to your security team. So if I told you it’s 32 degrees, that tells you nothing, but if I say it’s 32 degrees Fahrenheit, that tells you a little bit more. If I say it’s 32 degrees Fahrenheit and it’s dropping, that tells you even more, because that’s telling you the weather is getting colder and we’re going to have ice soon because it’s going to start freezing.
These are the ideas of taking different data points and giving them context through correlation. To do this in a SIEM, we’ll use a SIEM correlation rule. This is a statement that matches certain conditions expressed using logical expressions, such as AND and OR, and operators such as less than, greater than, and contains. Once we have that logic, we can create a rule using it. For example, let’s say I wanted to create a rule that sends an alert if multiple login failures occur within one hour from a single account. What might that look like? Well, we’re going to make it look something like this: the first condition is that the count of login failure errors is greater than three, the second condition is that those login failures are all for the same user, and the third condition is that the duration is less than one hour.
So what this rule says to the SIEM is: if I have more than three login failures from one user within an hour, send an alert. Now, that works great, and this is what we call a correlation rule. But correlation rules depend on normalized data. If you have data from all sorts of different data sets and it’s not parsed and normalized first, you can’t compare it. So we need to make sure that normalization happens. That normalization and parsing are also going to help you get context. For instance, what if I had a log file with a bunch of different IP addresses in it? Is that useful? Well, maybe, or maybe not. I need to correlate and normalize it first so that I understand the context: knowing that these were IP addresses from a firewall, which tells me what’s coming in and out of my network; knowing whether they’re statically or dynamically assigned; and making sure the time on them matches all the other systems by using UTC across my network.
That’s helpful. All of that correlates together to give me more information. Now, when we create these correlation rules, one important thing to remember is that these rules match the data as it’s ingested into your SIEM. And this requires the data to stay in memory as persistent state data while you’re trying to process it. So for the example I gave earlier of the three login failures within an hour, that means I have to maintain all the login data for up to one hour in memory. That keeps a lot of information in memory on my server, and that can end up slowing down the system or causing it to crash if I have a large network.
So, is there a better way to conduct long-term information searches? Yes, there is. It’s known as a SIEM query. A SIEM query extracts records from among all the stored data for review, or to show them as a visualization. When we do this, we go through the data stored in the SIEM and find all the matching entries. Now, one of the big differences is that a correlation rule will flag an alert immediately, whereas a SIEM query waits until you run the query and look for that information.
Now, when I look at a query, there are a couple of parts we’re going to look at. We select some fields, where some conditions exist, sorted by some kind of field. So what does this look like? Well, let’s return to the last example. Assume we want to select the user field where the count of login failure errors is greater than three and the duration of those login failures for that user is less than one hour. If all those conditions are met, we’re going to select that user.
Then it’s going to provide that sorted list to us based on the date and time each event happened. Now, this can go back a few minutes, hours, months, or even years; it depends on what you set your filters to. In this particular example, I’m searching the entire database, everything that’s in my data store, for as far back as it goes. And again, this is a query as opposed to a rule: the rule logic is sitting inside the “where” conditions, but the query is the whole statement, covering what I’m selecting (those users), where I’m searching for them, and how I’m going to display the results when I sort them.
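The same detection run as a query rather than a rule can be sketched like this. Here we search records already at rest instead of holding state during ingestion; the record layout and function name are illustrative assumptions:

```python
# Sketch of a SIEM-style query over stored, normalized events:
# SELECT user WHERE login failures > 3 AND duration < 1 hour,
# SORTED BY the time of the first failure.
from collections import defaultdict

# Stored events as (user, epoch_seconds, event_type) tuples (hypothetical):
stored_events = [
    ("jdoe",   100,  "login_failure"),
    ("asmith", 200,  "login_failure"),
    ("jdoe",   700,  "login_failure"),
    ("jdoe",   1300, "login_failure"),
    ("jdoe",   1900, "login_failure"),
]

def query_failed_logins(events, max_failures=3, window_sec=3600):
    by_user = defaultdict(list)
    for user, ts, etype in events:
        if etype == "login_failure":     # the WHERE condition on type
            by_user[user].append(ts)
    hits = [(min(ts), user) for user, ts in by_user.items()
            if len(ts) > max_failures           # more than three failures
            and max(ts) - min(ts) < window_sec] # all within one hour
    return [user for _, user in sorted(hits)]   # the SORT BY time step

print(query_failed_logins(stored_events))  # -> ['jdoe']
```

Unlike the streaming rule, nothing happens until the analyst runs this query, and it can reach as far back as the data store retains events.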