CompTIA CASP+ CAS-004 – Data Security (Domain 1) Part 2
February 11, 2023

3. Data Classification (OBJ 1.4)

Data classification occurs during the data creation stage of your data lifecycle. Data classification is the process of applying confidentiality and privacy labels to a given piece of information. I find that the best way to think about this is to picture any military or spy movie you might have watched: somebody has a folder, and on the outside of that folder you see a label such as Top Secret. This classification label indicates that the contents of that folder are Top Secret information, and therefore they need to be protected using certain types of controls to keep them safe from prying eyes. That’s what classification labels or tags signify when you see something like Top Secret shown on a folder. Within our networks, the same thing happens, but we use electronic mechanisms to classify and label all of our data. You’ll also find that different classification schemes can be used depending on the type of network and the organization you’re working for.

The most common scheme is the military classification scheme, which uses the classification labels unclassified, classified, confidential, secret, and top secret. The unclassified label indicates that there are no restrictions on viewing that data and that it presents no risk to our organization if that information is disclosed to the public at large. For example, many of the Army field manuals are Unclassified. This means that anyone around the world can go to the website, download them, and learn how the Army does its business by reading them. The next classification label is Classified. Classified is really a general bucket that contains other levels within it: Confidential, Secret, and Top Secret.

Any data that’s labeled as Classified is considered controlled data, and viewing it is restricted to authorized persons within the owner’s organization or to third parties under a nondisclosure agreement (NDA). The lowest level of classified data in the military scheme is Confidential. Confidential data is highly sensitive data that’s only for viewing by approved persons within the organization, or possibly by those who are trusted under an NDA. For example, the position of a deployed Navy ship might be classified as Confidential, depending on the mission it’s performing. Remember, Confidential is the lowest level inside the classified realm of the military system.

The next level of classified data is Secret. Secret information is valuable, and therefore it has to be protected by severely restricting who can view it. If you work for the military, for example, there are certain buildings where you can view secret information using a specific network known as SIPRNet, which is used to view, process, and store secret information. Because this network is only used to process secret data, it isn’t connected to the Internet. This is because the Internet is considered unclassified, and we don’t want this data to get onto the Internet.

Therefore, the secret network has more protections in place, such as higher levels of encryption, and it costs more to build and operate. But that’s okay, because the data stored there is more valuable, so it’s worth spending additional money to protect it better. The highest level in the military classification system is Top Secret. Top Secret information is any information whose unauthorized disclosure could cause exceptionally grave damage. Again, let’s consider the military and see how different types of information might be handled. Let’s pretend we work for a top secret military organization that is responsible for finding a bad guy located somewhere in the world.

Now, the methods used to find that bad guy, such as embedding a spy in his organization or using some technical gadget to locate him, might be highly sensitive, so we’re going to label that information Top Secret. Once we find out where the bad guy is, we need to pass that information to the soldiers who can go and capture him. These soldiers have a need to know where the bad guy is, but they don’t necessarily need to know how the spy agencies found out his location. So while the methods used to find the bad guy might be Top Secret, his actual GPS coordinates might be classified as Secret, which lets us give them to the soldiers who are going to perform the mission. Different pieces of information need to be protected in different ways and at different times.

For example, if we were still months away from this raid, the location might still be considered Top Secret because we want to limit the number of people who can access it. But if we’re 24 hours from sending the soldiers in, we may have that information downgraded to Secret, because it needs to be shared with the helicopter pilots and soldiers who are going on the raid. In the military system, there is usually a separate network for each classification level. So it’s typical for a military member to have multiple computers at their desk: one for unclassified things like email and browsing the Internet, a second for working on secret data, and a third for working on top secret data.

This provides physical separation between the three data classifications and allows each network to be protected with the data protection controls appropriate to its classification level. The higher the classification level of the data, the more restrictions are placed on those systems and the fewer people are able to access them. A lot of organizations don’t use the military classification scheme, though. If you work in the commercial sector, such as at a bank, an insurance company, a college, or a hospital, you might see information labeled using a different classification scheme, with labels like public, private, internal, restricted, and confidential.

This commercial system is a bit simpler to use because, unlike the military, most of these organizations rely on a single computer network that handles all of these classifications on the same machines. They rely on logical isolation and protections instead of physical separation. For example, restricted or confidential data may use a different or stronger level of encryption than private or internal data, or the system may be configured with different access control rights based on the classification level. This all depends on how you want to set it up. For the most part, the exact methods you use to protect the different classification levels of data are left up to your organization.

However, some organizations are directed by law in how they must protect the data they control. For example, the military is bound by laws that dictate the types of protections it has to put in place for the different classification levels, and there are also laws that prohibit the disclosure of classified information. So if you work for the government or the military and you have access to secret or top secret information, you cannot tell anybody about that data, or you can face fines, imprisonment, or even the death penalty in rare cases. For example, a spy who steals classified national defense information could be prosecuted for espionage, which in the most serious cases can carry the death penalty under US law.
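To make the ordering of these levels concrete, here’s a minimal Python sketch of a viewing check under the military scheme described in this section. It assumes the levels can be compared as an ordered scale and that viewing requires both a sufficient clearance and a need to know, as in the raid example; the names Level and can_view are hypothetical and not taken from any real system.

```python
from enum import IntEnum


class Level(IntEnum):
    """Military-style classification levels, lowest to highest (illustrative)."""
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


def can_view(clearance: Level, classification: Level, need_to_know: bool) -> bool:
    """Hypothetical check: the person's clearance must be at least the
    data's classification AND they must have a need to know."""
    return clearance >= classification and need_to_know


# The soldiers in the raid example hold a Secret clearance.
soldier = Level.SECRET
print(can_view(soldier, Level.SECRET, need_to_know=True))      # target's coordinates -> True
print(can_view(soldier, Level.TOP_SECRET, need_to_know=True))  # methods used to find him -> False
```

Clearance alone isn’t sufficient in this sketch: can_view(Level.TOP_SECRET, Level.SECRET, need_to_know=False) also returns False.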

4. Labeling and Tagging (OBJ 1.4)

How is data classification actually applied to the data it’s designed to protect? This is done using labeling and tagging. Data labels can be applied either manually or automatically, depending on how your systems are configured. Many systems apply them automatically based on a list of certain terms, known as a dirty word list. Let’s pretend I was going to send you an email with the word bazooka in it. If the word bazooka was on my list of secret terms, the email would automatically be labeled as a secret email because it contains the secret word bazooka. Unfortunately, this automatic labeling isn’t the most effective approach, because it doesn’t know whether I meant bazooka the weapon or Bazooka the chewing gum.

And if I was talking about the chewing gum, I probably didn’t need that email classified as secret. The second option is manual labeling. Manual labeling occurs when the end user creates the data and then applies a classification label to it by adding a text-based label. For example, in the military, every document is labeled in its header and footer to state whether it is unclassified, secret, or top secret, and users type that label into their email, document, or PowerPoint as appropriate. Then everybody who sees it will know the classification level assigned to that data, and they’ll protect the associated file at that level.
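The two labeling approaches can be contrasted in a short Python sketch. It assumes a hypothetical dirty word list containing only bazooka, matches whole words without looking at context (which is exactly why the chewing-gum email gets mislabeled), and treats a manual label as a banner typed on the first and last lines of a document. Every name here is illustrative.

```python
import re
from enum import IntEnum


class Level(IntEnum):
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2
    TOP_SECRET = 3


# Hypothetical dirty word list: each term maps to the label it triggers.
DIRTY_WORDS = {"bazooka": Level.SECRET}


def auto_label(text: str) -> Level:
    """Automatic labeling: return the highest label triggered by any dirty
    word. Matching is whole-word and case-insensitive; context is ignored."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    label = Level.UNCLASSIFIED
    for term, level in DIRTY_WORDS.items():
        if term in words and level > label:
            label = level
    return label


def banner_label(document: str) -> Level:
    """Manual labeling: read the banner the author typed on the first and
    last lines (header and footer). Both must agree."""
    lines = [ln.strip() for ln in document.strip().splitlines()]
    header, footer = lines[0], lines[-1]
    if header != footer:
        raise ValueError("header and footer banners do not match")
    return Level[header.upper().replace(" ", "_")]


print(auto_label("Issue the bazooka to second platoon").name)  # -> SECRET
print(auto_label("I bought some Bazooka bubble gum").name)     # -> SECRET (false positive)
print(banner_label("UNCLASSIFIED\nI bought some Bazooka bubble gum\nUNCLASSIFIED").name)  # -> UNCLASSIFIED
```

The manual banner gets the gum email right because a person judged the context; the automatic rule can’t.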

In addition to labeling documents with their classification, we also need to label them with the requirements for declassifying the data they contain. Declassification is the process of downgrading a classified piece of data or information to the unclassified level. This accounts for the data throughout its lifecycle, from cradle to grave. When we create a document, we need to classify it and maintain the appropriate level of protection for it throughout its entire lifecycle. At some point in the future, though, that document and the data it contains can become declassified, either because enough time has passed or because other conditions have been met.
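A declassification instruction can be thought of as a condition stored alongside the label. The Python sketch below assumes a marking carries either a date or a named event after which the data becomes due for declassification review; the Marking fields and the is_due_for_review function are hypothetical, and in practice a classifier still makes the final decision.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Marking:
    classification: str                      # e.g. "TOP SECRET"
    declassify_on: Optional[date] = None     # hypothetical time-based condition
    declassify_event: Optional[str] = None   # hypothetical event-based condition


def is_due_for_review(m: Marking, today: date, events: set) -> bool:
    """Return True once the date has passed or the named event has occurred.
    This only flags the data for review; it does not downgrade it."""
    if m.declassify_on is not None and today >= m.declassify_on:
        return True
    if m.declassify_event is not None and m.declassify_event in events:
        return True
    return False


plan = Marking("TOP SECRET", declassify_event="end of war")
print(is_due_for_review(plan, date(1944, 6, 1), set()))           # -> False
print(is_due_for_review(plan, date(1945, 9, 2), {"end of war"}))  # -> True
```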

Once that happens, the data may undergo a declassification process and be downgraded, because it no longer requires the additional security protections provided by its higher classification level. Let me give you a scenario that illustrates why we need to declassify things. Let’s say you were part of the military team planning the invasion of Normandy back in 1944, in the middle of World War II. Operation Overlord, the code name for the invasion of Normandy, was highly classified to ensure the Allies could launch a surprise attack against the Axis powers. Any document created as part of this plan was labeled Top Secret.

This information was all highly classified and highly protected. In fact, a special category was created underneath the Top Secret classification, labeled BIGOT, which is commonly said to stand for British Invasion of German Occupied Territory. It made a lot of sense to keep all this data highly classified during the war. But once the invasion was done, did it need to stay classified? That becomes a decision for the classifiers to make. If I had been one of those classifiers at the time, I would have kept it classified at least until the end of the war. Now here we are over 75 years later, and the war is long over. Do you think BIGOT documents still need to be classified? Of course not.

In fact, they aren’t. You can open your web browser right now, search for BIGOT or the British Invasion of German Occupied Territory, and find the plans online: the battle plans, the orders that were issued, and information about the technologies used during the invasion. All of that data used to be highly classified, but 75-plus years later it is all considered unclassified. Over time, the technology we had in World War II became outdated, and as better technologies, tools, and techniques were developed, the older ones could be declassified because they no longer needed protection.

By doing this, we free up the resources that were used to protect this data and can use them to protect more important, more current information. And because these plans were declassified to the unclassified level, they can now be shared in museums and on the public Internet, so anyone who wants to can read and study them. Data classifications aren’t just a single label, though; they’re often combined with tags. These tags can be added based on the specific types of data being protected. A data tag identifies a piece of data as belonging to a subcategory within a classification. For example, the BIGOT tag was used under the Top Secret classification label to indicate that the data was related to the British Invasion of German Occupied Territory.

That was a classification tag used back in World War II, but these days we have different tags. For example, under the Unclassified label, we have tags like PII (personally identifiable information), SPI (sensitive personal information), PHI (protected health information), and financial data. Data with these tags is technically unclassified, but it should be treated with more care. These subcategories, or tags, are applied to the data to indicate the level of protection it requires, because we don’t want this type of information getting out and being read by just anyone. If I have your medical record, for example, that information isn’t top secret, secret, or even confidential, but it should still be protected and not posted on a public website.

Therefore, I’m going to tag it as PHI and give it some additional data protections. Even under the classified labels, we have certain data tags that indicate higher levels of protection. For example, at the Top Secret level, you’ll find tags like SI, TK, and HCS. SI stands for Special Intelligence. TK stands for Talent Keyhole, which is used for data gained through satellite reconnaissance. HCS stands for HUMINT (Human Intelligence) Control System. For the exam, you don’t need to know these specific caveats or tags under the Top Secret classification.

But I wanted to make sure you understand that tags can exist at every classification level, including the unclassified level. There are many solutions available for applying these tags. For example, Microsoft’s Data Loss Prevention (DLP) solution includes over 70 sensitive information types, including things like PII, SPI, and PHI under the unclassified category. So it isn’t enough to know that a piece of data is unclassified; you also need to know whether it has a subcategory or tag under that classification level. When you’re designing the controls and protections for the different types of data on your network, another thing to consider is the actual format of your data.
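Before turning to formats, the label-plus-tags idea can be sketched in Python as a classification label plus a set of tags, where each tag is attached when a detector matches. This is loosely modeled on the idea of DLP sensitive information types, but the detectors below are hypothetical, deliberately simplistic regular expressions (an SSN-shaped number for PII, a 16-digit card-shaped number for financial data, and two medical keywords for PHI), not Microsoft’s actual definitions.

```python
import re
from dataclasses import dataclass, field

# Hypothetical detectors; real DLP products use much richer rules
# (checksums, nearby keywords, confidence levels).
DETECTORS = {
    "PII": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),             # US SSN-shaped number
    "FINANCIAL": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),  # 16-digit card-shaped number
    "PHI": re.compile(r"\b(?:diagnosis|prescription)\b", re.IGNORECASE),
}


@dataclass
class DataLabel:
    classification: str                       # e.g. "UNCLASSIFIED"
    tags: set[str] = field(default_factory=set)


def label_with_tags(text: str, classification: str = "UNCLASSIFIED") -> DataLabel:
    """Keep the given classification and attach every tag whose detector matches."""
    tags = {name for name, rx in DETECTORS.items() if rx.search(text)}
    return DataLabel(classification, tags)


record = label_with_tags("Patient SSN 123-45-6789, diagnosis: influenza")
print(record.classification, sorted(record.tags))  # -> UNCLASSIFIED ['PHI', 'PII']
```

The record stays Unclassified, but its tags tell downstream controls that it needs extra care.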

The data format is the way information is organized into preset structures or specifications. The two main types of data formats are structured data and unstructured data. Structured data adheres to a predefined data model. For example, a CSV (comma-separated values) file tells you the data is in a specific format. If I exported a list containing each person’s name, street address, and phone number, I would expect each row to look like Jason Dion,123 Main Street,555-1234. The list might have 10,000 rows, but all of them follow the same format: the first field, before the first comma, is the person’s name, the second is their address, and the third is their phone number.

That’s because structured data follows a predefined format and structure. Unstructured data, by contrast, is any data that isn’t defined by a data model. It can be human-generated or machine-generated, and it usually takes the form of something like a PowerPoint slide, a Word document, an email, a text file, or a chat log. Really, any type of data can be entered and saved on a computer as unstructured data, which lets me type things in any order I want. Because of this, different systems and classification mechanisms have to be set up to understand these different data types and formats.
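The difference between the two formats shows up as soon as you try to pull a field out of the data. In this Python sketch, which uses only the standard library, the structured CSV rows can be read by column because every row follows the same name, address, phone model, while the unstructured sentence has to be searched with a pattern. The second row and the seven-digit phone pattern are hypothetical examples.

```python
import csv
import io
import re

# Structured: every row follows the same predefined model (name, address, phone).
structured = "Jason Dion,123 Main Street,555-1234\nJane Doe,9 Oak Lane,555-9876\n"
reader = csv.DictReader(io.StringIO(structured), fieldnames=["name", "address", "phone"])
phones_structured = [row["phone"] for row in reader]

# Unstructured: free text with no data model, so we search for a pattern
# (a 7-digit phone number shaped like 555-1234) and hope it matches.
unstructured = "Call Jason at 555-1234 after lunch, or just email him later."
phones_unstructured = re.findall(r"\b\d{3}-\d{4}\b", unstructured)

print(phones_structured)    # -> ['555-1234', '555-9876']
print(phones_unstructured)  # -> ['555-1234']
```

In this sketch, a structured column can be treated the same way for every row, while unstructured text has to be scanned piece by piece, which is one reason the two formats call for different classification mechanisms.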
