1. Business Continuity and Disaster Recovery
We spent a long and worthwhile amount of time on networking concepts and networking vulnerabilities. We talked about OSI, wireless, Bluetooth, network devices, network security, and protocols. Of course, networking is all about data, so let’s talk about managing data, whether it’s on a network or not. First and foremost, consider data classification. Not all data is meant for all eyes, right? Or all ears. There are several different classification levels, and your organisation may have more. If you’re in the military, you’ll see others, such as classified, secret, and top secret. In the private sector, we tend to call things public, confidential, private, or sensitive. It doesn’t really matter whether we call something confidential or sensitive. What matters is that there is a procedure in place for identifying what constitutes confidential or sensitive data.
Confidential information would generally be personally identifiable employee information, such as social security numbers, health information, salaries, and other information that not everyone should know. Sensitive information is not so much personally identifiable as it is information the general public shouldn’t see, like sales figures or internal operational material. And then you can have flat-out private, which refers to an individual. So while we might say that your salary and your health information are confidential, some information is downright private, meant just for that individual. For example, when I was working around health clinics, only certain people, including the patient’s immediate health provider, were allowed to see that really private information: the health history.
There could be other confidential information, not the exact dosages or the diagnosis, that a health official could see. But if it’s really private and specific to an individual, then it should only be viewable by that individual’s immediate healthcare provider, not some government official or a hospital administrator. So you have to determine the categorization and classification of your data. As an IS auditor, check whether there is a policy that describes how to classify data, whether data is actually being classified, and whether different types of data are handled and stored differently. If data is truly private, for example, it should be encrypted. If it’s confidential, it certainly should have access controls on it. If it’s sensitive, maybe you don’t need as many access controls, but you do make sure that nobody accidentally attaches a spreadsheet to an email and sends it outside the organisation.
Public information, on the other hand, is meant to be viewed by the general public. Even so, you don’t want people tampering with it; you want it to keep its integrity while remaining completely viewable. So when we look at classification levels in practice, we might see personal confidential, corporate confidential, client confidential (information about a client or customer), sensitive, private, public, or trade secret. Just see what classifications are being used, how they’re defined, and how they’re protected. If data is confidential, we’ll want to store it carefully.
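One way an auditor might check that handling actually follows classification is to write the policy down as a lookup table. The sketch below is a minimal illustration, assuming four private-sector levels and assuming that each higher level inherits the controls of the levels below it. The level names, control names, and the `missing_controls` helper are hypothetical, not taken from any standard.

```python
from enum import Enum


class Classification(Enum):
    # Hypothetical label set modelled on the private-sector levels above;
    # an organisation's own policy defines the real labels and meanings.
    PUBLIC = 1
    SENSITIVE = 2
    CONFIDENTIAL = 3
    PRIVATE = 4


# Minimum handling controls per level (illustrative assumption: each
# level keeps the controls of the level below and adds one more).
HANDLING = {
    Classification.PUBLIC: {"integrity_protection"},
    Classification.SENSITIVE: {"integrity_protection", "block_external_email"},
    Classification.CONFIDENTIAL: {
        "integrity_protection", "block_external_email", "access_controls",
    },
    Classification.PRIVATE: {
        "integrity_protection", "block_external_email", "access_controls",
        "encryption",
    },
}


def missing_controls(level, applied):
    """Return the required controls that are not yet applied to an asset."""
    return HANDLING[level] - set(applied)
```

For example, a private record that only has access controls and integrity protection would be reported as still missing `block_external_email` and `encryption`.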
We’ll make sure hard drives are protected when we take them out, for example in antistatic bags. Magnetic media such as tapes should be stored upright; don’t lay them flat. Keep them away from anything that could inadvertently wipe them, like fans or motors, and keep them in acid-free containers so that nothing happens to them if they’re stored for a long period of time. Floppy disks, which you may still have, and other magnetic media should not be written on with, say, a ballpoint pen. Some people have written hard on the labels of those big, sensitive floppies and damaged them. Use felt-tip pens; that really is true. CDs and DVDs are also too easy to damage, so don’t scratch or gouge them, and don’t expose them to anything for an extended period of time. Even long exposure to water can get in and destroy them, although they’re plastic-encased. Of course, when transporting and storing confidential data, keep all of this out of direct sunlight, excessive moisture, and dusty areas, and try not to get liquid on it. Avoid magnetic fields and electronic devices such as monitors, speakers, fans, and motors that could generate a field. With optical media you don’t worry about that so much, but magnetic media is a whole different thing.
Try not to transport magnetic media during a strong electrical storm. You need to protect it, because there could be static discharge or lightning strikes. We want to protect our media, especially magnetic media, from any electromagnetic interference. And, of course, if the manufacturer has recommendations for humidity, temperature, and storage, follow them. You must also have a procedure in place for disposing of this material. For paper, it’s typically shredding. I remember working at a financial firm where they had locked trash bins meant specifically for shredding. Anything that was even the least bit sensitive, even if it was just internal operational material, went right into that bin and straight to the shredder.
So make sure there are proper disposal procedures for printed material, magnetic media, hard drives, and everything else. Some organisations give their old computers to charity. But what do you do with all the data on the hard drives? Some simply don’t include the hard drives; they replace them. For some, degaussing the drive is good enough. Some are so sensitive that the only thing they’ll do is physically destroy the drive, to make absolutely sure no one recovers that last little bit of magnetic imprint. So how do we evaluate the process and procedures for confidential information? The big thing is that we want to see their documentation. How do they intend to treat this data, and does that comply with any legal, regulatory, or contractual requirements? We want to verify that they’re in compliance, that they have a method for classifying data, that they are actually classifying it, and that they store, transport, and protect classified data differently according to its classification.
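The disposal choices described above can be written down as a simple decision table, which is also what an auditor would hope to find in the documentation. This is a sketch under stated assumptions: the media types, classification labels, and the mapping (shred all non-public paper, degauss drives holding sensitive or confidential data, physically destroy drives holding private data, wipe public drives) are hypothetical, and a real policy comes from the organisation’s own documents.

```python
def disposal_method(media_type, classification):
    """Pick a disposal method for one item (hypothetical policy table)."""
    if media_type == "paper":
        # Assumed rule: anything above public goes into the locked shred bin.
        return "recycle" if classification == "public" else "shred"
    if media_type in ("hard_drive", "magnetic_tape"):
        if classification == "public":
            return "wipe"  # assumption: overwrite is acceptable for public data
        if classification in ("sensitive", "confidential"):
            return "degauss"
        # Most sensitive data: destroy the media outright.
        return "physical_destruction"
    # Refuse to guess for media types the policy does not cover.
    raise ValueError(f"no disposal rule for media type {media_type!r}")
```

Raising an error for uncovered media types is deliberate: a gap in the policy should surface as a finding, not be silently defaulted.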
We want to make sure media is labelled so that we know its classification, its creation date, and any other required information. And we want to make sure everybody knows how to handle it, that they have awareness, and that they’re not carelessly leaving things around. I really have seen people leave sensitive or confidential material sitting by a stairwell or out the back on the warehouse dock while it waited to be destroyed. That happens because somebody wasn’t trained well enough or just wasn’t following procedure. And with that, that is lesson six.
2. Fault Tolerance
Some organisations have systems that are so important they must be able to fail over: if one system fails, another one automatically takes over. This is known as fault tolerance. Of course, it generally costs money: money in hardware, in operating system licenses, and in infrastructure. But sometimes you just can’t afford to have that website go down, because by the time you restore it, you’ve lost business and people have gone somewhere else. Or you can’t afford to have the email system go down, and you certainly can’t afford to have the database go down. Any major operating system vendor offers the ability to build fault-tolerant systems. At its simplest, it’s one server here and another server there.
The two servers might share some data between them, and if the primary goes down, the secondary notices and takes over, so the failover is seamless and ideally happens within a matter of seconds. Or each server can be the primary for a different service: this one for email and that one for a database. Between the two of them, they hold the email database and the SQL database, and if either server dies, the other takes over both services. In fault tolerance or clustering, we sometimes don’t have shared data at all, as with web servers. Maybe we have a web-based email front end, or just a web server, and we run multiple web servers side by side, each with its own IP address.
They also all share a virtual IP address. They don’t need shared data among themselves; if they need a database, it sits back here behind them. Clients just connect, and between the two, four, or 30 servers, however many you have, the load is divided round robin or by some other mechanism. So if one server gets overwhelmed, say by a denial-of-service attack, and dies, the others just keep on going. That gives us network load balancing as well as fault tolerance. So we can either have servers with shared data that take over for each other, or servers with no shared data that all talk to a back-end service or database and divide up the client load. From the client’s perspective, nothing changes: each server has its own IP address, but they also share a virtual IP address, and clients just go to the virtual IP address and never know the difference.
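The no-shared-data cluster described above can be modelled in a few lines: clients address one virtual IP, requests are handed out round robin, and a node that fails its health check is simply skipped. This is a toy, in-process sketch assuming a hypothetical `VirtualIP` class and addresses from the 192.0.2.0/24 documentation range; real clustering and load-balancing products do this in dedicated networking software or hardware.

```python
from itertools import cycle


class VirtualIP:
    """Toy model of a load-balanced cluster behind one virtual IP (hypothetical)."""

    def __init__(self, vip, node_ips):
        self.vip = vip
        self.healthy = {ip: True for ip in node_ips}
        self._ring = cycle(node_ips)  # round-robin order over the real IPs
        self._size = len(node_ips)

    def mark_down(self, ip):
        # A failed health check (crash, DoS, maintenance) takes the node
        # out of rotation; clients keep using the same virtual IP.
        self.healthy[ip] = False

    def route(self):
        """Return the real IP that should serve the next client request."""
        # Try each node at most once per request so a dead cluster fails fast.
        for _ in range(self._size):
            ip = next(self._ring)
            if self.healthy[ip]:
                return ip
        raise RuntimeError("no healthy nodes behind " + self.vip)
```

With three nodes and 192.0.2.12 marked down, four consecutive calls to `route()` return 192.0.2.11, 192.0.2.13, 192.0.2.11, 192.0.2.13: the remaining servers absorb the load and the client never sees the failure.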
With this whole concept of fault tolerance, we can have fault-tolerant servers, switches, routers, services on servers, and virtual machines on servers. So you can build a huge fault-tolerant infrastructure. In really big firms, one person’s little desktop connects through multiple switches, multiple routers, and entirely different sites. That’s a huge infrastructure just so one person can get the same desktop no matter where they are and no matter what happens underneath in any of the networks or servers; they still get the same user experience. Again, it’s expensive, and it depends on how much you value or need to have services, people, servers, or network infrastructure available all the time. What do you want to see as an IS auditor? Ask to see the network diagram and have them explain their entire fault-tolerance setup. A big financial firm probably has two parallel sites, both fully operational, and on a regular basis they will swap over completely while in full production, just to make sure it works. Or they’ll have one completely hot, redundant standby site while everyone works at the primary site, and then run an exercise where they completely swap over.
Then you work at one site for three months, swap over completely, and work at the other site for the next three months. That way you know for a fact that you have complete fault tolerance. Again, it’s really expensive, and it depends on whether the application or the business requires it. Some businesses really do. Some operating systems, like Windows Server 2008 and Windows Server 2012, have built-in fault tolerance through failover clustering, so it’s just a matter of more hardware and more licensing. You can have fault-tolerant servers or fault-tolerant virtual machines, which isn’t so expensive on a smaller, scaled-down setup, or you can have complete mirrors of entire infrastructures. And once you’ve created backups, you need to know your backup retention policy: how long will you keep tapes, how far back are you going to store, and how long are you required to keep it?
As an IS auditor, you need to look at what data the organisation should keep, for how long, how it should be stored, who’s authorised to delete it and when, and what the penalties are for violating these rules. A lot of this is regulatory, but these are the questions you need to ask. We keep backups in case we need to restore them. As I’ve said before, it’s not enough to be able to grab the tapes; you also need tape drives that can read them. So make sure you can replace not just the servers, switches, routers, and cabling, but also the tape drives or whatever backup hardware you use. We need recovery practices in case an OS fails or there’s physical damage to the hardware, and in case there’s logical damage, where the hardware isn’t destroyed but systems have to be reconfigured, or data has been accidentally overwritten or deleted. As an IS auditor, I want to see how you’re going to deal with all of that. And do people know how to implement it? It’s one thing to have a written policy, but are people trained to carry it out, and have you practised it periodically?
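A retention policy ultimately answers “how long do we keep each kind of backup?”. As a minimal sketch, assuming a hypothetical schedule of 30 days for daily backups, one year for monthly backups, and seven years for yearly backups, the check below lists backups that have passed their retention period. The schedule values, the backup kinds, and the tuple format are assumptions; real periods come from legal, regulatory, and contractual requirements.

```python
from datetime import date, timedelta

# Hypothetical retention schedule in days; real values come from the
# organisation's legal, regulatory, and contractual requirements.
RETENTION_DAYS = {"daily": 30, "monthly": 365, "yearly": 7 * 365}


def expired_backups(backups, today):
    """Return IDs of backups that are past their retention period.

    `backups` is a list of (backup_id, kind, created_on) tuples, where
    `created_on` is a datetime.date. "Expired" only means eligible for
    deletion; an authorised person still has to approve the deletion.
    """
    return [
        backup_id
        for backup_id, kind, created_on in backups
        if today - created_on > timedelta(days=RETENTION_DAYS[kind])
    ]
```

The function only reports what is eligible for deletion; who is authorised to delete it, and when, remains a separate control the auditor should look for.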
So, to evaluate the adequacy of a backup and restore system, make sure there’s a schedule. Make sure that wherever the backups are taken, for example an offsite library, we know where it is, that it’s thoroughly documented and controlled, and perhaps that a bonded messenger takes the tapes offsite. Verify that restores have actually been tested and that the backed-up data keeps its integrity when it’s restored. And then there’s the media itself: you don’t keep reusing the same tape over and over again. You rotate through the tapes and eventually phase out the worn-out ones. These are the things you’re going to look for as an IS auditor. Next, we’re going to talk about business continuity and disaster recovery regulations.
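Testing that restored data has kept its integrity can be as simple as comparing cryptographic hashes recorded at backup time against hashes of the restored files. The sketch below assumes a hypothetical manifest that maps relative file paths to SHA-256 hex digests; it uses only Python’s standard `hashlib` and `pathlib` modules, and the function names are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large backup files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(manifest, restore_root):
    """Compare restored files against hashes recorded at backup time.

    `manifest` maps relative file paths to SHA-256 hex digests (a
    hypothetical format). Returns the paths that are missing after the
    restore or whose contents no longer match.
    """
    problems = []
    for rel_path, expected in manifest.items():
        restored = Path(restore_root) / rel_path
        if not restored.is_file() or sha256_of(restored) != expected:
            problems.append(rel_path)
    return problems
```

An empty result is the evidence an auditor wants to see attached to a restore test; a non-empty one points straight at the files to investigate.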
3. Business Continuity and Disaster Recovery Regulations
We’ve been talking about business continuity, disaster recovery, and backups and restores. Let’s look at some of the organisations, standards, and regulations involved with disaster recovery. You aren’t expected to memorise these, but you should know that they exist. Sources of business continuity and disaster recovery guidance include the Business Continuity Institute (BCI), the U.S. National Fire Protection Association (NFPA), the Health Insurance Portability and Accountability Act (HIPAA), Control Objectives for Information and Related Technologies (COBIT), the Disaster Recovery Institute International (DRII), and the U.S. Federal Emergency Management Agency (FEMA). These are agencies, frameworks, and regulations we can turn to for disaster recovery and business continuity information and guidance as we figure out how to recover. One thing we can do is a business impact analysis, or BIA. In a business impact analysis, we sort the organisation’s functions by criticality.
Are they critical or non-critical? In a disaster, we obviously try to get the critical functions back online first, within a couple of hours, and then the slightly less critical functions within 24 hours, 72 hours, a week, two weeks, and finally a month. So the business impact analysis gives us these levels of criticality: how critical is this function, so that we can prioritise it in disaster recovery? We also have the recovery point objective (RPO) and the recovery time objective (RTO). The recovery point objective is the point in time we need to be able to get back to, which means how much data, measured in time, we can afford to lose. If the RPO is four hours, our backups or replication must capture the data at least every four hours. The recovery time objective is how quickly we must have the function restored: in half an hour, in 4 hours, in 72 hours, whatever. So each stage has a recovery point and a time limit for getting there, and then we move on to the next, less critical stage with its own RPO and RTO. We do our recovery in stages like this. As the BCP and the DRP are developed, we need to make sure, as auditors, that we understand the key business processes.
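The BIA ordering and the RPO/RTO pair can be checked mechanically. This sketch assumes each business function has been assigned an RTO and an RPO in hours, plus the interval at which its data is actually backed up; the `BusinessFunction` type and its field names are hypothetical. Recovery order sorts by RTO, and a function is flagged when its backup interval is longer than its RPO, because in the worst case a failure just before the next backup loses a full interval of data.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    name: str
    rto_hours: float              # how quickly it must be back online
    rpo_hours: float              # maximum tolerable data loss, in time
    backup_interval_hours: float  # how often its data is actually backed up


def recovery_order(functions):
    """Most critical first: the shorter the RTO, the earlier it is restored."""
    return [f.name for f in sorted(functions, key=lambda f: f.rto_hours)]


def rpo_gaps(functions):
    """Names of functions whose backup schedule cannot meet their RPO.

    Worst case, a failure hits just before the next backup, so up to one
    full backup interval of data is lost.
    """
    return [f.name for f in functions if f.backup_interval_hours > f.rpo_hours]
```

For example, an order database with a 4-hour RTO and a 1-hour RPO but only daily backups would be restored first and also flagged as an RPO gap, which is exactly the kind of mismatch an auditor should raise.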
If you’re an IT person, go ask the people in each department what they need to get back online. You’ll be surprised: they might need sticky notes, notepads, and pencils just as much as access to a database. Ask them how they run their day-to-day operations. Don’t leave it up to IT to decide what people need. Ask in the departments: if we had to walk away tomorrow, what would I need to give you so you could start functioning immediately? You need to understand their processes so that you know what’s important to them, and you need to understand all the resources you have as an IS team. Then establish what’s critical: ask management which functions are critical and how critical the information is. Determine the business impact of losing this or that. Losing access to certain facilities may matter less than losing access to a particular service. Then prioritise the information systems that support the different business processes and work out the strategies for supporting them.
Have a disaster recovery plan (DRP) and a business continuity plan (BCP): the BCP keeps the business functions operating, and the IT DRP supports the business. Test the plans, review them periodically, and train people on them. So how do we test? There are several ways to test your plans. The first is a structured walkthrough. You get the department heads together and go through it: “Okay, folks, in a disaster, here’s what we’re going to do.” It’s just on the tabletop, maybe with a checklist that we go over together. That’s fine, but it only orients the managers. When a real disaster hits, nobody’s going to know what to do. You actually need to train people, and it needs to be simple. And you have to plan for contingencies, because in a disaster people are cut off: they can’t fly back, they can’t drive in, they’re relocating their families, they’re missing, whatever.
You have to account not only for missing data, equipment, and systems but also for missing personnel. How are you going to function with half or three-quarters of your staff unavailable? What are you going to do? So a structured walkthrough isn’t enough; it’s just a start, though some organisations stop there. You can also run a simulation, which is good practice for your staff. Of course, a simulation is never as real as the real thing. If you can afford it, you can build a parallel system, have people practise on it like a mock-up, then disrupt the parallel system and have them recover it, which comes even closer to reality. Of course, that takes time, energy, and money. Finally, you can do a full interruption test, where you actually cut production off and cut over right in the middle of production. The largest enterprises, and the ones most serious about protecting their data, functionality, and processes, periodically do full interruption tests because they expose weaknesses nobody ever expected.
So these are all possible testing methods, and the business has to determine what’s appropriate for it. As the IS auditor, you have to ask: what is your testing method, have you actually done it, and what were the results? There’s always something to learn and improve from a test. In some cases, you just leave certain risks to the insurance company: we can’t deal with it ourselves, so we’ll insure it. You can insure equipment, software, hardware, media reconstruction, extra expenses, business losses and interruptions, valuable papers and records, errors and omissions, and media in transit, so that what arrives is still valid and intact.
So you can buy insurance for all kinds of things you decide you can’t truly protect. Now, if you’re in the middle of a disaster and have to pick up and go, or people drive into work and find the place taped off, what do you do? You have to have a communications plan in place, a place for people to go and work, and a place for people to assemble. With one organisation, we had a phone tree, and all communication went through it. There was a clear chain: one person called two or five people, and they called people in their departments, who called their coworkers. That distributed the work of spreading the information, but it all originated from a single point. Sometimes you have to go to an alternate site, and again, it comes down to how much money people want to spend and how critical the function is. A hot site is completely set up; everything is fully redundant.
You simply pick up and go, failing your systems over to it. Everything is set to go, fully mirrored with the same or nearly the same functionality and capabilities. A warm site is close to that: the infrastructure is in place, and you basically bring the data, the people, and maybe some equipment. A cold site is basically four walls, electricity, and carpet, and you have to go in and set everything up. Or you can have a reciprocal agreement, like newspapers do: if we can’t work in our facilities, can we use your printing presses to print our editions? The difficulty with a reciprocal agreement is whether the other organisation has the capacity to handle as much as you need, and how much contention there will be between your staff and theirs. But these are all options for alternate sites.
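Because the choice of alternate site is a trade-off between cost and how quickly a function must be back, a rough rule of thumb maps each function’s RTO to a site type. The thresholds in this sketch (4 hours and 72 hours) are hypothetical assumptions for illustration only, as is the `alternate_site_for` name; reciprocal agreements are left out because they depend on the partner’s spare capacity rather than on RTO.

```python
def alternate_site_for(rto_hours):
    """Map an RTO to the cheapest site type that can plausibly meet it.

    Thresholds are hypothetical; real figures depend on how long it takes
    this organisation to move data, people, and equipment to each site.
    """
    if rto_hours <= 4:
        return "hot"   # fully mirrored: fail over and go
    if rto_hours <= 72:
        return "warm"  # infrastructure in place: bring data, people, some equipment
    return "cold"      # four walls, power, and carpet: build everything out
```

Run across the functions from a business impact analysis, this gives a first cut at which functions justify the expense of a hot site and which can wait for a warm or cold one.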
So when we evaluate the BCP and the DRP, we’re looking at business continuity. Are they prepared to keep the business running, and how will they manage an alternate site if they have one? Have they tested their plans? What is their recovery plan? Are they backing up properly, and have they tested it? Are all backup procedures being followed? Do they have recovery plans for everything: systems, equipment, software, and data? We also want to make sure the plan covers all types of disasters, and any residual risk can, of course, be covered by insurance. Disasters don’t affect just data; they affect systems, functionality, processes, and personnel. These are all the things the auditor wants to know the business has in place. And that is the end.