15 September, 2009

MySQL replication on RHEL

I recently configured MySQL for replication after first enabling SSL connections between the two systems involved in replication. I have to say that the MySQL documentation is excellent, and these notes are simply based on what is available on the MySQL site. I have included links to as many of the relevant sections of the documentation as possible.

For reference, here is the MySQL manual on enabling SSL: 5.5.7.2. Using SSL Connections

Before beginning, it is a good idea to create a directory for the SSL output files and make sure all the files end up there.
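
For example (the directory name here is arbitrary; later sections of this post assume the files end up in /etc/mysql/openssl):

```shell
# Working directory for the keys and certificates generated below.
# The name is an example; later sections use /etc/mysql/openssl.
mkdir -p mysql-certs
# Keep private keys out of reach of other local users.
chmod 700 mysql-certs
```

Run the openssl commands that follow from inside this directory.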

MySQL’s RHEL5 packages from mysql.com support SSL by default, but to check you can run:


$ mysqld --ssl --help
mysqld  Ver 5.0.67-community-log for redhat-linux-gnu on i686 (MySQL Community Edition (GPL))
Copyright (C) 2000 MySQL AB, by Monty and others
This software comes with ABSOLUTELY NO WARRANTY. This is free software,
and you are welcome to modify and redistribute it under the GPL license

Starts the MySQL database server

Usage: mysqld [OPTIONS]

For more help options (several pages), use mysqld --verbose --help


The command produces an error if the binary was built without SSL support.

Next, check that the MySQL server has SSL enabled. The output below means that the server supports SSL but that it is not currently enabled. Enabling it can be done on the command line or in the configuration file, as detailed later.


$ mysql -u username -p -e"show variables like 'have_ssl'"
Enter password:
+---------------+----------+
| Variable_name | Value    |
+---------------+----------+
| have_ssl      | DISABLED |
+---------------+----------+


Documentation on setting up certificates:
5.5.7.4. Setting Up SSL Certificates for MySQL

First, generate the CA key and CA certificate:


$ openssl genrsa 2048 > mysql-ca-key.pem
Generating RSA private key, 2048 bit long modulus
............................................+++
............+++

$ openssl req -new -x509 -nodes -days 365 -key mysql-ca-key.pem > mysql-ca-cert.pem
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [GB]:US
State or Province Name (full name) [Berkshire]:California
Locality Name (eg, city) [Newbury]:Burbank
Organization Name (eg, company) [My Company Ltd]:Acme Road Runner Traps
Organizational Unit Name (eg, section) []:Acme IRT
Common Name (eg, your name or your server's hostname) []:mysql.acme.com
Email Address []:acme-irt@acme.com


Create the server certificate:


$ openssl req -newkey rsa:2048 -days 365 -nodes -keyout mysql-server-key.pem > mysql-server-req.pem

Generating a 2048 bit RSA private key
.............................+++
.............................................................+++
writing new private key to 'mysql-server-key.pem'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [GB]:US
State or Province Name (full name) [Berkshire]:California
Locality Name (eg, city) [Newbury]:Burbank
Organization Name (eg, company) [My Company Ltd]:Acme Road Runner Traps
Organizational Unit Name (eg, section) []:Acme IRT
Common Name (eg, your name or your server's hostname) []:mysql.acme.com
Email Address []:acme-irt@acme.com

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:

$ openssl x509 -req -in mysql-server-req.pem -days 365 -CA mysql-ca-cert.pem -CAkey mysql-ca-key.pem -set_serial 01 > mysql-server-cert.pem
Signature ok
subject=/C=US/ST=California/L=Burbank/O=Acme Road Runner Traps/OU=Acme IRT/CN=mysql.acme.com/emailAddress=acme-irt@acme.com
Getting CA Private Key


Finally, create the client certificate:


$ openssl req -newkey rsa:2048 -days 365 -nodes -keyout mysql-client-key.pem > mysql-client-req.pem
Generating a 2048 bit RSA private key
................+++
.................+++
writing new private key to 'mysql-client-key.pem'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [GB]:US
State or Province Name (full name) [Berkshire]:California
Locality Name (eg, city) [Newbury]:Burbank
Organization Name (eg, company) [My Company Ltd]:Acme Road Runner Traps
Organizational Unit Name (eg, section) []:Acme IRT
Common Name (eg, your name or your server's hostname) []:mysql.acme.com
Email Address []:acme-irt@acme.com

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:

$ openssl x509 -req -in mysql-client-req.pem -days 365 -CA mysql-ca-cert.pem -CAkey mysql-ca-key.pem -set_serial 01 > mysql-client-cert.pem
Signature ok
subject=/C=US/ST=California/L=Burbank/O=Acme Road Runner Traps/OU=Acme IRT/CN=mysql.acme.com/emailAddress=acme-irt@acme.com
Getting CA Private Key

[nr@mysqld mysqlcerts]$ ls
mysql-ca-cert.pem      mysql-client-key.pem   mysql-server-key.pem
mysql-ca-key.pem       mysql-client-req.pem   mysql-server-req.pem
mysql-client-cert.pem  mysql-server-cert.pem
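
Before pointing MySQL at these files, it is worth confirming that the signed certificates actually verify against the CA. Against the real files the check is simply "openssl verify -CAfile mysql-ca-cert.pem mysql-server-cert.pem mysql-client-cert.pem". The sketch below rehearses the same signing flow with throwaway names in a scratch directory, so it can be run anywhere:

```shell
# Rehearsal of the CA -> signed-certificate flow from above, using throwaway
# names in a scratch directory. The same "openssl verify" check applies to the
# real mysql-*-cert.pem files.
tmp=$(mktemp -d)
cd "$tmp"
openssl genrsa 2048 > ca-key.pem 2>/dev/null
openssl req -new -x509 -nodes -days 365 -key ca-key.pem -subj "/CN=test-ca" > ca-cert.pem
openssl req -newkey rsa:2048 -nodes -keyout server-key.pem -subj "/CN=test-server" > server-req.pem 2>/dev/null
openssl x509 -req -in server-req.pem -days 365 -CA ca-cert.pem -CAkey ca-key.pem -set_serial 01 > server-cert.pem 2>/dev/null
openssl verify -CAfile ca-cert.pem server-cert.pem
```

The last command should print "server-cert.pem: OK"; anything else means the certificate was not signed by that CA.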


To enable SSL when starting mysqld, the following should be in /etc/my.cnf under the [mysqld] section. For this example, I put the files in /etc/mysql/openssl:


ssl-ca="/etc/mysql/openssl/mysql-ca-cert.pem"
ssl-cert="/etc/mysql/openssl/mysql-server-cert.pem"
ssl-key="/etc/mysql/openssl/mysql-server-key.pem"
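
The server key should only be readable by the account mysqld runs as, which is mysql on the RHEL packages. Assuming the paths above, something along these lines:

```
chown -R mysql:mysql /etc/mysql/openssl
chmod 600 /etc/mysql/openssl/mysql-server-key.pem
```

Restart mysqld afterward and re-check have_ssl, which should now report YES.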


To use any client, for instance mysql from the command line or the GUI MySQL Administrator, copy the client certificate and key, along with the CA certificate, to a dedicated folder on the local machine. Then configure the client to use the client certificate, client key, and CA certificate.

To connect with the mysql client using SSL, copy the client certificates to a folder, for instance /etc/mysql, then under the [client] section in /etc/my.cnf:


ssl-ca="/etc/mysql/openssl/mysql-ca-cert.pem"
ssl-cert="/etc/mysql/openssl/mysql-client-cert.pem"
ssl-key="/etc/mysql/openssl/mysql-client-key.pem"


In MySQL Administrator, the following is an example you would put into the Advanced Parameters section if you want to connect using SSL.


SSL_CA U:/keys/mysql-ca-cert.pem
SSL_CERT U:/keys/mysql-client-cert.pem
SSL_KEY U:/keys/mysql-client-key.pem
USE_SSL Yes
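
Whichever client is used, a quick way to confirm that a session is actually encrypted is to check the session cipher. A non-empty Value means SSL is in use; an empty one means the connection fell back to plain TCP:

```
$ mysql -u username -p -e "show status like 'Ssl_cipher'"
```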


Replication

Before configuring replication, I made sure to review the MySQL replication documentation.

16.1.1.1. Creating a User for Replication
Because MySQL stores the replication user’s name and password using plain text in the master.info file, it’s recommended to create a dedicated user that only has the REPLICATION SLAVE privilege. The replication user needs to be created on the master so the slaves can connect with that user.


GRANT REPLICATION SLAVE ON *.* TO 'repl'@'192.168.1.50' IDENTIFIED BY 'password';


16.1.1.2. Setting the Replication Master Configuration
Edit my.cnf to uncomment the “log-bin” line, and also uncomment “server-id = 1”. The server-id can be any value between 1 and 2^32 - 1, but it must be unique among the servers in the replication setup.

Also add “expire_logs_days” to my.cnf. If you don’t, the binary logs could fill up the disk partition because they are not deleted by default!


expire_logs_days = 4


16.1.1.3. Setting the Replication Slave Configuration
Set server-id to something different from the master in my.cnf. Although not required, enabling binary logging on the slave is also recommended for backups, crash recovery, and in case the slave will also be a master to other systems.
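
For example, in the slave's my.cnf (the server-id value here is arbitrary as long as it differs from the master and from every other server):

```
[mysqld]
server-id = 2
log-bin   = mysql-bin
```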

16.1.1.4. Obtaining the Master Replication Information
I flush the tables to disk and lock them to temporarily prevent changes.


# mysql -u root -p -A dbname
mysql> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.00 sec)


If the master already holds data the slave needs, you may want to copy the data over manually to simplify things; see 16.1.1.6. Creating a Data Snapshot Using Raw Data Files. However, you can also use mysqldump, as shown in Section 16.1.1.5, “Creating a Data Snapshot Using mysqldump”.
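
As a sketch of the mysqldump route (the output file name is arbitrary; --master-data records the master's binary log coordinates in the dump, which is handy for cross-checking the CHANGE MASTER TO step below):

```
$ mysqldump -u root -p --all-databases --lock-all-tables --master-data > snapshot.sql
```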

Once the data is copied over to the slave, I get the current log position.


mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000002 | 16524487 |              |                  |
+------------------+----------+--------------+------------------+ 
1 row in set (0.00 sec)

mysql> UNLOCK TABLES;


16.1.1.10. Setting the Master Configuration on the Slave

Finally, configure the slave. The log file and log position tell the slave where to begin replication; all changes after that log position will be replicated to the slave. Note that the slave connects to the master as a client, so it presents the client certificate and key:


mysql> CHANGE MASTER TO
->     MASTER_HOST='192.168.1.50',
->     MASTER_USER='repl',
->     MASTER_PASSWORD='replication_password',
->     MASTER_LOG_FILE='mysql-bin.000002',
->     MASTER_LOG_POS=16524487,
->     MASTER_SSL=1,
->     MASTER_SSL_CA='/etc/mysql/openssl/mysql-ca-cert.pem',
->     MASTER_SSL_CAPATH='/etc/mysql/openssl/',
->     MASTER_SSL_CERT='/etc/mysql/openssl/mysql-client-cert.pem',
->     MASTER_SSL_KEY='/etc/mysql/openssl/mysql-client-key.pem';
Query OK, 0 rows affected (0.74 sec)  

mysql> START SLAVE;


Replication can start!
The slave status can be checked via the following command:


mysql> SHOW SLAVE STATUS;
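
The output of SHOW SLAVE STATUS is wide, so from a shell the \G terminator plus a filter is easier to scan. The fields to check first are Slave_IO_Running and Slave_SQL_Running, which should both read Yes, and Seconds_Behind_Master:

```
$ mysql -u root -p -e "SHOW SLAVE STATUS\G" | egrep 'Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master'
```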

06 September, 2009

Two years

It has been two years since I started this blog. Here is a quick recap of notable posts that consistently get a substantial number of page views.

IR/NSM:

System Administration:

15 July, 2009

Building an IR Team: Documentation

My third post on building an Incident Response (IR) team covers documentation. The first post was Building an IR Team: People, followed by Building an IR Team: Organization.

Good documentation promotes good communication and effective analysts. Documentation is not sexy, and can even be downright annoying to create and maintain, but it is absolutely crucial. Making it as painless and useful as possible will be a huge benefit to the IR team.

Since documentation and communication are so intertwined, I had planned on making one post to cover both topics. However, the amount of material I have for documentation made me decide to do a future post, Building an IR Team: Communication, and concentrate on keeping this post to a more digestible size.

There are quite a few different areas where a Computer Incident Response Team (CIRT) will need good documentation.

Incident Tracking
Since I am writing about computer IR teams, it is obvious that the teams will be dealing with digital security incidents. For an enterprise, you will almost certainly need a database back-end for your incidents, and even smaller environments may find it best to use a database to track incidents. You will need some sort of incident tracking system for many reasons, including but not necessarily limited to the following.

  • Tracking of incident status and primary responder(s)
  • Incident details
  • Response details and summary
  • Trending, statistics and other analysis
Tracking the status and who is responsible for specific incidents is one of the primary reasons for incident tracking. Some off-the-shelf software can support incident tracking, for instance help desk ticketing software or other tasking software. This type of software will certainly support the basic needs like status (assigned, in progress, open, closed, etc) and who the incident is assigned to.

However, off-the-shelf software may not have great support for the incident details. A great example is IP addresses and ports. Logging IP addresses, names of systems, ports if applicable, and what type of vulnerability was exploited can be extremely useful for trending, statistics, and historical analysis. A dedicated field for IP addresses can probably be queried more easily than a full-text field that happens to contain IP addresses. If I see that a particular IP address successfully attacked two systems in the previous shift, or a particular type of exploit was used successfully on two systems, I want to be able to quickly check and see how many times it happened in the past week. I also want to be able to pull that data out and use it to query my NSM data to see if there was similar activity that garnered no response from analysts.

Response details can be thought of as a log that is updated throughout the incident, from discovery to resolution. Having the details to look back on is extremely useful. You can use the details for a technical write-up, an executive summary, to recreate incidents in a lab environment, for training, lessons learned, and more. My general thought process is that the more time spent documenting an incident, the more likely the documentation is to be useful.

Trending and statistical analysis can be used to help guide future response and look back at previous activity for anything that was missed, as I already mentioned. It is also extremely useful for reports to management that can also be used to help gain political capital within the organization. What do I mean by political capital?

Say you have noticed anecdotally that you are getting owned by malicious web servers over HTTP, and the sites involved are usually already known to be malicious, for instance when searching Google or using an anti-malware toolbar. Your company has no web proxy, and you recommend one with the understanding that most of the malicious sites would be blocked by it. The problem is that the networking group does not want to re-engineer or reconfigure, and upper management does not think it is worth the money. With a thorough report and analysis using the information from incident tracking, and by using that data to show the advantages of the proxy solution, you can provide the CIRT or SOC management the political capital they need to get things moving when other parts of the company resist.

Standard Operating Procedures (SOP)
Although analysts performing IR need to be able to adapt and the tasks can be fluid, a SOP is still important for a CIRT. A SOP can cover a lot of material, including IR procedures, notification and contact information, escalation procedures, job functions, hours of operation, and more. A good SOP might even include the CIRT mission statement and other background to help everyone understand the underlying purpose and mission of the group.

The main goal of a SOP should be to document and detail all the standard or repetitive procedures, and it can even provide guidance on what to do if presented with a situation that is not covered in the SOP. As an example, a few bullet points of sections that might be needed in a SOP are:
  • Managing routine malware incidents
  • Analyzing short term trends
  • Researching new exploits and malicious activity
  • Overview of security functions and tools, e.g. NSM
  • More detailed explanation and basic usage information for important tools, e.g. how to connect to Sguil and who are the administrators of the system
Although a SOP will not cover every situation, the goal should be to make the team more efficient and provide a reference for tasks or procedures that are used repeatedly. I'm not a fan of hand-holding and like analysts to try and figure things out on their own, so I don't mind if analysts use different methods as long as the end results are consistent in both accuracy and format.

I also like analysts to think about the most efficient way to analyze an incident. Some may gather information and investigate using slightly different methodology, but each analyst should understand that something simple should be checked before something that takes a lot of time, particularly when the value of the information returned will be roughly equal. The analysis should use what my boss likes to call the "Does it make sense?" test. Gathering some of the simplest and most straightforward information first will usually point you in the right direction, and a SOP can help show how to do this.

Knowledge Base
A knowledge base can take many different forms and contains different types of information than SOP, though there also may be overlap. There are specific knowledge base applications, wikis, simple log applications, and even ticketing or tasking systems that provide some functionality for an integrated knowledge base. A knowledge base will often contain technical information, technical references, HOWTOs, white papers, troubleshooting tips, and various other types of notes and information.

One of my favorite options for a knowledge base is a wiki. You can see various open knowledge bases that are using wikis, for instance NSMWiki and Emerging Threats Documentation Wiki, but if you want organization- and job-specific knowledge bases then you will also need something to hold the information for your CIRT.

The reason I pick those two wikis as examples is that they contain exactly the type of information that is useful in a knowledge base for your CIRT. The main difference is that your knowledge will be specific to your organization. One good example is wiki entries for specific IDS rules as they pertain to your network, in other words an internal version of the Emerging Threats rule wiki. There may be shortcuts to take with regard to investigating specific rules or other network activity to quickly determine the nature of the traffic, and a wiki is a good place to keep that information.

Similarly, documentation on setting up a NSM device, tuning, or maintenance can be very effectively stored and edited on a wiki. The ease of collaboration with a wiki helps keep the documentation useful and up to date. If properly organized, someone can easily find the information needed to keep the team running smoothly. Some examples of documentation I have found useful to put in a wiki:
  • How to troubleshoot common problems on a NSM sensor
  • How to build and configure a NSM sensor
  • How to update and tune IDS rules
  • List and overview of scripts available to assist incident response
  • Overviews of each available IR tool
  • More detailed descriptions and usage examples of IR tools
  • Example IR walk-throughs using previously resolved incidents
  • Links to external resources, e.g. blogs, wikis, manuals, and vendor sites
One of the best ways I can think of to effectively communicate the usefulness of both a knowledge base and a SOP to senior technical personnel is by pointing out that better documentation makes it less likely that you are needed for help. Additionally, it is much faster to resolve an issue for another analyst if you can do something like refer the analyst to step-by-step instructions in the knowledge base. If you want fewer calls during off hours, make sure the analysts have the documentation they need.

Shift Logs
In an environment with multiple shifts, it is important to keep shift logs of notable activity, incidents, and any other information that needs to be passed to other shifts. Although I will also discuss this in Building an IR Team: Communication, the usefulness of connecting the shifts with a dedicated log is apparent. Given the amount of email and incident tickets generated in an environment that requires 24x7 monitoring, having a shift log to quickly summarize important and ongoing events helps separate the wheat from the chaff.

Since my feeling is that shift logs should be terse and quick to parse, what to use for logging may not be crucial. The first examples that come to my mind are software designed for shift logs, forum software, or blogging software. The main features needed are individual accounts to show who is posting, timestamps, and an effective search feature. Anything else is a bonus, though it may depend on exactly what you want analysts logging and what is being used to handle incident tracking.

One thing that is quite useful with the shift log is a summary post at the end of each shift, and then the analysts should verbally go over the summary at the shift change. This can help make sure the most significant entries are not missed and it gives the chance for the oncoming shift to ask questions before the outgoing shift leaves for the day.

As usual, I can't cover everything on the topic, but my goal is to provide a reference and get the gears turning. The need for good documentation exists and documentation is important to use to the IR team's advantage.

07 July, 2009

The "Does it make sense?" test

I was composing the next installment of my series on building an incident response team and started to include this, but then decided it deserves a separate entry.

Some time ago, my boss came up with what he calls the "Does it make sense?" test as a cheat-sheet to help train new analysts and to use as a quick reference. When we refer to traffic making sense, we are asking whether the traffic is normal for the network.

This is very simple and covers some of the quickest ways an analyst can investigate a possible incident. Consider it a way to triage possible NSM activity or incidents. Using something like this can easily eliminate a lot of unnecessary and time-consuming analysis, or point out when the extra analysis is needed.

The "does it make sense" test:

  1. Determine the direction of the network traffic.
  2. Determine the IP addresses involved.
  3. Determine the locations of the systems (e.g. internal, external, VPN, whois, GeoIP).
  4. Determine the functions of the systems involved (e.g. web server, mail server, workstation).
  5. Determine protocols involved and whether they are "normal" protocols and ports that should be seen between the systems.
  6. When applicable, look at the packet capture and compare it to the signature/rule.
  7. Use historical queries on NSM systems and searches of documentation to determine past events that may be related to the current one.
Based on the above knowledge, does the traffic that caused the alert make sense or is it abnormal? Simple examples:
  1. A file server sending huge amounts of SMTP traffic over port 25 probably does not make sense, whether the cause is malicious activity or a misconfiguration.
  2. Someone connecting to a workstation on port 21 with FTP probably does not make sense.
  3. A DNS server sending and receiving traffic to another DNS server over port 53 does make sense. However, an analysis of the alert and the DNS traffic may still be needed to verify whether the traffic is malicious or not.
Remember, traffic that makes sense and is normal on one network may not be normal on another network. Having a good baseline of your network traffic is extremely important before you can accurately determine what traffic makes sense and what traffic does not. Even traffic that does not make sense is not automatically malicious.

Also remember, traffic that makes sense is not always friendly. A good attacker will make his network traffic look like it fits in with the baseline traffic, making it less likely to stick out.

30 June, 2009

Exploiting the brain

In some interesting science news, talking into a person's right ear is apparently a good idea if you want the person to be receptive to what you're saying.

If you want to get someone to do something, ask them in their right ear, say scientists.

Italian researchers found people were better at processing information when requests were made on that side in three separate tests.

They believe this is because the left side of the brain, which is known to be better at processing requests, deals with information from the right ear.

The article also states that what is heard through the right ear gets sent to "a slightly more amenable part of the brain." Even when you know about something like this, it is probably difficult, or even impossible, to consciously override the difference between hearing something in the right ear versus the left ear.

Looking at this through a security mindset, a threat could be someone that knows how to exploit this behavior, and to reduce the vulnerability could require actively training yourself to overcome standard neuroscience. This type of knowledge can even be applied directly to something like a penetration test that includes social engineering in the rules of engagement.

Next time you ask for a raise, make sure you're to the right of your boss.

25 June, 2009

Building an IR Team: Organization

This is my second post in a planned series. The first is called Building an IR Team: People.

How to organize a Computer Incident Response Team (CIRT) is a difficult and complex topic. Although there may be best practices or sensible guidelines, a lot will be dictated by the size of your team, the type and size of the network environment, management, company policies, and the abilities of analysts. I also believe that network security monitoring (NSM) and incident response (IR) are so intertwined that you really should talk about them and organize them together.

A few questions that come to mind when thinking of organization and hierarchy of the team:

  • Will you only be doing IR, or will you be responsible for additional security operations and security engineering?
  • What is the minimal amount of staffing you need to cover your hours of operation? What other coverage requirements do you have dictated by management, policies, or plain common sense?
  • How will the size of your team affect your hierarchy and organization?
  • Since being understaffed is the norm, how can you organize to improve efficiency without hurting the quality of work?
  • Can you train individuals or groups so you have redundancy in key job functions?
  • Referencing both physical and logical organization of the team, will they be centralized or distributed?
  • What is your budget? (Richard Bejtlich has had a number of posts about how much to spend on digital security, including one recently).
IR and other Security Operations
The first question really needs to be answered before you start answering all the rest. There are two basic models I have seen when organizing a response team. The simpler model is to have a response team that only performs incident response, often along with NSM or working directly with the NSM team. Even if the response team does not do the actual first tier NSM, the NSM team usually will function as a lower tier that escalates possible incidents to the IR team.

The more complex, but possibly more common, model is to have incident responders and NSM teams that also perform a number of other duties. I mentioned both security operations and security engineering in the bullet point. Examples of security operations and engineering could be penetration testing, vulnerability assessment, malware analysis, NSM sensor deployment, NSM sensor tuning, firewall change reviews or management, and more. The reason I say this model may be more common is the bottom line, money. It is also difficult to discretely define all these job duties without any overlap.

There are advantages and disadvantages to each model. For dedicated incident responders, advantages compared to the alternative include:
  • Specialization can promote higher levels of expertise.
  • Duties, obligations, procedures and priorities are clearer.
  • Documentation can probably be simplified.
  • IR may be more effective overall.
Disadvantages can include:
  • Money. If incident responders perform a narrow set of duties, you will probably need more total personnel to complete the same tasks.
  • Less flexibility with personnel.
  • Limiting duties exclusively to incident response may result in more burn-out. Although not a given, many people like the variety that comes with a wider range of duties.
Advantages of having incident responders also perform other security operations and engineering:
  • Money.
  • A better understanding of incident response can produce better engineering. A great example is tuning NSM sensors, where an engineer that does the tuning has a much better understanding of feedback and even sees the good and bad firsthand if the same person is also doing NSM or IR.
  • Similarly, other projects can promote a better understanding of the network, systems and security operations that may promote more efficient and accurate IR.
Disadvantages:
  • Conflicting priorities between IR and other projects.
  • More complex operating procedures.
  • Burn-out due to workload. (Yes, I listed burn-out as a disadvantage of both models).
  • Less specialization in IR will probably reduce effectiveness.
Staffing
Before deciding on the number of analysts you need for NSM and IR, you have to come to a decision on what hours you will maintain. This question is probably easier for smaller operations that don't have as much flexibility. If there is no budget for anything other than normal business hours, it is definitely easier to staff IR and security operations in general. Once you get to an enterprise or other organization that maintains some 24x7 presence, it starts getting stickier.

If you will have more than one shift, you will obviously have to decide the hours for each shift. It is important to build a slight overlap into the shifts so information can be passed from the shift that is ending to the shift that is starting. Both verbal and written communication, namely some kind of shift log, are important so any ongoing incidents, trends or other significant activity are not dropped. I will get into more detail when I write a future post, tentatively titled Building an IR Team: Communication and Documentation.

Organizing so each shift has the right people is significant. Obviously, the third shift will generally be seen as less desirable. Usually someone that is willing to work the third shift is trying to get into the digital security field, already has a day job, or is going to school. It is a fine line between finding someone who will do a good job on the third shift and someone who will immediately start looking for another job with better hours, so you have to get a clear understanding of why people want to work the third shift and how long you expect them to stay on it. It can help to leave opportunities for third shift analysts to move to another shift, since that flexibility lets you keep the stand-outs rather than losing them to another job with more desirable hours.

I am not a big fan of rotating shifts. Though a lot of places seem to implement shifts by having everyone eventually rotate through each shift, I think it does not promote stability or employee satisfaction as much as each person having a dedicated shift.

Staffing can also be influenced by policy or outside factors. Businesses, government and military all will have certain information security requirements that must be met, and some of those requirements may influence your staffing levels or hours of operation.

Hierarchy
If you only have one or two analysts, you probably won't need to put much thought into your hierarchy. If you have a 24x7 operation with a number of analysts, you definitely need some sort of defined hierarchy and escalation procedures to define NSM and IR duties. Going back to the section on other security operations, you may also need to define how other duties fit into the hierarchy, procedures and priorities for analysts that handle NSM, IR, and/or additional duties.

At left is an example of an organizational chart for when the IR Team also has other duties and operates in a 24x7 environment. In addition to rotating through NSM and IR duties, each analyst is a member of a team. This is just an example to show the thought process on hierarchy. There are certainly other operational security needs that I mentioned that may merit a dedicated team but are not included in my example, for instance forensics or vulnerability assessment.

Each team has a senior analyst as the lead, and the senior analysts can also double as IR leads. It is crucial that every shift have a lead to define a hierarchy and prevent any misunderstandings about the chain of command and responsibilities.

For this example, let us say that your organizational requirements state two junior analysts per shift doing NSM and IR. You could create a schedule to rotate each junior analyst through the NSM/IR schedule, which means monitoring the security systems, answering the phone, responding to emails, investigating activity, and coordinating IR for the more basic incidents. You would also probably want one senior analyst designated as the lead for the day. The senior analyst can provide quality assurance, handle anything that needs to be escalated, do more in-depth IR, and task and coordinate the junior analysts. The senior analyst can also decide that the NSM and IR workloads require temporarily pulling people off their project or team tasks to bolster NSM or IR. Finally, it may be a good idea to have the senior analyst designated as the one coordinating and communicating with management.

While the senior analysts need to excel at both the technical duties and management, the shift leads need to facilitate communication between everyone on that particular shift, management, and other shifts. Though it is helpful if the shift lead is strong in a technical sense, I do not think the shift lead necessarily has to be the strongest technical person on the shift. He or she needs to be able to handle communication, escalation, delegation, and prioritization to keep both the shift members and management happy with each other. The shift lead is basically responsible for making sure the shift is happy and making sure the CIRT is getting what it needs from the shift.

The next diagram shows a group that is dedicated only to NSM and IR. Obviously, this model is much easier to organize and manage since the tasks are much narrower. Note that, even with this model where everyone is dedicated to NSM and IR without additional duties, proper NSM and IR may call for things like malware analysis, certainly forensics for IR, or giving feedback about the security systems' effectiveness to dedicated engineers.

As one last aside regarding the different models, I have to stress that vulnerability assessment and reporting is one of the biggest time sinks I have ever seen in a security operation. If you can only separate one task away from your NSM and IR team to another team, I strongly suggest it be vulnerability assessment. There are certainly a lot of arguments about how much or how little vulnerability assessment you should be doing in any organization, but most organizations do have requirements for it. As such, it is a good idea to have a separate vulnerability assessment team whenever possible because of the number of work-hours the process requires. Note that penetration testing is clearly distinct from vulnerability assessment, and requires a whole different type of person with a different set of skills.

Redundancy
Ideally, you want to minimize what some call "knowledge hoarding" on your team. If someone is excellent at a job, you need that person to share knowledge, not squirrel it away. Some think knowledge hoarding provides job security, but a good manager will recognize that an analyst that shares knowledge is much better than one that does not. From personal experience, I can also say that mentoring, training and sharing knowledge is a great way to reduce the number of calls you get during non-working hours. If I do not want to be bothered at home, I do my best to document and share everything I know so the knowledge is easily accessible even when I am not there.

Sharing knowledge provides redundancy and flexibility. That flexibility can also spread the workload more evenly when you have some people swamped with work and others underutilized. If someone is sick or too busy for a particular task, you do not want to be stuck with no redundancy. I suppose this is true of most jobs, but it can be a huge problem in IR. As an example, if a particular person is experienced at malware analysis and has automated the process without sharing the knowledge, someone else called on to do the work in a pinch will be much less efficient and may even try to manually perform tasks that have already been automated.

Certainly most groups of incident responders will have standouts that simply can't be replaced easily, but you should do your best to make sure every job function has redundancy and that every senior analyst has at least one understudy.

Distribution of Resources
If your business has multiple locations or is a true enterprise, one thing to consider is the physical and logical distribution of your incident response team. Being physically located in one place can be helpful to communication and working relationships. Being geographically distributed can be more conducive to work schedules if the business spans many time zones. One thing that can greatly increase morale is providing as many tools as possible for remote IR. Sending a team to the field for IR may sometimes be needed, but reducing that burden, or even allowing work from home, is a sure way to make your team happier.

Regardless, an IR team needs people in the field that can assist them when needed. Depending on the technical level of those field representatives, the duties may be as simple as unplugging a network cable or as advanced as starting initial data collection with a memory and disk capture. Most IR teams will need to have a good working relationship with support and networking personnel to help facilitate the proper response procedures.

I only touched on some of the possibilities for organizing both NSM and IR teams. As with anything, thought and planning will help make the organization more successful and efficient. The key is to reach a practical equilibrium given the resources you have to work with.

09 May, 2009

Extracting emails from archived Sguil transcripts

Here is a Perl script I wrote to extract emails and attachments from archived Sguil transcripts. It's useful for grabbing suspicious attachments for analysis.

In Sguil, whenever you view a transcript, the packet capture is archived on the Sguil server. You can then easily use that packet capture to pull out data with tools like tcpxtract or tcpflow, in this case tcpflow combined with Perl's MIME::Parser. The MIME::Parser code is modified from David Bianco's blog.
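For comparison, the same kind of MIME extraction that MIME::Parser performs can be sketched in Python with the standard library's email package. This is just an illustration of the idea, and the message below is an invented sample, not data from a real capture:

```python
import email
from email import policy

# A minimal SMTP DATA payload with one attachment (invented sample data).
raw = """\
Received: from mail.example.com
From: sender@example.com
To: victim@example.com
Subject: invoice
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="BOUND"

--BOUND
Content-Type: text/plain

Please see the attached file.
--BOUND
Content-Type: application/octet-stream; name="invoice.exe"
Content-Disposition: attachment; filename="invoice.exe"

MZfakebinarycontent
--BOUND--
"""

msg = email.message_from_string(raw, policy=policy.default)
for part in msg.walk():
    fname = part.get_filename()
    if fname:  # only parts that carry an attachment filename
        payload = part.get_payload(decode=True)
        print(fname, len(payload))
```

Feeding it the reassembled SMTP stream (what tcpflow writes out) instead of the sample string is the equivalent of what the Perl script below does with parse_data.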

As always with Perl or other scripts, I welcome constructive feedback. The first regular expression is fairly long and may scroll off the page, so make sure you get it all if you copy it.

#!/usr/bin/perl

# by nr
# 2009-05-04
# A perl script to read tcpflow output files of SMTP traffic.
# Written to run against a pcap archived by Sguil after viewing the transcript.
# 2009-05-07
# Updated to use David Bianco's code with MIME::Parser.
# http://blog.vorant.com/2006/06/extracting-email-attachements-from.html

use strict;
use warnings;

use MIME::Parser;

my $fileName; # var for tcpflow output file that we need to read
my $outputDir = "/var/tmp"; # directory for email+attachments output

if (@ARGV != 1) {
    print "\nOnly one argument allowed. Usage:\n";
    die "./emailDecode.pl /path/archive/192.168.1.13\:62313_192.168.1.8\:25-6.raw\n\n";
}

$ARGV[0] =~ m/.+\/(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(\d{1,5})_(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}):(25)-\d{1,3}\.raw/
    or die "\nIncorrect file name format or dst port is not equal to 25. Try again.\n\n";

system("tcpflow -r $ARGV[0]"); # run tcpflow w/argument for path to sguil pcap

my $srcPort = sprintf("%05d", $2); # pad srcPort with zeros
my $dstPort = sprintf("%05d", $4); # pad dstPort with zeros

# Put the octets and ports into an array to manipulate into the tcpflow fileName
my @octet = split(/\./, "$1\." . "$srcPort\." . "$3\." . "$dstPort");

foreach my $octet (@octet) {
    my $octetLength = length($octet);     # get string length
    if ($octetLength < 5) {               # if not a port number
        $octet = sprintf("%03d", $octet); # pad octet with zeros
    }
    $fileName .= "$octet\.";              # concatenate into fileName
}

$fileName =~ s/(.+\d{5})\.(.+\d{5})\./$1-$2/; # replace middle dot with hyphen
my $unusedFile = "$2-$1"; # this is the other tcpflow output file

# open the file and put it in array
open my $infile, '<', $fileName or die "Unable to open $fileName: $!\n";
my @email = <$infile>;
close $infile;

my $count = 0;
# count the extra lines (e.g. SMTP banner and commands) before the first header
foreach my $line (@email) {
    last if $line =~ m/^Received:/i;
    $count++;
}
splice(@email, 0, $count); # drop everything before the headers

my $parser = MIME::Parser->new;
$parser->output_under("$outputDir");
my $entity = $parser->parse_data(\@email); # parse the tcpflow data
$entity->dump_skeleton; # be verbose when dumping

unlink($fileName, $unusedFile); # delete tcpflow output files
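As a side note, the trickiest part of the script is reconstructing tcpflow's output file name, which zero-pads each IP octet to three digits and each port to five digits. Here is a rough Python sketch of that padding logic, just to show the naming convention; the function name is my own, not something from the script:

```python
def tcpflow_name(src_ip, src_port, dst_ip, dst_port):
    """Build the file name tcpflow uses for one direction of a flow:
    each IP octet zero-padded to 3 digits, each port to 5 digits."""
    def side(ip, port):
        octets = ".".join("%03d" % int(o) for o in ip.split("."))
        return "%s.%05d" % (octets, port)
    return side(src_ip, src_port) + "-" + side(dst_ip, dst_port)

# The client-to-server file for the Sguil archive named in the usage message:
print(tcpflow_name("192.168.1.13", 62313, "192.168.1.8", 25))
# -> 192.168.001.013.62313-192.168.001.008.00025
```

Swapping the two sides gives the reverse-direction file, which is the "unused" tcpflow output the script also deletes at the end.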