Tuesday, April 2, 2013

Things You Need to Build Your Own STAP with mongoTap as an Example

by John Haldeman, Security Practice Lead

Late last year, Joe DiPietro and his colleagues at IBM wrote two articles providing a detailed explanation of Guardium's Universal Feed (links to those articles appear later in this post). The Universal Feed allows you to build your own STAPs for data sources that Guardium does not support. I recently built and open-sourced a STAP for mongoDB. I want to use this post to talk about the things you will need in order to build a custom STAP, beyond the knowledge of the protocol explained in the Universal Feed articles. I will use the mongoTap as the example for the discussion.



There are really three things you will need to build your own STAP:
  1. A source of audit data to log
  2. Something to receive this audit data and translate it into Guardium's Universal Feed protocol
  3. A Guardium Collector to record the data 

 

The Audit Data Source

One of the most difficult things to find is the first item: the source of the audit data to log. In fact, finding a good source for the audit trail is probably the biggest obstacle to building a custom STAP. Traditional Guardium STAPs tap into the client/server communication channels to find this data using operating system APIs. The mongoTap uses a mongoDB utility called mongosniff to get its audit trail information. mongosniff uses some of the same operating system APIs that traditional STAPs use. Calling mongosniff is the job of the mongoTap client, which provides a stream of data to the mongoTap server.

 

Translation

The second item is the core of the custom STAP. It takes the audit data you receive from the audit data source you identified and translates it into the Universal Feed protocol that Guardium understands. Details of the Universal Feed protocol can be found in the Universal Feed articles mentioned earlier, which can be found here and here. They outline everything you need to format data in a way that Guardium can understand.

All of this translation takes a non-trivial amount of processing power to perform. There are connections to be handled, sessions to be tracked, users to correlate (in mongoDB's case), and data to be parsed and sent to the collector. Because of this, the choice was made to split out this processing into a separate component called the mongoTap server that could reside somewhere other than where mongoDB was installed. Similarly, Guardium's traditional STAPs leave their parsing and translation to processes that reside on the collectors.
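To give a feel for the parsing side of that work, here is a minimal Ruby sketch of pulling connection details out of a single sniffer header line with a regular expression. The line shape used here (client address, an arrow, database address, then a namespace) is an assumption for illustration and has not been checked against real mongosniff output. The names `HEADER` and `parse_header` are hypothetical and are not taken from the mongoTap source.

```ruby
# Assumed header shape (illustrative only):
#   <client_ip>:<client_port>  -->> <db_ip>:<db_port> <namespace> ...
HEADER = /\A(?<client_ip>\d{1,3}(?:\.\d{1,3}){3}):(?<client_port>\d+)\s+-->>\s+
          (?<db_ip>\d{1,3}(?:\.\d{1,3}){3}):(?<db_port>\d+)\s+(?<namespace>\S+)/x

# Returns a hash of connection details, or nil if the line is not a header.
def parse_header(line)
  m = HEADER.match(line)
  return nil unless m

  {
    client_ip:   m[:client_ip],
    client_port: m[:client_port].to_i,
    db_ip:       m[:db_ip],
    db_port:     m[:db_port].to_i,
    namespace:   m[:namespace]
  }
end

sample = "127.0.0.1:58022  -->> 127.0.0.1:27017 test.users  61 bytes"
p parse_header(sample)
```

Lines that do not match simply return nil, so a caller can treat them as continuation data belonging to the most recent header.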

The mongoTap server uses regular expressions, the ruby-protobuf Ruby library, and the bindata Ruby library to help with the translation. One thing that isn't explicitly mentioned in the Universal Feed articles is that the byte order should be big endian for your wrapper messages; that took some debugging to figure out. As for protocol buffers, if you want to build a type 1 STAP and use something other than Java, you'll likely want to start out with a .proto file. Unfortunately, it isn't provided to you directly in the sample code. You can, however, derive the file from the compiled Java protocol buffer code. It's a little painstaking to do, but you'll have a head start, as this was already done for the mongoTap. A partial reconstruction of Guardium's datasource proto file can be found here (enough was reconstructed to implement everything currently supported by the mongoTap).
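The big endian point is easy to get wrong, so here is a small sketch of what it means in practice, using only Ruby's built-in `pack` and `unpack1` rather than bindata. The wrapper layout shown (a 4-byte length prefix followed by the payload) is a hypothetical stand-in; the real wrapper fields are the ones described in the Universal Feed articles. The helper names `frame_message` and `unframe_message` are made up for this example.

```ruby
# Hypothetical wrapper: 4-byte big endian length prefix, then the payload.
# Only the byte-order handling is the point here, not the field layout.

# Prefix an already-serialized payload with its length in big endian.
def frame_message(payload)
  bytes = payload.b                    # work on raw bytes
  [bytes.bytesize].pack("N") + bytes   # "N" = 32-bit unsigned, big endian
end

# Read the length prefix back and return just the payload bytes.
def unframe_message(frame)
  length = frame.byteslice(0, 4).unpack1("N")
  frame.byteslice(4, length)
end

frame = frame_message("hello")
p frame.bytes.first(4)     # big endian length 5:    [0, 0, 0, 5]
p [5].pack("V").bytes      # little endian, for contrast: [5, 0, 0, 0]
```

If the receiving side reads a little endian length where a big endian one was expected, 5 turns into 83886080, which is the kind of symptom worth looking for when a collector silently ignores your messages.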

The mongoTap server uses another Ruby library called EventMachine to handle its communication with the mongoTap client. EventMachine has its critics (and fans), but I found it to be invaluable in handling the stream of data coming from the mongoTap client and mongosniff. There was a lot less code to write because of it and no threads to maintain myself.

 

Collection

The final item needed is the Guardium collector. It handles all the logging. One particularly nice feature is that the collector does all of the session tracking for you if you provide a session locator consisting of a client IP, client port, database IP, and database port. I found we still needed some memory and tracking of the sessions for the mongoTap, but not having to acquire and maintain session IDs coming back to you from the collector is a big help and a smart design on the part of the protocol.
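As a rough illustration of the kind of local session memory mentioned above, the sketch below keys per-session state on the same four values that make up the session locator. This is not the mongoTap's actual implementation; `SessionLocator`, `SessionTracker`, and the fields kept per session are all hypothetical.

```ruby
# The four-part locator. Struct instances compare and hash by value,
# so two locators with the same fields refer to the same session.
SessionLocator = Struct.new(:client_ip, :client_port, :db_ip, :db_port)

class SessionTracker
  def initialize
    @sessions = {}
  end

  # Return the state for a locator, creating it the first time it is seen.
  def fetch(locator)
    @sessions[locator] ||= { user: nil, statements: 0 }
  end

  # Forget a session once its connection has closed.
  def close(locator)
    @sessions.delete(locator)
  end

  def size
    @sessions.size
  end
end

tracker = SessionTracker.new
loc = SessionLocator.new("10.0.0.5", 51234, "10.0.0.9", 27017)
tracker.fetch(loc)[:user] = "alice"

# A fresh locator with the same four values finds the same session.
same = SessionLocator.new("10.0.0.5", 51234, "10.0.0.9", 27017)
tracker.fetch(same)[:statements] += 1
p tracker.fetch(loc)
```

Because the locator itself is the key, nothing has to come back from the collector before the next message for that session can be sent.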


That's about it for the items you need to build your own STAP. The big lesson learned from doing this is that the Universal Feed saves you a lot of time, but building your own STAP still requires a lot of effort. You also need a source of audit data. Those can be difficult to come by, and you cannot always rely on it being as easy as it is for mongoDB, which comes with the handy mongosniff utility.
