MattockFS; Computer-Forensics File-System : Part Seven

pibara (60)in #forensics • 7 years ago (edited)

This post is the seventh of an eight-part series on the MattockFS Computer-Forensics File-System. The series is based on the MattockFS workshop that I gave at the Digital Forensics Research Workshop in Überlingen, Germany earlier this year.

If you missed any of the previous installments, they are available here:

In the previous installment of this series, we did a hands-on walkthrough using a virtual machine with MattockFS. Today we continue where we left off, so if you haven't already done so, you may want to go through the hands-on part on installing MattockFS and injecting the E01 file into the MattockFS archive and message bus.

Last time we discussed the file-system itself as a form of API. That API is quite inconvenient to use from a programming language, so language bindings are essential for actually accessing the system from code. So far I have written language bindings for Python, and I have made a start on C++ language bindings.

Please note that neither the file-system as an API nor the language bindings are meant to be used directly by actual framework modules, as MattockFS by itself is not a computer forensics framework. More on that in the last installment of this series, where we shall discuss how MattockFS is to be used as a component for building such a framework. We shall also briefly touch on the Mediocre Forensic Module Framework, a relatively naive example framework that people may use as a basis for creating a full-fledged module framework that runs on top of MattockFS.

So let us continue with the Python part of the MattockFS hands-on. We will get to writing actual scripts later on. First, we shall be working with Python interactively. The Python API provides two pieces of functionality. It provides language bindings for the MattockFS file-system-as-an-API interfaces, and it also implements functionality for working with CarvPath designations.

Now, change directory to the MattockFS source tree and start up the Python interpreter. When the Python interpreter is running, import MountPoint from mattock.api and instantiate a MountPoint for the primary archive mount-point.

Remember the carvpath entities from last time? The Python API provides a convenient operator-overloaded interface for those. Run the above commands and then compare them to the bash magic we needed to do the same from the shell last time.

In order to work with carvpath designations on a lower abstraction level, something that will often be needed, we need to import Fragment and possibly also Sparse from mattock.carvpath. Run the above commands and see what happens.

If needed we can query the individual fragments and sparse entities in a designated entity.
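To make the idea of fragments and sparse entities concrete, here is a minimal stand-alone sketch. It does not use the mattock.carvpath classes; the `Fragment` and `Sparse` dataclasses, `parse_designation` and `total_size` below are hypothetical stand-ins of my own. The sketch assumes the textual notation where a fragment is written as `offset+size`, a sparse region as `S` followed by its size, and parts are joined with `_`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Stand-in (not the mattock class): a contiguous run of parent bytes."""
    offset: int
    size: int


@dataclass(frozen=True)
class Sparse:
    """Stand-in (not the mattock class): a run of zero bytes without backing data."""
    size: int


def parse_designation(text):
    """Split a single-level designation into Fragment and Sparse parts.

    Assumes the 'offset+size' / 'S<size>' notation joined by underscores.
    """
    parts = []
    for token in text.split("_"):
        if token.startswith("S"):
            parts.append(Sparse(int(token[1:])))
        else:
            offset, size = token.split("+")
            parts.append(Fragment(int(offset), int(size)))
    return parts


def total_size(parts):
    """Size in bytes of the entity that the parts describe together."""
    return sum(part.size for part in parts)


parts = parse_designation("0+4096_S8192_16384+512")
for part in parts:
    print(part)
print(total_size(parts))
```

The real API hides this string handling behind its own objects; the sketch only shows what kind of information a designation carries.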

It is possible that a raw carvpath designation becomes too long to be handled by a user space file-system, due to size limitations on directory and file names or on maximum path lengths. In such cases, a hash is calculated over the carvpath token and that hash is used instead of the full carvpath. The carvpath itself is stored in a (potentially distributed) Redis key/value store, under an entry keyed by the digest that designates it.
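The digest fallback can be sketched roughly as follows. This is not the MattockFS implementation: the length limit, the choice of BLAKE2 from `hashlib`, the `D` prefix and the plain dictionary standing in for Redis are all assumptions made for illustration.

```python
import hashlib

MAX_TOKEN_LENGTH = 160  # hypothetical limit; the real threshold may differ

# Plain dict used as a stand-in for the Redis key/value store.
digest_store = {}


def shorten(carvpath):
    """Return the carvpath itself, or a digest token if it is too long."""
    if len(carvpath) <= MAX_TOKEN_LENGTH:
        return carvpath
    digest = hashlib.blake2b(carvpath.encode("ascii"), digest_size=32).hexdigest()
    token = "D" + digest  # the "D" prefix is an assumption of this sketch
    digest_store[token] = carvpath  # remember the full carvpath under its digest
    return token


def expand(token):
    """Resolve a digest token back to the full carvpath, if we know it."""
    return digest_store.get(token, token)


long_path = "_".join(f"{i * 1024}+512" for i in range(100))
token = shorten(long_path)
print(len(long_path), len(token))
print(expand(token) == long_path)
```

Because the store is keyed by the digest, any process that can reach the store can turn a short token back into the full designation.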

Remember how last time we kickstarted a disk image and then faked being a data storage module? Also remember how we played around with mmls and other sleuthkit tools from outside of a worker context? Well, the actual data of the E01 we kickstarted is still waiting for us to start up an mmls module. So now it's time for us to fake being such a module.

The above shows us how a module working directly on our low-level API could register as an mmls module. We will come back to this later, but first we need to discuss child jobs.

The above is an example of a piece of code that derives children from a parent entity using carvpath fragments only. You can run the above code from your open Python interpreter, but try to understand what is going on with every line you type.
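As a rough illustration of deriving children from fragments only, the sketch below turns a partition table into one `offset+size` designation per partition, relative to the parent image. The `partition_fragments` helper, the 512-byte sector size and the sample partition table are hypothetical and not part of the MattockFS API or the dfrwsdemo module.

```python
SECTOR_SIZE = 512  # assumed sector size for this sketch


def partition_fragments(partitions, sector_size=SECTOR_SIZE):
    """Turn (start_sector, sector_count) pairs into 'offset+size' strings.

    Each string designates one partition as a single fragment of the parent.
    """
    return [
        f"{start * sector_size}+{count * sector_size}"
        for start, count in partitions
    ]


# Made-up partition table, shaped like what an mmls parser might return.
partitions = [(2048, 204800), (206848, 1843200)]
for designation in partition_fragments(partitions):
    print(designation)
```

No data is copied at any point; each child is just a description of where its bytes live inside the parent.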

Notice that we created some trivial mmls output parsing in the mmls function imported from the dfrwsdemo Python module. For now, just assume this functionality exists. We are trying to walk through the MattockFS Python API, so the mmls output parsing is a bit out of scope here.

In our previous example, all partitions from mmls were assumed to contain walkable file-systems and were all routed indiscriminately to the fswalk module. In many circumstances, however, some smart routing will be needed to determine where the child evidence should go. In the above example, only JPG files are routed to another module over the MattockFS message bus. Other child entities are simply discarded and are never even designated as children within MattockFS.
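The routing decision itself can be as simple as the following sketch. It only models the choice of a next module by file extension; the `jpeg_handler` module name and the `next_module` and `route` helpers are hypothetical, and nothing here talks to the MattockFS message bus.

```python
import os

JPEG_EXTENSIONS = {".jpg", ".jpeg"}


def next_module(filename):
    """Return the name of the module a child should go to, or None to discard it."""
    extension = os.path.splitext(filename)[1].lower()
    if extension in JPEG_EXTENSIONS:
        return "jpeg_handler"  # hypothetical module name
    return None


def route(filenames):
    """Keep only the files that have a destination, paired with that destination."""
    routed = []
    for name in filenames:
        module = next_module(name)
        if module is not None:
            routed.append((name, module))
    return routed


print(route(["holiday.JPG", "notes.txt", "scan.jpeg", "README"]))
```

A real module would likely decide on content type rather than file name, but the shape of the decision stays the same: either a destination module or no child at all.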

A normal module will only ever derive child entities from a parent entity it was sent to process. All entities are created or derived within the context of a job that was sent to the worker by some other actor. There is, however, one exception to this where a job needs to be created out of thin air and that is the place where data is entered into the archive by a digital forensic investigator. The register method of our MountPoint takes an optional second argument that allows us to set the job selection policy. The special "K" value tells the API that this module is a kickstart type module that expects to create its parent jobs out of thin air. This may seem a bit silly, but this API design allows for a simple and coherent API between all modules.

So far we have only worked with CarvPath designations. A normal kickstart module, however, or a module that for example extracts data from a zip file, or a module that extracts some structured metadata from a file, will need to add some new data to the archive within the context of its current job.

The code above should look a bit familiar, at least in its flow. We allocate storage and get back a path to a mutable data file. We can open the allocated data chunk as a file, but we need to open it as w+. The file can then be overwritten up to its allocated size. Once we are done with the file, we freeze it into the archive and get back a new carvpath designation to the frozen data chunk that we may use like any other carvpath designation to immutable data.
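The allocate, write, freeze life cycle can be modelled in a few lines of plain Python. This is an in-memory toy, not the MattockFS API: the `MutableChunk` class, its methods and the way the designation is built from an offset and size are all assumptions for illustration.

```python
class MutableChunk:
    """Toy model of an allocated data chunk that can be frozen once."""

    def __init__(self, offset, size):
        self.offset = offset          # where the chunk lives in the (imaginary) archive
        self.size = size              # allocated size; writes may not exceed it
        self.data = bytearray(size)   # zero-filled until written
        self.frozen = False

    def write(self, position, payload):
        """Overwrite part of the chunk, staying within the allocated size."""
        if self.frozen:
            raise PermissionError("chunk is frozen and therefore immutable")
        if position < 0 or position + len(payload) > self.size:
            raise ValueError("write would exceed the allocated size")
        self.data[position:position + len(payload)] = payload

    def freeze(self):
        """Make the chunk immutable and return an 'offset+size' designation."""
        self.frozen = True
        return f"{self.offset}+{self.size}"


chunk = MutableChunk(offset=4096, size=16)
chunk.write(0, b"hello")
designation = chunk.freeze()
print(designation)
```

After `freeze`, any further `write` raises an error, which mirrors the idea that frozen archive data is never changed again.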

Here is a simple example of a module that adds new data to the archive. Have a look at the code and try to understand everything that is going on.

Here is an example of a naive low-level API mmls module.

And one that implements the trivial JPEG only file-system tree walking module.

Today we discussed the Python language bindings for MattockFS. If you prefer another programming language, please consider implementing your own language bindings and sharing your code on GitHub.

In the last installment of this series, I will try to outline how MattockFS and the language bindings described today fit into a broader concept of a distributed scalable computer forensic framework, that hopefully others (maybe you) will feel inclined to (help) build.

#tutorial #technology #security #dfrws