MattockFS; Computer-Forensics File-System: Part One
This post is the first of an eight-part series about the MattockFS Computer-Forensics File-System. The series is based on the MattockFS workshop that I gave at the Digital Forensics Research Workshop three months ago in Überlingen, Germany.
So, about MattockFS. Last year I finished my M.Sc. in Forensic Computing and Cybercrime Investigations at University College Dublin with a research project that, alongside my minor thesis, resulted in a proof-of-concept implementation of a computer-forensics file-system. I am still working on it, and it is now very close to becoming production-ready. Although it was designed primarily to accommodate important needs of computer-forensic frameworks, it could very well be useful as a storage system and local message bus in other fields where high data integrity or page-cache efficiency is an important concern. So what is MattockFS? MattockFS is an ongoing project implementing a secure data archive and local message bus for asynchronous (computer-forensic) frameworks. In this first of eight posts on MattockFS, we are going to look at asynchronous processing and the toolchain approach.
We will look at two base models of computing and at some of the issues we can run into when using them together. On one side there are the asynchronous models of computation, most notably the actor model. On the other side there is the toolchain approach, in which a unit of data is processed by a chain of tools appropriate for its content.
Let's start by looking at models of concurrency and what their strong and weak parts are.
There are two main approaches to concurrent processing. In shared state concurrency, multiple concurrent entities access the same mutable shared state. In message passing concurrency, state is passed between concurrent entities by means of messages. Shared state concurrency uses a namespace common to all concurrent entities and relies on locks and semaphores to coordinate access to shared mutable resources. Message passing concurrency, in contrast, gives each concurrent entity a private address space and uses queues to temporarily store the messages going between concurrent units of computation. Both models have their own issues. The main issues with shared state concurrency are robustness and liveness concerns (for example, deadlocks caused by inconsistent lock ordering). Message passing concurrency isn't without issues either: when concurrent entities process data at different speeds, problems can arise with queue sizes and latency.
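To make the contrast concrete, here is a minimal Python sketch (an illustration only, not MattockFS or OCFA code) that computes the same total both ways: once with threads mutating one shared counter under a lock, and once with threads keeping private counts and handing them over as messages on a queue. The function names are hypothetical.

```python
import queue
import threading


def shared_state_count(n_threads: int, increments: int) -> int:
    """Shared state: every thread mutates one counter, guarded by a lock."""
    counter = 0
    lock = threading.Lock()

    def work() -> None:
        nonlocal counter
        for _ in range(increments):
            with lock:  # without the lock, concurrent updates could be lost
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


def message_passing_count(n_threads: int, increments: int) -> int:
    """Message passing: each thread keeps private state and only shares it
    by putting a message on a queue."""
    mailbox: "queue.Queue[int]" = queue.Queue()

    def work() -> None:
        local = 0  # private to this thread, no lock needed
        for _ in range(increments):
            local += 1
        mailbox.put(local)  # state leaves the thread only as a message

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in range(n_threads))
```

Both functions return `n_threads * increments`; the difference lies in where coordination happens: in a lock around shared memory, or in a queue between private address spaces.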
Let's explore a few different message-passing-based asynchronous models of computation. The most fundamental one is probably the actor model, a simple model that defines actors as the universal fundamental primitives of concurrent digital computation.
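A minimal actor can be sketched in Python as an object with private state, a mailbox, and a single thread that handles one message at a time. The `CounterActor` class and its message kinds (`"add"`, `"get"`, `"stop"`) are hypothetical; the sketch only shows asynchronous sends and replies delivered as messages, and leaves out other parts of the actor model such as creating new actors or changing behavior.

```python
import queue
import threading
from typing import Any


class CounterActor:
    """Hypothetical actor: private state, a mailbox, and one thread that
    processes messages strictly one at a time."""

    def __init__(self) -> None:
        self._mailbox: "queue.Queue[tuple[str, Any]]" = queue.Queue()
        self._count = 0  # only ever touched by the actor's own thread
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, kind: str, payload: Any = None) -> None:
        # Asynchronous: the sender does not wait for the message to be handled.
        self._mailbox.put((kind, payload))

    def _run(self) -> None:
        while True:
            kind, payload = self._mailbox.get()
            if kind == "add":
                self._count += payload
            elif kind == "get":
                payload.put(self._count)  # reply by sending a message back
            elif kind == "stop":
                return

    def join(self) -> None:
        self._thread.join()
```

For example, sending `("add", i)` for every `i` in `range(10)` followed by `("get", reply)`, with a fresh `queue.Queue` as the reply channel, yields 45: the mailbox is FIFO and the actor handles one message at a time, so the query is answered only after all additions.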
Other ways to look at asynchronous processing with message passing are models that use workers, and models that use producers and consumers, as their base concepts.
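The worker and producer/consumer ideas, together with the queue-size concern mentioned above, can be sketched with a bounded queue shared by a small pool of worker threads. The `run_pipeline` function and its parameters are hypothetical, and squaring numbers stands in for real processing. Because the queue is bounded, a producer that outpaces its consumers blocks instead of letting the queue grow without limit.

```python
import queue
import threading
from typing import Iterable, List, Tuple

SENTINEL = None  # tells a worker that no more work will arrive


def run_pipeline(items: Iterable[int], n_workers: int = 3,
                 max_queue: int = 4) -> Tuple[List[int], int]:
    """Hypothetical producer/consumer pipeline. Returns the sorted results
    and the largest queue size the producer observed."""
    # put() blocks while the queue is full, so the producer is throttled
    # to the workers' pace (backpressure).
    work: "queue.Queue" = queue.Queue(maxsize=max_queue)
    results: "queue.Queue" = queue.Queue()
    max_seen = 0

    def worker() -> None:
        while True:
            item = work.get()
            if item is SENTINEL:
                return
            results.put(item * item)  # stand-in for real processing

    workers = [threading.Thread(target=worker) for _ in range(n_workers)]
    for w in workers:
        w.start()
    for item in items:  # the producer
        work.put(item)
        max_seen = max(max_seen, work.qsize())
    for _ in workers:
        work.put(SENTINEL)  # one stop signal per worker
    for w in workers:
        w.join()
    out: List[int] = []
    while not results.empty():
        out.append(results.get())
    return sorted(out), max_seen
```

The workers compete for items on a single queue, so no item is processed twice; removing `maxsize` would let the queue, and with it the latency of the last item, grow as far as the producer's lead allows.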
Now let us look at how asynchronous message passing concurrency fits into computer forensics. The Open Computer Forensics Architecture (OCFA) was a computer forensic framework built by the Dutch National Police Force starting in 2001; it was made open source in 2006 and became an orphaned project in 2012. The framework was designed as a distributed, asynchronous, message-passing concurrency framework for processing large (by last decade's definition of large) digital forensic investigations spanning hundreds of disk images containing potential evidence. It was designed to make use of existing tools and libraries, and it had robustness features such as fault isolation and recoverable failure. The framework can be viewed at three different layers of abstraction. One of these is a metadata-routed, star-shaped network: a router looks at the new metadata extracted by the previous tool in the toolchain and, based on that, dynamically determines the next hop in the toolchain for that specific piece of data.
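The metadata-routed star network can be sketched as a router function that is consulted after every tool and picks the next hop from the metadata gathered so far. The routing rules and the tool names (`file`, `exif`, `indexer`) below are hypothetical illustrations, not OCFA's actual modules or rule syntax.

```python
from typing import Callable, Dict, List, Optional, Set

Metadata = Dict[str, str]
# A tool inspects the data plus the metadata so far and returns new metadata.
Tool = Callable[[bytes, Metadata], Metadata]


def route(meta: Metadata, done: Set[str]) -> Optional[str]:
    """Hypothetical routing rules: choose the next hop from the metadata."""
    if "file" not in done:
        return "file"  # always identify the content type first
    mimetype = meta.get("mimetype", "")
    if mimetype == "image/jpeg" and "exif" not in done:
        return "exif"
    if mimetype.startswith("text/") and "indexer" not in done:
        return "indexer"
    return None  # no rule matches: this piece of data has finished its chain


def file_tool(data: bytes, meta: Metadata) -> Metadata:
    # Toy content identification based on the JPEG magic bytes.
    return {"mimetype": "image/jpeg" if data.startswith(b"\xff\xd8") else "text/plain"}


def exif_tool(data: bytes, meta: Metadata) -> Metadata:
    return {"exif": "parsed"}  # placeholder for real EXIF extraction


def indexer_tool(data: bytes, meta: Metadata) -> Metadata:
    return {"words": str(len(data.split()))}  # placeholder for indexing


TOOLS: Dict[str, Tool] = {"file": file_tool, "exif": exif_tool, "indexer": indexer_tool}


def process(data: bytes, tools: Dict[str, Tool]) -> List[str]:
    """Run one piece of data through a dynamically routed toolchain and
    return the sequence of tools it visited."""
    meta: Metadata = {}
    done: Set[str] = set()
    chain: List[str] = []
    # Star topology: after every tool, control returns to the router.
    hop = route(meta, done)
    while hop is not None:
        meta.update(tools[hop](data, meta))
        done.add(hop)
        chain.append(hop)
        hop = route(meta, done)
    return chain
```

With these toy rules, data starting with the JPEG magic bytes `FF D8` is routed through `file` and then `exif`, while anything else is treated as text and routed through `file` and then `indexer`. The chain is not fixed in advance; it emerges from the metadata each tool adds.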
A second way to look at OCFA is at the layer of the Anycast message bus. The OCFA Anycast was basically a worker-oriented message bus. It is easiest to think of each tool as a multi-process actor from the routing perspective, while each individual process of that tool is a worker from the Anycast message bus perspective.
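The Anycast idea, where messages are addressed to a tool rather than to one specific process of that tool, can be sketched with one queue per tool that all of that tool's workers pull from. The `AnycastBus` class is a hypothetical, thread-based, in-memory illustration; OCFA's Anycast worked with separate tool processes and on-disk queues.

```python
import queue
import threading
from typing import Callable, Dict, List, Tuple


class AnycastBus:
    """Hypothetical anycast, worker-oriented bus: one queue per tool, shared
    by all workers of that tool, so each message reaches exactly one worker."""

    def __init__(self) -> None:
        self._queues: Dict[str, "queue.Queue"] = {}
        self._workers: Dict[str, List[threading.Thread]] = {}
        self._lock = threading.Lock()
        self.log: List[Tuple[str, str, object]] = []  # (tool, worker, message)

    def register(self, tool: str, n_workers: int,
                 handler: Callable[[object], None]) -> None:
        q: "queue.Queue" = queue.Queue()
        self._queues[tool] = q
        self._workers[tool] = []
        for i in range(n_workers):
            worker = f"{tool}-{i}"
            t = threading.Thread(target=self._serve, args=(tool, worker, q, handler))
            t.start()
            self._workers[tool].append(t)

    def _serve(self, tool: str, worker: str, q: "queue.Queue",
               handler: Callable[[object], None]) -> None:
        while True:
            msg = q.get()
            if msg is None:  # shutdown signal, so None is not a valid message
                return
            handler(msg)
            with self._lock:
                self.log.append((tool, worker, msg))

    def send(self, tool: str, msg: object) -> None:
        # Addressed to the tool as a whole, never to a specific worker.
        self._queues[tool].put(msg)

    def shutdown(self) -> None:
        for tool, threads in self._workers.items():
            for _ in threads:
                self._queues[tool].put(None)  # one stop signal per worker
        for threads in self._workers.values():
            for t in threads:
                t.join()
```

Which worker handles a given message is arbitrary; the log only guarantees that every message sent to a tool is handled once, by one of that tool's workers.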
We consider the OCFA Anycast message bus to be one of four important ancestors, in spirit, of the MattockFS computer-forensics file-system. It was a message bus based on the worker concept that used on-disk persistent priority queues to provide what boils down to unbounded, persistent queues that survive crashes and reboots.
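An on-disk persistent priority queue of this kind can be sketched on top of SQLite from Python's standard library. This is a hypothetical illustration of the concept, not how OCFA implemented its queues: every `put` and `get` is committed to disk, so queued messages survive a crash or reboot, and, like the queues described above, nothing bounds its size.

```python
import sqlite3
from typing import Optional, Tuple


class PersistentPriorityQueue:
    """Hypothetical on-disk priority queue: highest priority first, FIFO
    within a priority, and every operation committed before returning."""

    def __init__(self, path: str) -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS q ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " priority INTEGER NOT NULL,"
            " payload TEXT NOT NULL)"
        )
        self._db.commit()

    def put(self, priority: int, payload: str) -> None:
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT INTO q (priority, payload) VALUES (?, ?)", (priority, payload)
            )

    def get(self) -> Optional[Tuple[int, str]]:
        with self._db:  # the SELECT and DELETE form one transaction
            row = self._db.execute(
                "SELECT id, priority, payload FROM q"
                " ORDER BY priority DESC, id ASC LIMIT 1"
            ).fetchone()
            if row is None:
                return None
            self._db.execute("DELETE FROM q WHERE id = ?", (row[0],))
        return row[1], row[2]

    def __len__(self) -> int:
        return self._db.execute("SELECT COUNT(*) FROM q").fetchone()[0]

    def close(self) -> None:
        self._db.close()
```

Closing the queue and opening it again on the same file, as a restarted process would, yields the remaining messages in the same order.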
Part of my research project focused on identifying and locating performance issues in OCFA, and some of these issues could be traced back to a conflict between the way message passing concurrency was implemented in the Anycast and the toolchain level of abstraction. Basically, the design with unbounded queue sizes turned out to conflict with actor model theory, which identifies latency along critical paths as critical to performance. The problem was aggravated by the fact that high latencies resulted in inevitable page-cache misses between tools that needed to access the same data consecutively: by the time the next tool in the chain got to a piece of data, that data had often already been evicted from the page cache.
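The link between queue latency and page-cache misses can be illustrated with a toy simulation. It assumes a plain LRU cache counted in whole items (the real kernel page cache works on pages and uses a more elaborate replacement policy), a tool A that reads one new item per step, and a tool B that re-reads the same item after it has waited behind `queue_depth` other items. All names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Toy stand-in for the page cache: holds `capacity` items and evicts
    the least recently used one when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items: "OrderedDict[int, None]" = OrderedDict()

    def access(self, key: int) -> bool:
        """Touch `key`; return True on a cache hit."""
        if key in self._items:
            self._items.move_to_end(key)  # now the most recently used
            return True
        self._items[key] = None
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used
        return False


def second_tool_hit_rate(n_items: int, queue_depth: int, cache_size: int) -> float:
    """Tool A reads item t at step t; tool B reads the same item
    `queue_depth` steps later. Returns the fraction of B's reads that hit."""
    cache = LRUCache(cache_size)
    hits = 0
    for step in range(n_items + queue_depth):
        if step < n_items:
            cache.access(step)  # tool A reads a fresh item
        j = step - queue_depth
        if 0 <= j < n_items:
            hits += cache.access(j)  # tool B re-reads an older item
    return hits / n_items
```

With a cache of 64 items, a queue depth of 8 gives tool B a hit rate of 1.0, whereas a queue depth of 100 drops it to 0.0. In this model, once the queue between two tools holds at least as many items as the cache, every one of B's reads misses, which is why unbounded queues and page-cache efficiency pull in opposite directions.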
OCFA did a whole lot of things right when it came to large-scale forensic data processing. When it came to page-cache efficiency, however, the OCFA design was such that efficiency degraded the longer OCFA ran.
In upcoming posts I will discuss how MattockFS is designed to help mitigate these issues in future computer-forensic frameworks, and possibly in other asynchronous frameworks that could benefit from being built on top of MattockFS.
In my next post, however, I shall discuss the need for integrity, privilege separation and capabilities in a multi-process, message-passing-concurrency-based computer forensic framework, and will visit two other ancestors of MattockFS that help make it an amazing tool for ensuring data integrity in future computer forensic frameworks.