Using the STEEM blockchain for forensic data integrity

in #dfrws, 6 years ago (edited)

Getting ready to visit the DFRWS-EU next week in Italy. Last year I did a well-received workshop on MattockFS at this conference, and although I didn't have much time to work on MattockFS since, I did run a rather intriguing experiment that I hope to present in the lightning talk slot. An experiment combining the MattockFS forensic file-system with the STEEM blockchain.

The blockchain for forensic integrity

MattockFS uses privilege separation and a capability-based API, and these could easily be augmented with Linux Mandatory Access Controls (AppArmor) to further strengthen the integrity properties of the user-space file-system implementation of this forensic data archive and message bus system. There is one integrity aspect that it doesn't account for, though: a bad-apple administrator with full access to the underlying file-system. MattockFS protects the integrity of its provenance log mainly through the simple mechanism of running the file-system as a different user than the modules. The same goes for the conceptual 'write once' property of the archive. Archive data added through MattockFS is write once, and the underlying data files aren't directly accessible to the user id the modules run under. They are, however, accessible to the mattockfs user and to the system administrator, who has full access to the system MattockFS runs on and, depending on the storage configuration, possibly to other systems as well. So MattockFS on its own cannot guard against a bad-apple administrator who sets out to corrupt the forensic archive and the accompanying logs.

In blockchains, storage capacity, bandwidth, and often transaction cost tend to be prohibitive for full-fledged provenance logging. The tamper-proof timestamping that a blockchain provides, however, is a property that could be really useful for strengthening the integrity guards of forensic data processing frameworks and processes. In this post, I want to explore this through an experiment I've done with MattockFS and the STEEM blockchain.

MattockFS Opportunistic Hashing logs

[Image: Screenshot from 2018-03-16 09-48-41.png]

MattockFS implements opportunistic hashing. Hashing progresses whenever new data is written to MattockFS in order, whenever a designatable entity within the archive is read in order from start to end, or whenever out-of-order access still happens in a way that lets the opportunistic hashing algorithm advance incrementally in order. The resulting hashes are logged to disk in a simple log file that is protected from the module processes. The log file contains pairs: a carvpath annotation of a storage entity or sub-entity within the archive, and the BLAKE2 hash of the data at that carvpath at the moment of hashing. If an attacker with admin rights were to compromise the integrity of the archive, that same attacker could, in theory, go through all the logs, recalculate the relevant opportunistic hashes, and alter the log files as well in order to reestablish internal consistency.
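To make the idea concrete, here is a minimal sketch of an opportunistic hasher. It is not the MattockFS implementation: the class name, the 32-byte digest size, and the choice to simply ignore chunks that arrive ahead of the current position are all assumptions made for illustration.

```python
import hashlib


class OpportunisticHash:
    """Hash an entity incrementally for as long as data shows up in order.

    Hypothetical helper, not the MattockFS API.
    """

    def __init__(self, size):
        self.size = size    # total size of the entity in bytes
        self.offset = 0     # next byte the hasher is waiting for
        self._hasher = hashlib.blake2b(digest_size=32)  # digest size is an assumption

    def observe(self, offset, data):
        """Feed a chunk that was read or written at the given offset."""
        data = data[: max(0, self.size - offset)]  # clip to the entity
        end = offset + len(data)
        if offset > self.offset or end <= self.offset:
            return  # a gap ahead of us, or data that was already hashed
        # Skip the part of the chunk that overlaps what was hashed before.
        self._hasher.update(data[self.offset - offset:])
        self.offset = end

    @property
    def done(self):
        return self.offset >= self.size

    def hexdigest(self):
        if not self.done:
            raise ValueError("entity has not been hashed to the end yet")
        return self._hasher.hexdigest()
```

Once `done` is true, the digest can be logged next to the carvpath of the entity. A real implementation would also want to catch up after a gap instead of dropping the chunk.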

15 Minute aggregation Merkle tree

As stated before, the opportunistic hashing log file grows far too fast to store every log line in a blockchain. We solve this problem by adding an aggregation step to the processing of the hash stream that is being logged. We aggregate the log within windows of 15 minutes. If an opportunistic hash and carvpath pair is logged and the time since the last flush (or system start) exceeds 15 minutes, all the lines since the previous flush (or system start) are used to create a simple JSON Merkle tree representation of those 15+ minutes of log lines. In our implementation, we use the BLAKE2 keyed hashing facility to create our Merkle tree.
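The flush rule can be sketched roughly as follows. The names are hypothetical, and whether the line that triggers the flush belongs to the old or the new window is an assumption on my part; here it is included in the batch that gets flushed.

```python
import time

WINDOW_SECONDS = 15 * 60


class WindowAggregator:
    """Collect (carvpath, hash) log lines and release them in 15+ minute batches."""

    def __init__(self, on_flush, clock=time.time):
        self._on_flush = on_flush    # called with the list of batched lines
        self._clock = clock          # injectable, which makes testing easy
        self._last_flush = clock()   # system start counts as a flush
        self._pending = []

    def log(self, carvpath, digest):
        self._pending.append((carvpath, digest))
        now = self._clock()
        if now - self._last_flush > WINDOW_SECONDS:
            batch, self._pending = self._pending, []
            self._last_flush = now
            self._on_flush(batch)
```

Because a flush is only triggered by a new log line, a quiet archive never flushes on its own in this sketch.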

[Image: Screenshot from 2018-03-16 09-45-34.png]

Notice that the root of the Merkle tree is a single hash that will be different if any of the hashes in the aggregation changes. This means that if the attacker changes the data in part of the archive, the Merkle tree root and the intermediate node hashes on the path to it would need to change as well.
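A small sketch of such a tree, built with BLAKE2 keyed hashing from Python's standard library, shows this property. The key, the JSON layout, and the way an odd node is handled are assumptions for illustration and not necessarily what MattockFS writes to its log.

```python
import hashlib
import json

KEY = b"mattockfs-demo-key"  # hypothetical key, for illustration only


def _node_hash(data):
    # BLAKE2 keyed hashing; the 32-byte digest size is an assumption.
    return hashlib.blake2b(data, key=KEY, digest_size=32).hexdigest()


def merkle_tree(leaves):
    """Build a nested, JSON-serializable tree from a list of hex leaf hashes."""
    if not leaves:
        raise ValueError("need at least one leaf hash")
    nodes = [{"hash": leaf} for leaf in leaves]
    while len(nodes) > 1:
        parents = []
        for i in range(0, len(nodes), 2):
            children = nodes[i:i + 2]  # an odd node at the end is hashed alone
            joined = "".join(child["hash"] for child in children)
            parents.append({"hash": _node_hash(joined.encode("ascii")),
                            "children": children})
        nodes = parents
    return nodes[0]


if __name__ == "__main__":
    demo = [hashlib.blake2b(bytes([i]), digest_size=32).hexdigest() for i in range(4)]
    print(json.dumps(merkle_tree(demo), indent=2))
```

Replacing any single leaf changes the hash of every node between that leaf and the root, so comparing root hashes is enough to detect a change anywhere in the window.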

Merkle tree root as transaction note

So now we are down to just four hash values per hour, fewer than one hundred hashes per day, which could suffice for guarding the integrity of the forensic data archive. To anchor these hashes, we look at the STEEM blockchain, the blockchain used by the steemit blogging platform, which rewards authors and curators of blog posts with a chunk of a reward pool. There are multiple reasons to use this blockchain: it is fast, transaction costs are low or nonexistent, and, as with some other cryptocurrencies, we can add a memo to transactions. We should try to be good citizens of the platform, and so as not to abuse the facilities this amazing platform offers, we choose not to target transactions at our own account. Instead, we target our transactions at the @null account, so that our 0.001 STEEM is burned by the platform. At four transactions per hour, this means our usage of STEEM will cost us 0.096 STEEM per day, the equivalent of about 0.16 euro per day.
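The daily figure follows directly from the window size and the burned amount, as this small calculation shows. The variable names are mine; the euro amount depends on the exchange rate at the time of writing and is left out.

```python
from decimal import Decimal

WINDOW_MINUTES = 15
BURN_PER_TX = Decimal("0.001")  # STEEM sent to @null per transaction

tx_per_hour = 60 // WINDOW_MINUTES        # at most one flush per window
tx_per_day = tx_per_hour * 24
steem_per_day = tx_per_day * BURN_PER_TX

if __name__ == "__main__":
    print(tx_per_hour, tx_per_day, steem_per_day)
```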

[Image: Screenshot from 2018-03-16 09-56-45.png]

As the memo for our transaction, we use a little mattockfs prefix, followed by a case identifier, followed by our Merkle tree root hash. The transaction ends up in the STEEM blockchain, and as the blockchain is basically a ledger, our Merkle tree root gets timestamped. By the nature of blockchain technology, this timestamped Merkle tree root quickly becomes part of the cryptographically secured blockchain in a tamper-proof way.
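As a rough illustration, the memo and the transfer could be put together like this. The space separator, the function names, and the example account are assumptions; the operation is only built as plain data in the general shape of a STEEM transfer operation, and signing and broadcasting it with a STEEM library are deliberately left out.

```python
import string


def build_memo(case_id, root_hash, prefix="mattockfs"):
    """Prefix, case identifier and Merkle root; the separator is an assumption."""
    if not root_hash or any(c not in string.hexdigits for c in root_hash):
        raise ValueError("root hash must be a hex string")
    return " ".join([prefix, case_id, root_hash.lower()])


def build_transfer(sender, memo):
    """A transfer of 0.001 STEEM to @null, as plain data (not broadcast)."""
    return ["transfer", {
        "from": sender,
        "to": "null",
        "amount": "0.001 STEEM",
        "memo": memo,
    }]


if __name__ == "__main__":
    memo = build_memo("case-0001", "ab" * 32)     # made-up case id and hash
    print(build_transfer("example-account", memo))  # hypothetical account name
```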

Proof of concept code

The proof of concept code has been made part of the MattockFS and mfmf repositories. Note that the part of the code that posts to STEEM and records the blockchain info isn't close to production stable, mostly due to some library issues.

  • The Merkle tree creation code has been integrated into MattockFS itself. MattockFS writes Merkle tree JSON to a log file.
  • A little timer tick process has been added to the MattockFS repo.
  • A Merkle tree MattockFS process has been added to the MattockFS repo. This process basically does a MattockFS kickstart for each Merkle tree JSON and forwards it to a STEEM posting mfmf module.
  • A STEEM posting mfmf module has been added to the mfmf repository.

Note that the final item on this list, the STEEM posting mfmf module, is currently a bit flaky. It works as a proof of concept, but because of issues with the libraries used and the temporary need for two distinct libraries, and in fact two distinct versions of Python, there are currently some rather serious stability issues keeping the code base from being production stable.

Caveat regarding STEEM and cost of operation

While the cost of operation might be low, available blockchain bandwidth fluctuates. Depending on an account's stake in the platform (expressed in STEEM Power), the bandwidth allocated at certain times might be insufficient for even four transactions per hour. It is thus essential when using the STEEM blockchain that the account used has a sufficiently large stake. So while the operation cost might be low, there is a certain speculative risk involved in buying the stake in the platform that is needed to guarantee sufficient blockchain bandwidth.

Conclusions

While this was just a simple experiment with MattockFS, one that I do aim to follow up on, I believe it clearly highlights the strength of the approach: if we apply Merkle trees for aggregation and scalability, blockchain technology can readily be applied to safeguarding forensic integrity. STEEM seems to be particularly up to the task. It is a blazingly fast blockchain with a very affordable cost of operation for our scenario.

How can I track content prior to edition of a post?
Searched for a relevant place to ask about it.
steemd.com and steemblockexplorer.com seem to have failed me at it.
2018-10-6

It's a bit fiddly.

  1. Get the "created" and "last_updated" fields from the post.
  2. Do a binary search to find the blocks matching these timestamps.
  3. Fetch every block in the range between the two identified blocks and look for updates with matching permalink and author.
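The steps above can be sketched as follows. This is only an outline: `get_block` is a hypothetical callable that returns a block as a dict with a "timestamp" and a list of "transactions", each holding "operations" as [name, payload] pairs, and the timestamps must be comparable with the post's fields. How the blocks are actually fetched is left open.

```python
def first_block_at_or_after(target, head, block_time):
    """Binary search: smallest block number in [1, head] with timestamp >= target."""
    lo, hi = 1, head
    while lo < hi:
        mid = (lo + hi) // 2
        if block_time(mid) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_versions(author, permlink, created, last_updated, head, get_block):
    """Return (block number, body) for every matching post update in the range."""

    def block_time(num):
        return get_block(num)["timestamp"]

    first = first_block_at_or_after(created, head, block_time)
    last = first_block_at_or_after(last_updated, head, block_time)
    versions = []
    for num in range(first, last + 1):
        for tx in get_block(num)["transactions"]:
            for name, op in tx["operations"]:
                if (name == "comment" and op["author"] == author
                        and op["permlink"] == permlink):
                    versions.append((num, op["body"]))
    return versions
```

The binary search needs only about log2(head) block fetches per timestamp; the expensive part is the linear scan in step 3, which grows with the time between creation and last edit.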

Hope this helps.

Does it mean that it got cryptic and obscure enough that it is no longer accessible through the blockchain explorers (the platform itself never had the ability to find versions of the content prior to the last)?

  1. Do a binary search to find the locks matching these timestamps.

You meant blocks instead of locks?
It took me more than one read in order to guess.

Yes, blocks, not locks, sorry. As far as I know, the only way to get at all the older versions is this fiddly way. But then I only use the condenser API; maybe one of the other APIs has ways to do it more directly.