Programming useful data structures: Binary Decision Diagrams

in #tauchain5 years ago (edited)

The Binary Decision Diagram is the state of the art data structure which Ohad Asor chose for TML. I briefly mentioned BDDs in the past but now that TML is more fleshed out on Github I will discuss it again because it's a very useful data structure for certain purposes which include model checking. A reminder, TML and Tauchain specifically must made extensive use of model checking and BDD is specifically a data structure useful for implementing a state of the art symbolic model checker.

In this blog post I will take a look at some of the code behind BDDs.

What is model checking

Model checking is a method of doing what computer scientists call "formal verification". The process of formal verification can be implemented as a "computed aided verification" process. In Cybersecurity we learn that one of the mechanisms for software security is the paradigm of "correctness". The correct by construction methodology is a means to achieving this guaranteed correctness.

Model checking can be more precisely defined:

  • A model is a description of a system (hardware/software). We must distinguish that if it is an existing system we can call it a model but if it is a new system then we call it a "specification". Models can be formal (described in mathematically precise languages) or informal expressed in ambiguous

Why is model checking so important

Correctness assures that the behavior of an algorithm matches the behavior described in it's specification. For Tau we need to know that software will terminate, and that the algorithms will behave as expected. Unexpected behaviors can result in security vulnerabilities and if the software doesn't terminate then we have the halting problem. To avoid the halting problem, TML will be relying on decidable logic in certain instances.

Preliminary Summary

  • Model checking is the mechanism behind formal verification. To understand verification we simply must ask if the given system satisfies the given specification.
  • If the answer to the above question is yes then we can go beyond verification to synthesis which is to construct based on the given specification.
  • Tauchain must implement both formal verification via model checking and go beyond this to eventually implement synthesis which is construction. We need to know that the software will halt, so we must rely on decidability. We desire to have correctness so that we can construct industrial grade software (or hardware) via Tauchain.

What is a BDD?

To understand what a BDD is we first have to think of Boolean functions. Why? Because BDDs are a method of symbolically representing and manipulating Boolean functions. BDDs represent Boolean functions by way of Directed Acyclic Graph (DAG). DAGs are probably familiar to most blockchan developers so I will not go into it too much other than to say that the DAG in this instance is used to create a sort of compressed decision tree. A decision tree is a sort of information map, so these are essentially data structures.

To understand Boolean functions just remember the truth table. Remember that we want to get a result of either true or false for every question we ask in Tau. We want to be ably to apply the deductive reasoning with the power of partial fixed point logic.

Python EDA allows us to experiment with the BDD data structure. As we can see, the Boolean Functions section presents T and F as 1 and 0. For more see: https://pyeda.readthedocs.io/en/latest/boolalg.html#boolean-functions

We can also see Constructing BDDs where it says first you must create your Boolean expression. A Boolean expression is described here: https://pyeda.readthedocs.io/en/latest/expr.html

A Boolean expression is a logical expression which must evaluate to either TRUE or FALSE. So think of YES or NO, ON or OFF, A OR B.

Looking at the code example we see:

f = expr("a & b | a & c | b & c")
This would mean A and B or A and C or B and C.
The Boolean expression is mapped onto the variable f.

f = expr2bdd(f)

This sends the Boolean expression contained in variable f to the BDD via expression2bdd.
In other words it's a converter of Boolean expressions to the BDD data structure.

Note this is a Python example not a Tauchain example. Tau being written initially in C++ makes it not quite as simply to walk through as there is a bit more code but let's have a look at what exists on Github:

In Ohad's code we can see from the opening you can see what he is working with:

#include <set>
#include <map>
#include <unordered_set>
#include <unordered_map>
#include <vector>
#include <cstdint>
#include <string>
#include <cstring>
#include <iostream>
#include <random>
#include <sstream>
#include <climits>
#include <stdexcept>
#include <cassert>
using namespace std;

He's not using a BDD library written by someone else but is using his own custom implementation of BDD. We can immediately see he is relying on tuples. Tuples are finite ordered lists of elements. He is also applying vectors, unordered sets, and unordered maps. Because his code is manually implemented BDD, there are some benefits and some challenges. The benefits are that there can be much less code which means at least in terms of audits it should be easier for more people to review his code. On the other hand this also means there is room for bugs which means because it's manually implemented it's going to be harder to write the code (and take longer to debug the code too). On the bright side he's coding in what looks to me to be a conservative manner, using standard libraries, and I can't see any obvious bad habits, but let me be clear I'm not the most experienced C++ coder in the world.

For the most part his code is commented. For example:

typedef array<size_t, 3> node; //bdd node is a triple: varid, 1-node-id, 0-node-id

Here it seems he is describing the array for the data type node. Typedef in C++ describes an alias for a datatype. Node is a new data type to allow for BDD. If we look at size_t, it's an unsigned integer to represent the size of the object. In this case it's 3, and in his comment he explains that the BDD node is a triple which means it's made up of varid, 1-node-id, and 0-node-id.

Now if we remember truth tables are made up of 1s and 0s, and that the Python example discussing Boolean functions mentions 1s and 0s, we can have some idea of what Ohad's code is attempting here. To understand we can think about this in terms of Prolog where we can first recall the relational structure example from Ohad's Github.

Example of Relational structure:
uncle(jim, joe)
uncle(joe, jill)
begins_with_j(joe)
begins_with_j(jim)
begins_with_j(jill)

The above shows you what TML works like. For those familiar with Prolog or Datalog we have logical facts and rules. The important concepts to understand Datalog are:

  • Predicates/relations.
  • Terms, constants, variables.
  • Goal
  • Clause, fact, rule
  • Substitution
  • Unification

So for example rules and facts are clauses. But these are facts:

uncle(jim, joe)
uncle(joe, jill)

This establishes the fact that jim and joe are uncles. It establishes the fact that joe and jill are uncles.

A more relevant example could be:

steem-account-owners(dana_edwards, personz).

The above simply states that myself and perzons are owners of steem accounts. It's logically no different from the previous example but made more relevant for illustration purposes.

Datalog and Tau are languages of logic. This means a program in Tau is called a "logic program."
In a logic program you supply the facts. You supply the rules.

Rules for example:

fam(X,Y) :- uncle(X,Y).
fam(X,Z) :- uncle(X,Y), fam(Y,Z).

The :- symbol means "implies" or "implication". The X, Y, Z are variables which act as placeholders.

Another example:

mortal(X) :- human(X).

This would say to be human is to be mortal. All humans are mortal becomes the rule. We could say all uncles are family is also the rule.

Now if we combine rules and facts?

uncle(jim, joe)
uncle(joe, jill)

What we get is the basis for a logic program. We can see the example logic programs from Ohad here.

Now we can get back to the C++...

There is more code:

typedef vector<bool> bools;
typedef vector<bools> vbools;
class bdds;
typedef tuple<const bdds*, size_t, size_t> memo;
typedef tuple<const bdds*, const bool*, size_t, size_t> exmemo;

Without going into too much detail, we know what typedef is from the previous examples. He's applying vector to establish bools and vbools. This part isn't so clear to me but if I had to guess it's bools and vector bools. In this case he is probably going with vector logic or boolean vectors to create the equivalent of truth tables. We must also remember that a relation is an unordered set of tuples.

The class bdds is established. In C++ a class is a user defined or programmer defined data type which holds it's own data members and functions. Again I wont go too much into this as anyone who knows C++, or Java, or object oriented programming, is probably well aware of this.

Root, leafs, permutations...

A vertex in a BDD represents a Boolean function. These are designated as root vertices.
OBDDs are called ordered binary decision diagrams. These are where the roots and leaves come in. It has to do with variable ordering. We can also see Ohad has implemented his interpretation of the apply algorithm which has Apply_OR Apply_AND.

Leafs are nodes, and usually there are at most only two (in the case of reduced order BDDs) represented as 1 or 0.

https://github.com/IDNI/tau/blob/master/tml.cpp

In summary without going too deep into Ohad's code, I will say we can clearly see he has implemented BDD in C++. He has his own custom version of BDD so we will have to discuss with him exactly how it is intended to work, but some of the code is commented so anyone can read it. The details on exactly how his implementation works is not for me to try to describe other than to say from how it appears it is a functional implementation.

I encourage those who have more programming experience than me to examine the Tau Github source code. I also encouraging testing to reveal how well it works (or how buggy it is). https://github.com/IDNI/tau

References

Chaki, S., & Gurfinkel, A. (2018). BDD-based symbolic model checking. In Handbook of Model Checking (pp. 219-245). Springer, Cham.

Coin Marketplace

STEEM 0.29
TRX 0.12
JST 0.033
BTC 62934.09
ETH 3118.65
USDT 1.00
SBD 3.85