Designing Data Intensive Applications Chapter 2 Summary

uioporqwerty (25) in #books • 5 years ago

Chapter 2: Data Models and Query Languages

This chapter covers how data is represented at the application layer and at the storage layer. It gives an overview of current data store options, in particular relational and non-relational databases. In essence, it describes why there is a shift toward NoSQL databases such as document databases and graph databases, and it weighs the pros and cons of non-relational and relational stores.

NoSQL

The book attributes the rise of NoSQL databases to the following:

The need for greater scalability.
A preference for free and open source software over commercial products.
Specialized query operations that are not well supported by relational databases.
Frustration with the restrictiveness of relational schemas.

But NoSQL stores, and document databases in particular, have both upsides and downsides.

Pros:

One-to-many relationships are very easy to represent.
There is less of an impedance mismatch between your application layer models and your data models, which often results in simpler application-level data modeling.
Fields can be added on the fly, so the data model adapts more easily than a relational one; changing the schema of a relational database is usually more work than changing the shape of the documents in a document database.
Joins are rarely needed because related data is stored together, which makes it easier to scale the data out.
Better data locality. Most of the data you need is in a single document rather than spread out over various tables, so you get some performance benefit from everything being stored in one place. However, this can be tricky because most document databases limit the size of a document.
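To make the one-to-many and locality points concrete, here is a minimal sketch in plain Python. The profile data, field names, and helper functions are all made up for illustration; no real database is involved, and the lists only stand in for tables.

```python
import json

# Hypothetical profile stored as one self-contained document:
# the one-to-many data (positions) is nested inside its parent.
profile_doc = {
    "user_id": 251,
    "name": "Ada",
    "positions": [
        {"title": "Engineer", "org": "Acme"},
        {"title": "Analyst", "org": "Initech"},
    ],
}

# The same data "shredded" into relational-style rows across two tables.
users = [{"user_id": 251, "name": "Ada"}]
positions = [
    {"user_id": 251, "title": "Engineer", "org": "Acme"},
    {"user_id": 251, "title": "Analyst", "org": "Initech"},
]


def load_from_document(doc_json: str) -> dict:
    """One read returns the whole tree."""
    return json.loads(doc_json)


def load_from_tables(user_id: int) -> dict:
    """Reassemble the tree from separate tables (a join done by hand)."""
    user = next(u for u in users if u["user_id"] == user_id)
    user_positions = [
        {"title": p["title"], "org": p["org"]}
        for p in positions
        if p["user_id"] == user_id
    ]
    return {**user, "positions": user_positions}


stored = json.dumps(profile_doc)
# Both paths produce the same profile; the document path needs no reassembly.
assert load_from_document(stored) == load_from_tables(251)
```

The document version maps almost directly onto the object the app works with, which is the reduced impedance mismatch mentioned above.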

Cons:

The schema is only implicit. Your app assumes a certain structure, but the database does nothing to enforce it, unlike a relational database. The book describes document databases as schema-on-read and relational databases as schema-on-write. It compares this to type checking in programming languages: schema-on-write resembles static (compile-time) type checking, and schema-on-read resembles dynamic (runtime) type checking.
Support for joins is often very weak. In most cases the app has to make additional requests and perform the join in application code. This makes the application layer more complicated and is often slower than performing the same join on the database side.
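Schema-on-read is easier to see in code. This is a rough sketch with made-up documents and a hypothetical first_name helper: two versions of an app have written differently shaped documents, and the reading code is what decides how to interpret them.

```python
# Hypothetical documents written by two versions of the same app.
old_doc = {"name": "Ada Lovelace"}
new_doc = {"first_name": "Grace", "name": "Grace Hopper"}


def first_name(doc: dict) -> str:
    """Schema-on-read: the structure is interpreted when the document is read."""
    if "first_name" in doc:
        return doc["first_name"]
    # Older documents lack the field, so derive it from the full name.
    return doc["name"].split(" ")[0]


print(first_name(old_doc))
print(first_name(new_doc))
```

With schema-on-write, the equivalent change would be a migration that adds the column and backfills it before any reader depends on it.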

Both NoSQL and SQL databases can represent many-to-one and many-to-many relationships: document databases use document references, and relational databases use foreign keys.
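Here is a small sketch of the difference, assuming an in-memory SQLite database for the relational side and plain dictionaries standing in for a document store. The table names, the org_ref field, and the data are all hypothetical.

```python
import sqlite3

# Relational side: positions point at organizations through a foreign key,
# and the database performs the join.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orgs (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE positions (
        title TEXT,
        org_id INTEGER REFERENCES orgs(id)
    );
    INSERT INTO orgs VALUES (1, 'Acme'), (2, 'Initech');
    INSERT INTO positions VALUES ('Engineer', 1), ('Analyst', 2);
""")
sql_rows = conn.execute(
    "SELECT p.title, o.name FROM positions p "
    "JOIN orgs o ON o.id = p.org_id ORDER BY p.title"
).fetchall()

# Document side: each position holds a reference (org_ref) to another
# document, and the application resolves it with an extra lookup.
org_docs = {1: {"name": "Acme"}, 2: {"name": "Initech"}}
position_docs = [
    {"title": "Engineer", "org_ref": 1},
    {"title": "Analyst", "org_ref": 2},
]
app_rows = sorted(
    (p["title"], org_docs[p["org_ref"]]["name"]) for p in position_docs
)

# Same result either way; the difference is who does the join.
assert sql_rows == app_rows
```

In a real document store each org_docs lookup would be another request to the database, which is where the extra application code and latency come from.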

Which Should I Choose?

The book makes the following points:

If the data in your application has a document-like structure [...], then it's probably a good idea to use a document model. The relational technique of shredding - splitting a document-like structure into multiple tables - can lead to cumbersome schemas and unnecessarily complicated application code.

If your application does use many-to-many relationships, the document model becomes less appealing.

This is because document stores generally have poor support for joins.

Declarative vs. Imperative Languages

The book goes into detail about declarative vs. imperative languages as they relate to these data stores. In general, it explains that SQL is declarative: you specify what result you want rather than how to compute it, which leaves the database free to optimize the query under the hood. The book draws a comparison with the web. CSS is declarative, so it is easy to update styles without explicitly walking the structure of the HTML document. With imperative code such as JavaScript, you would have to be very explicit about which elements you target, and the code wouldn't react to changes in the document after it has run.
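As a rough illustration of the difference, the sketch below answers the same question both ways: once with a hand-written loop and once with a SQL query against an in-memory SQLite database. The animals table and its rows are made up for the example.

```python
import sqlite3

# Hypothetical data: (name, family) pairs.
animals = [
    ("great white", "Sharks"),
    ("lion", "Felidae"),
    ("hammerhead", "Sharks"),
]


def sharks_imperative(rows):
    """Imperative: spell out how to walk the data, step by step."""
    result = []
    for name, family in rows:
        if family == "Sharks":
            result.append(name)
    return result


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (name TEXT, family TEXT)")
conn.executemany("INSERT INTO animals VALUES (?, ?)", animals)


def sharks_declarative(db):
    """Declarative: state the condition and let the database plan the work."""
    rows = db.execute(
        "SELECT name FROM animals WHERE family = 'Sharks'"
    ).fetchall()
    return [name for (name,) in rows]


# The query has no ORDER BY, so compare the results without relying on order.
assert sorted(sharks_imperative(animals)) == sorted(sharks_declarative(conn))
```

If an index on family were added later, the SQL version could benefit without any change to the query, while the loop would have to be rewritten to take advantage of it.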

Other Data Stores

We are also presented with graph-like data models. Graph databases are very useful if your application has mostly many-to-many relationships. The most common examples the book presents are social graphs, web graphs, and road or rail networks. You can have homogeneous graphs, or heterogeneous graphs where the vertices represent different types of data; for example, Facebook could represent people, locations, and so on in a single graph. The graph model basically consists of vertices (nodes) and edges. You can represent graphs in relational databases, but the queries are often lengthy and awkward, so it is usually not a good idea to do it this way.
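A heterogeneous graph can be sketched with nothing more than a dictionary of vertices and a list of labeled edges. Everything below (the vertex names, the born_in and within labels, and the two helper functions) is invented for illustration and is not tied to any particular graph database.

```python
from collections import deque

# Vertices of different types live in the same graph.
vertices = {
    "lucy": {"type": "person"},
    "alain": {"type": "person"},
    "idaho": {"type": "location"},
    "usa": {"type": "location"},
    "france": {"type": "location"},
}

# Each edge is (tail vertex, label, head vertex).
edges = [
    ("lucy", "born_in", "idaho"),
    ("idaho", "within", "usa"),
    ("alain", "born_in", "france"),
]


def reachable(start, labels):
    """Breadth-first search along edges whose label is in `labels`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for tail, label, head in edges:
            if tail == node and label in labels and head not in seen:
                seen.add(head)
                queue.append(head)
    return seen - {start}


def born_within(region):
    """People whose birthplace lies in `region`, at any nesting depth.

    Simplification: this follows born_in and within edges in any order.
    """
    return sorted(
        name
        for name, props in vertices.items()
        if props["type"] == "person"
        and region in reachable(name, {"born_in", "within"})
    )


print(born_within("usa"))
```

The traversal follows a variable number of within edges, which is the kind of query that turns lengthy when the graph is stored in relational tables.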

The book goes into more detail about specific query languages for graph databases, but I'll leave that out as it is somewhat specific to the implementation you choose for querying graph databases.

#system-design
