Subchains and Multi-Chain Matrices for Massive Blockchain Data Propagation



Last week I was again thinking about the Steemit plan of taking over the world with a blockchain. There is a problem I keep coming back to ever since I made a post about this "evil plan" in relation to the UBI idea Dan proposed.

The Problem

There is a problem with the size of a single database being updated across a network of distributed nodes. 1) Imagine the whole world running on one single blockchain; how unmanageable would that be? A million terabytes, or more? That is not a feasible future.

Also, 2) text alone adds up to a large database to deal with over time, but once we start adding images and video data hosted in a distributed manner, the space requirements grow far faster.

Lastly, 3) think of the old-data "dust-bin" factor of information. We see it in how Steemit is structured for payouts: old content disappears from visibility and is left for Google searches to find. This applies to data in general, so it's no surprise Steemit works the same way. Old data is hardly used; it doesn't need to sit in a blockchain, being updated and propagated all the time.

Idea

It should be obvious that we need many blockchains (not just one), plus subchains, linked in a network, and networks of networks, where certain nodes host certain types of content or data. This way, whatever is popular would be hosted by more nodes, since it requires more frequent transactional upkeep to fulfill users' requests for that type of data. In addition, each type of blockchain would have an area border network of nodes acting as a link between different blockchain areas, much as OSPF uses area border routers in networking to link areas and make the sharing of data more manageable.
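To make the OSPF analogy concrete, here is a minimal Python sketch of that border-node idea. Every name here (ChainArea, BorderRouter, the content types) is hypothetical, for illustration only:

```python
# Minimal sketch of OSPF-style routing between blockchain "areas".
# All names are hypothetical illustrations, not any real blockchain API.

class ChainArea:
    def __init__(self, name, content_types):
        self.name = name
        self.content_types = set(content_types)

class BorderRouter:
    """Analogous to an OSPF area border router: it knows which
    chain area serves which content type and forwards requests."""
    def __init__(self):
        self.areas = []

    def register(self, area):
        self.areas.append(area)

    def route(self, content_type):
        for area in self.areas:
            if content_type in area.content_types:
                return area.name
        raise LookupError(f"no chain area serves {content_type!r}")

router = BorderRouter()
router.register(ChainArea("text-chain", {"blog", "comment"}))
router.register(ChainArea("image-chain", {"photo", "meme"}))
router.register(ChainArea("video-chain", {"video"}))

print(router.route("photo"))  # -> image-chain
```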

For example, banking or money transactions would need a certain transaction frequency and number of nodes for uptime and reliability. This might be the blockchain with the most nodes on the planet, with subchains for certain sectors.

Another sector of data for a blockchain would be images. Even different categories of images could have their own blockchain, because not all content is valued equally. Funny pictures might be more popular than photography of news events, or art, or vice versa.

Video, given its size, could be hosted on its own nodes, with the quantity of nodes, and their quality and reliability, based on that blockchain's own sustainability. Again, there can be different blockchains for different types of videos too.

For Steemit, an early application of these subchains would be the most popular categories that consume the most data storage. For example, when/if images are hosted on the blockchain, they will greatly add to the load. That should be done on a separate blockchain from which data is fetched when required. The nodes for the text-based blockchain would remain manageable, while the image blockchain's demands would grow first and could be managed separately.

This can apply not only to specific media types, like text, images, video and audio, but also to specific content types like blogs, photography, art, sports, news, etc. These would also be managed by popularity, with a certain number of nodes hosting each for reliability and quality. This way, if some category is less popular, fewer nodes are required to host it, and more nodes can be used for content types with heavier traffic or transaction loads.
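As a rough sketch of that popularity-based allocation, the toy Python below splits a pool of nodes across content categories in proportion to their traffic share, with a floor so unpopular categories stay hosted. The categories, traffic figures and MIN_NODES value are all made up for illustration:

```python
# Sketch: allocate hosting nodes to content categories by traffic share.
# The categories, traffic figures, and MIN_NODES floor are illustrative.

MIN_NODES = 3  # floor so unpopular categories stay available

def allocate_nodes(total_nodes, traffic_by_category):
    total_traffic = sum(traffic_by_category.values())
    allocation = {}
    for category, traffic in traffic_by_category.items():
        share = traffic / total_traffic
        allocation[category] = max(MIN_NODES, round(total_nodes * share))
    return allocation

traffic = {"blogs": 5000, "photography": 2000, "sports": 800, "news": 2200}
print(allocate_nodes(100, traffic))
# -> {'blogs': 50, 'photography': 20, 'sports': 8, 'news': 22}
```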

This way, people who host nodes can choose to host the data they want, data they take pride in maintaining with good uptime and reliable delivery. If I were hosting blockchains and was the upkeeper ensuring data integrity, I would rather do it for something I cared about than for something I didn't care about or detested. This greater freedom would make blockchain node hosts care even more about their hardware and upkeep.

So I left this alone last week and went back to my site, other projects, and posts, until yesterday when @dantheman made his post, and @l0k1 made his this morning. I wanted to chime in with what I had come up with in support of these new posts. L0k1 has added an insight to my original idea, one that he also arrived at on his own. I will give credit where it is duly deserved :) The idea of subchains, though used in another manner for another reason, was introduced in steemspeak weeks earlier. I came up with this use and reason through my own thinking.

@dantheman and @l0k1's input

A better way of managing the data can be developed. I have come up with some ideas, and yesterday Dan made a post about a future announcement Steemit will make on increasing transaction capacity on the Steem blockchain:

"This week our team has come up with a design and roadmap for scaling Steem to the speeds required to handle as many users and transactions as your all can throw at us. We are in the process of documenting our design and producing a roadmap for its development and deployment."

And this morning, l0k1 made a post on distributing the latency of the blockchain, which shares my idea and adds an additional aspect I had not thought of regarding popularity:

"Trusted Caching Nodes (that you are running or frequently using) keep up to date but they inform other nodes whuch nodes of the database they are interested in most frequently, as related to the queries they get. ... Synchronising data to nodes clients are not requesting of makes no sense, this synchronisation should prioritise traffic to fit utilisation."

l0k1 is right on target; it's pretty much what I was thinking, except he used more technical terminology. Please go read what he wrote as well. I will develop and add to l0k1's point.

Multi-Chain Matrix Blockchain Future

We are talking about a lot of data in the world. Many different types of blockchains, with subchains and more subchains, will be required to properly fragment and distribute the data in a manageable way. Interconnecting the most frequent and popular data is important.

The technology of blockchains can be further developed to incorporate what I decided to call multi-chain matrices. Instead of hosting a node for one large chain, you could host fragments of the subchains that are most popular and have the most frequent transactions, improving the reliability of those popular blockchains. This is similar to what @l0k1 is saying about "cached nodes".

As he says, it's demand-based. This is what I had in mind when I proposed separating blockchains by content type and media type, with the quantity of nodes for each based on popularity or frequency of use. Node hosts can choose what data they want to host and maintain reliably.
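A minimal sketch of that demand-based selection, with all structures hypothetical: a node counts the requests it sees per subchain fragment and hosts the hottest fragments up to its capacity.

```python
# Sketch: a node chooses which subchain fragments to host based on
# observed request frequency (the demand-based selection described above).
from collections import Counter

class CachingNode:
    def __init__(self, capacity):
        self.capacity = capacity          # how many fragments this node hosts
        self.request_counts = Counter()   # fragment id -> requests seen

    def observe_request(self, fragment_id):
        self.request_counts[fragment_id] += 1

    def fragments_to_host(self):
        # Host the most frequently requested fragments, up to capacity.
        return [frag for frag, _ in self.request_counts.most_common(self.capacity)]

node = CachingNode(capacity=2)
for frag in ["images/memes", "text/blogs", "images/memes",
             "video/news", "images/memes", "text/blogs"]:
    node.observe_request(frag)
print(node.fragments_to_host())  # -> ['images/memes', 'text/blogs']
```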

We can host the nodes which have the most demand and value to certain parts of the Steemit community, or eventually to the whole internet community at large. And we can also choose to host the nodes we value most, even if others don't. We can act as guardians, upholders and upkeepers of knowledge. I see this as the future of blockchain technology. I just need to learn to code it and develop it... haha!

What do you think?


Thank you for your time and attention! I appreciate the knowledge reaching more people. Take care. Peace.



If you appreciate and value the content, please consider:
Upvoting, Sharing and Reblogging below.

Follow me for more content to come!


@krnel
2016-12-01, 10:30am


It is very critical to separate read demand from writers. Steem can easily scale to unlimited read requests. It is the business logic, not storage, that must be scaled.

You are right about needing a large number of interconnected chains. We can divide Steem into about 10 chains such that exchanges don't need to load all content and posts.

That alone only gives us at most 10x and that assumes equal load balance, but of course that isn't the case.

We need to enable chains to apply all transactions in a block in parallel. Next we need to pipeline evaluation like CPUs do.
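One toy way to picture parallel application, not Steem's actual scheduler: group a block's transactions into batches whose touched accounts are disjoint, so transactions within a batch can run concurrently while batches execute in order.

```python
# Toy sketch: split a block's transactions into batches whose touched
# accounts are disjoint. Batches run in order; transactions inside a
# batch are safe to apply in parallel. Not Steem's actual design.

def parallel_batches(transactions):
    """transactions: list of (tx_id, accounts_touched) in block order."""
    batches = []  # each: {"txs": [...], "accounts": set()}
    for tx_id, accounts in transactions:
        accounts = set(accounts)
        # A tx must run after every earlier tx it conflicts with,
        # so place it just after the last conflicting batch.
        last_conflict = -1
        for i, batch in enumerate(batches):
            if not batch["accounts"].isdisjoint(accounts):
                last_conflict = i
        target = last_conflict + 1
        if target == len(batches):
            batches.append({"txs": [], "accounts": set()})
        batches[target]["txs"].append(tx_id)
        batches[target]["accounts"] |= accounts
    return [b["txs"] for b in batches]

block = [
    ("tx1", {"alice", "bob"}),
    ("tx2", {"carol"}),         # independent of tx1 -> same batch
    ("tx3", {"bob", "dave"}),   # conflicts with tx1 -> next batch
]
print(parallel_batches(block))  # -> [['tx1', 'tx2'], ['tx3']]
```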

Thanks for the clarification.

Steem may be able to handle unlimited read requests, but the nodes that host the data and have to serve those requests are the issue when you are talking about huge data files like movies, or the size of the blockchain itself. I still see that as a problem when nodes must host all the data on one chain. A million-terabyte ledger is not viable. That shows how size is an issue, and why subchains are required. You break it up into different databases, which is how the whole internet is structured: networks of networks. I was just making the case that a blockchain can't stay one blockchain for all data, as that's not possible in the long term.

Large datasets are best stored in a DHT. The business logic is hard to compress or divide once you get down to sequence-dependent operations.

A large ledger can be distributed via DHT, but the interpretation of the ledger requires the active state.
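As a rough illustration of the DHT idea, the sketch below uses consistent hashing to map each content key to the node responsible for storing it, so no single node holds the whole ledger. Node names are hypothetical:

```python
# Sketch: locating which node stores a piece of content in a DHT,
# using consistent hashing. Node names are illustrative.
import hashlib

def h(key):
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)

class SimpleDHT:
    def __init__(self, nodes):
        # Place nodes on a hash ring, sorted by their hash position.
        self.ring = sorted((h(n), n) for n in nodes)

    def node_for(self, content_key):
        # Content is stored on the first node clockwise from its hash.
        target = h(content_key)
        for pos, node in self.ring:
            if pos >= target:
                return node
        return self.ring[0][1]  # wrap around the ring

dht = SimpleDHT(["node-a", "node-b", "node-c"])
print(dht.node_for("video/cat-compilation.mp4"))
```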

The structure you describe for blockchain database records is different from how we organize it.

Alright, I'm talking more about the node itself that has to download the blockchain ledger. It's not manageable as one single blockchain ledger once it gets too big. That was my main issue. Thanks for the feedback and clarity.

Better than manual splitting would be automatic splitting. The demand-driven model he refers to is mine, and it is about using frequent associations to find data that should be pooled, so it can be added to without waiting for the data to come from other nodes. This matters when nodes do not cache all the data, and the purpose is to increase the number of domains where transaction authorisations can be immediately certified to their provenance.

To implement it, I suggest something like a hierarchy of nodes, like we now have Witnesses and runners-up: another hierarchy related to capacity, both storage and processing. These nodes don't keep the whole blockchain, but a related subset, clustered according to the transaction history and the frequency with which leaves are added onto them, such as user accounts, tokens, and contracts.

Some data has to be more broadly spread, but if these subnodes are sufficient in number, they in effect break up the blockchain in a more temporary and specific way, driven by use.
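A crude sketch of that clustering, purely illustrative: count how often pairs of accounts transact together and union the accounts joined by frequent interaction, so a subnode can host one cluster's related subset instead of the whole chain.

```python
# Sketch: group accounts that frequently transact together, so a subnode
# can host one cluster's history instead of the whole chain. Illustrative.
from collections import Counter, defaultdict

def cluster_accounts(transactions, threshold):
    """transactions: list of (account_a, account_b). Accounts linked by
    at least `threshold` shared transactions end up in one cluster."""
    pair_counts = Counter(tuple(sorted(pair)) for pair in transactions)
    # Union-find over accounts joined by frequent pairs.
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    for (a, b), count in pair_counts.items():
        if count >= threshold:
            parent[find(a)] = find(b)
    clusters = defaultdict(set)
    for account in parent:
        clusters[find(account)].add(account)
    return list(clusters.values())

txs = [("alice", "bob"), ("alice", "bob"), ("bob", "carol"),
       ("dave", "erin"), ("dave", "erin"), ("dave", "erin")]
print(cluster_accounts(txs, threshold=2))
# -> [{'alice', 'bob'}, {'dave', 'erin'}]
```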

Tracking client request frequency and correlating the other entities that associate with those requests is not for increasing read speed, but for decreasing latency by having the necessary data already in cache. I am pretty sure Graphene already does some of this within a node's live buffer caches; I am mainly talking about expanding the caching to various kinds of caching nodes that specialise in aggregating associated data on disk, so that instead of having to store it all, they store only a little, and other nodes know to propagate transactions to them.

I think we are talking about much the same thing with breaking into subchains, but my idea is derived from the very solution Witnesses and Masternodes enable: reducing the cost of convergence by delaying replication, so the data is canonical within a cluster of nodes overlapping in their focal areas, and thus confirmed quickly. In the background the node propagates first to near neighbours and, much later than currently, the network converges. But where the data is used, it is nearly instant.
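A toy model of that two-phase convergence, with hypothetical names: a node propagates a transaction to its cluster peers immediately for fast local confirmation, and defers the wider broadcast to a background queue.

```python
# Sketch: propagate a transaction to nearby cluster peers immediately for
# fast local confirmation, deferring full-network convergence. Illustrative.
import time

class ClusterNode:
    def __init__(self, name, cluster_peers, remote_peers):
        self.name = name
        self.cluster_peers = cluster_peers  # overlapping focal area: sync now
        self.remote_peers = remote_peers    # rest of network: sync later
        self.deferred = []                  # queue for background convergence

    def broadcast(self, tx):
        # Phase 1: immediate propagation inside the cluster.
        for peer in self.cluster_peers:
            peer_receive(peer, tx)          # tx is canonical locally, fast
        # Phase 2: schedule background propagation to the wider network.
        self.deferred.append((time.time(), tx))

    def flush_deferred(self):
        for _, tx in self.deferred:
            for peer in self.remote_peers:
                peer_receive(peer, tx)
        self.deferred.clear()

def peer_receive(peer, tx):
    print(f"{peer} received {tx}")

node = ClusterNode("node-1", ["near-1", "near-2"], ["far-1", "far-2"])
node.broadcast("tx-42")    # instant within the cluster
node.flush_deferred()      # later, the whole network converges
```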

Well, I am just trying to help here anyway. Parallelisation is key here, so knowledge from routing and caching systems is very pertinent, as is graph analysis to find divisible regions. From what I understand, Graphene is very well adapted to graph manipulation. Also, this reminds me of how 3D graphics systems work extensively with graphs, and GPUs have special processing systems for dealing with them (matrix transforms).

A blockchain weave network! I like the sound of that. As long as we can remain trustless and accountably adaptable! :)

Yes, I like it. It kind of reminds me of where the NXT/Ardor team is going, with a few twists.

So glad to have bright, caring minds like you looking into things and steering the Steemit ship!

@kus-knee (The Old Dog)

Hehe, thank you for the kind words :)

Totally good idea. I had it open but hadn't read it yet. Good stuff. I would change a few things, but it's a great idea to start with.

Hit me with your ideas! I'd love to hear them. It's clearly in its infancy.


Thanks for the reference :) Yes, I would also add to your analysis the existing and in-development protocols for different and larger types of data, such as media, software and the like:

Bitmessage has a primitive protocol for expiring old data based on time to live, after which point nodes can remove old messages to make space for new ones.
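That expiry rule is easy to sketch: each message carries a time-to-live, and nodes periodically purge anything past its deadline. A minimal illustration, not Bitmessage's actual code:

```python
# Sketch of TTL-based expiry in the style of Bitmessage: a node keeps a
# message only until its time-to-live runs out, freeing space for new data.
import time

class MessageStore:
    def __init__(self):
        self.messages = {}  # msg_id -> (payload, expiry_timestamp)

    def add(self, msg_id, payload, ttl_seconds):
        self.messages[msg_id] = (payload, time.time() + ttl_seconds)

    def purge_expired(self):
        now = time.time()
        expired = [m for m, (_, exp) in self.messages.items() if exp <= now]
        for m in expired:
            del self.messages[m]
        return expired

store = MessageStore()
store.add("msg-1", b"old data", ttl_seconds=0)      # expires immediately
store.add("msg-2", b"fresh data", ttl_seconds=3600)
print(store.purge_expired())  # -> ['msg-1']
```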

Maidsafe is a protocol loosely based on BitTorrent, but with a scheme for charging rent on storage space. I have an old post with old ideas related to this in my "Agora" network system.

My idea comes more from the perspective of converging these protocols and linking them together as a fluid, seamless whole regulated by utilisation. In my conception, there is a need for nodes to have a challenge protocol to ensure data stores indeed have the data (replay-attack resistant), and an insurance system that penalises contract breakers according to the frequency and size of their failures. It also relates to an in-band currency blockchain enabling pay-on-the-line access points, programmable to enable location obfuscation.
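The challenge protocol can be sketched roughly as follows, simplified so that the verifier still holds the original data: the verifier sends a fresh random nonce and the storage node must answer with hash(nonce || data), so replaying an old answer fails.

```python
# Sketch: nonce-based storage challenge. A fresh random nonce per round
# means an old response can't be replayed; the prover must hold the data.
import hashlib, os

def respond(nonce, stored_data):
    # The storage node proves possession by hashing nonce || data.
    return hashlib.sha256(nonce + stored_data).hexdigest()

def verify(nonce, response, original_data):
    return response == hashlib.sha256(nonce + original_data).hexdigest()

data = b"the file this node promised to store"
nonce = os.urandom(16)                        # fresh randomness each challenge
answer = respond(nonce, data)
print(verify(nonce, answer, data))            # -> True
print(verify(os.urandom(16), answer, data))   # replayed answer fails -> False
```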

In particular, the insurance protocol is something both very necessary and quite disruptive as it develops. In the near future, records of reliability, like credit records, will disincentivise poor service provision; but once there are insurance and reporting systems, you can next move to human-interacting actuarial systems as well, and the insurance busts out of the network and flattens insurance companies.

I mention this because as you think through this fluid, demand-driven converging network, you realise that making and breaking network contracts with human factors will eventually become a problem. And it is only a small step to income insurance systems (aka UBI, the new fad word for this), and we go to another level. But blockchain systems would give the lowest possible administrative cost while massively diminishing the potential for fraud.
