A Distributed Search Engine for the Distributed Web - Dweb.page

Posted by @noc2 in #technology, 5 years ago

While search neutrality might be open for discussion, it is clear that Google’s centralized search engine, with a market share above 90% and quarterly earnings above 30 billion dollars, is far from ideal. Monopolies are not only economically inefficient but also increase the risk of censorship and search bias.

When it comes to finding information on the distributed web, a centralized search engine seems counterintuitive, because it goes against the principles underlying the distributed web. That is why we are working hard to create the first fully functional, completely distributed search engine for our project, Dweb.page.

Problem

Despite the downsides of current search engines mentioned above, we believe several factors have made it difficult to change the existing model. At the same time, a distributed and fully transparent search engine for the Dweb faces a set of challenges:

  1. Speed: The distributed search engine needs to be at least as fast as current solutions, yet many distributed ledgers suffer from slow transaction times.
  2. Device independence: More and more people use mobile phones, so the distributed search engine needs to run on both PCs and mobile phones without any centralized backend.
  3. Indexing: How can data be collected, parsed, and stored in a distributed way that still allows fast and accurate information retrieval, while ensuring that people cannot create fake search entries?
  4. Availability: How can we ensure that distributed data is still available when requested? This is especially hard because data may be hosted locally and therefore only be available during certain time slots.
  5. Monetization and incentives: How can storage and the ongoing development of the tool be financed? Without a working monetization model, it will be difficult for decentralized solutions to compete with existing centralized ones, for example for human talent or for partnerships and integrations.

A potential solution

To ensure fast and feeless transactions, it was clear from the beginning that distributed ledger technologies limited by either of these two issues (slow transactions or transaction fees) were not an option. Therefore, we chose the combination of IPFS and IOTA. IPFS fills the obvious role of a fast, distributed way to share and host files, whereas IOTA provides the necessary distributed database layer. It is important to note that the database uses only the part of the IOTA technology that is already fully functional and independent of future research (e.g., regarding the coordinator).

This combination allows us to provide an experience that works on all kinds of devices. We even had a prototype running in Internet Explorer. The unique feature is that we can deliver a fully distributed experience without installing any additional software, since all the code runs inside a simple, completely open source web page that is itself distributed on IPFS. It also means that every single user runs their own search engine, which is the ultimate form of distribution.
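To make "every user runs their own search engine" more concrete, here is a minimal sketch of a client-side inverted index that lives entirely on the user's device. It is written in Python for readability (the real page would run JavaScript in the browser), and the class name and the `title`/`keywords` metadata fields are hypothetical assumptions, not Dweb.page's actual data model.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class LocalSearchEngine:
    """A tiny search index kept entirely on the Consumer's device (hypothetical)."""

    def __init__(self) -> None:
        self.records: dict[str, dict] = {}                   # record id -> metadata
        self.index: dict[str, set[str]] = defaultdict(set)   # term -> record ids

    def add(self, record_id: str, metadata: dict) -> None:
        self.records[record_id] = metadata
        text = " ".join([metadata.get("title", ""), *metadata.get("keywords", [])])
        for term in tokenize(text):
            self.index[term].add(record_id)

    def search(self, query: str) -> list[str]:
        # Score each record by how many distinct query terms it matches.
        scores: dict[str, int] = defaultdict(int)
        for term in set(tokenize(query)):
            for record_id in self.index.get(term, ()):
                scores[record_id] += 1
        # Highest score first; ties broken by record id for deterministic output.
        return sorted(scores, key=lambda rid: (-scores[rid], rid))
```

For example, after `engine.add("a", {"title": "Distributed search engine", "keywords": ["ipfs", "iota"]})` and `engine.add("b", {"title": "Search tips", "keywords": []})`, the query `"distributed search"` ranks `"a"` above `"b"`. Because no query ever leaves the device, the ranking logic stays fully transparent to the user.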

Inspired by this distributed interface, we are working on the following concept for a distributed search engine: 

We assume two types of users, whom we call Authors and Consumers (one person can fulfill both roles). Authors upload content to the distributed web via Dweb.page. If they want their content to be found publicly by others, metadata signed by the Author is uploaded to IOTA. This way, anyone can publish metadata for their own content instead of relying on a centralized indexing system. On top of that, the signatures make it impossible to pretend to be someone else, which today happens, for example, with fake news stories or bank websites.

When Consumers open Dweb.page for the first time, they start loading the most recent metadata in the background. Based on this metadata, a search engine running locally provides the user with initial, fully transparent search results. These first searches are automatically used to subscribe to potentially interesting Authors, which in turn loads additional metadata. This can be seen as a social network for metadata, in which Consumers “follow” Authors. The approach has two advantages: users do not have to load the metadata of the entire web, and they can easily block a provider of malicious metadata (e.g., wrongly labeled content). Without this subscribe/block model, people could start spamming the search engine.
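The signed-metadata and follow/block ideas above can be sketched as follows. Since Python's standard library has no public-key signatures, this sketch swaps in Lamport one-time signatures (built only on SHA-256) purely for illustration; a real deployment would more likely use a reusable scheme such as Ed25519, because a Lamport key must never sign more than one record. The identity scheme (`author_id` as the hash of the public key), the `author` field, and the `MetadataFeed` class are hypothetical assumptions, not Dweb.page's design.

```python
import hashlib
import json
import secrets

# A key is 256 pairs of 32-byte values, one pair per bit of a SHA-256 digest.
Key = list[tuple[bytes, bytes]]


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def keygen() -> tuple[Key, Key]:
    # Lamport one-time key pair: random secrets and their hashes.
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(_h(a), _h(b)) for a, b in sk]
    return sk, pk


def author_id(pk: Key) -> str:
    # Hypothetical identity scheme: an Author is the hash of their public key.
    return hashlib.sha256(b"".join(a + b for a, b in pk)).hexdigest()


def _digest_bits(metadata: dict) -> list[int]:
    # Canonical JSON, so signer and verifier hash exactly the same bytes.
    payload = json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode()
    return [(byte >> (7 - i)) & 1 for byte in _h(payload) for i in range(8)]


def sign(metadata: dict, sk: Key) -> list[bytes]:
    # Reveal one secret per digest bit. A Lamport key must sign only ONE record.
    return [sk[i][bit] for i, bit in enumerate(_digest_bits(metadata))]


def verify(metadata: dict, signature: list[bytes], pk: Key) -> bool:
    bits = _digest_bits(metadata)
    if len(signature) != len(bits):
        return False
    return all(_h(s) == pk[i][bit] for i, (s, bit) in enumerate(zip(signature, bits)))


class MetadataFeed:
    """Consumer-side store of signed metadata with follow/block lists."""

    def __init__(self) -> None:
        self.following: set[str] = set()
        self.blocked: set[str] = set()
        self.accepted: list[dict] = []

    def receive(self, metadata: dict, signature: list[bytes], pk: Key) -> bool:
        author = author_id(pk)
        # Reject impersonation (claimed author != signing key), blocked Authors,
        # and records whose signature does not match their content.
        if metadata.get("author") != author or author in self.blocked:
            return False
        if not verify(metadata, signature, pk):
            return False
        self.accepted.append(metadata)
        return True

    def follow_authors_of(self, results: list[dict]) -> None:
        # Auto-subscribe to the Authors behind a Consumer's search results.
        self.following.update(m["author"] for m in results if m["author"] not in self.blocked)

    def block(self, author: str) -> None:
        self.blocked.add(author)
        self.following.discard(author)
        self.accepted = [m for m in self.accepted if m["author"] != author]
```

An attacker who claims someone else's `author` value but signs with their own key is rejected, because the signing key does not hash to the claimed identity. Blocking an Author both drops their existing records and rejects future ones.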

In addition, everyone who uses the Dweb.page search engine generates information about the availability of content: if someone tries to download content from the distributed web that is no longer available, this information is passed on to other users. If multiple Authors tell you that a file is no longer available, it is automatically removed from your search results. If only one Author reports it, the file stays in your search results, giving you the option to check whether that Author is lying about its availability to prevent you from accessing certain content.
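The availability rule above boils down to counting distinct reporters per file. Here is a minimal sketch, assuming that "multiple Authors" means at least two distinct Authors (the `threshold` parameter and the class name are hypothetical):

```python
from collections import defaultdict


class AvailabilityTracker:
    """Hides a file once enough distinct Authors report it as unavailable."""

    def __init__(self, threshold: int = 2) -> None:
        # Assumption: "multiple" Authors means at least `threshold` distinct ones.
        self.threshold = threshold
        self.reports: dict[str, set[str]] = defaultdict(set)  # file id -> reporters

    def report_unavailable(self, file_id: str, reporter: str) -> None:
        # A set ignores repeated reports from the same Author.
        self.reports[file_id].add(reporter)

    def is_hidden(self, file_id: str) -> bool:
        return len(self.reports.get(file_id, ())) >= self.threshold

    def filter_results(self, file_ids: list[str]) -> list[str]:
        # A single report keeps the file listed so the user can check it.
        return [f for f in file_ids if not self.is_hidden(f)]
```

Storing reporters in a set means one Author cannot hide a file by reporting it repeatedly; only independent reports from different Authors push it over the threshold.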

The last key, challenging, and often overlooked part of every distributed project is how to monetize it and provide incentives to the storage providers and developers of the distributed web. In a distributed, open source solution without any centralization, any incentive model can be circumvented. That is why many decentralized projects end up with a centralized layer. Furthermore, donation-based systems don’t seem to work well for subscription-based or long-term business models. That is why we are considering a model that benefits all participants while maintaining complete transparency. The following picture illustrates how this potential solution would work:

The search market is well suited to advertising since, even without giving up any privacy, ads can be shown based on search terms. This advertising revenue can then be split: one part provides a certain amount of free storage to Authors, and the other supports the Developer in further improving the tool. Considering that Google provides 15 GB of free cloud storage and still earns billions every quarter, this model might even result in a completely free web for Authors! It is also important to point out that a large share of the population is not against advertising per se, but against the misuse of their personal data, which this model would make impossible.
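The revenue split can be made concrete with a little arithmetic. The sketch below assumes an equal split of the storage budget across Authors; the article fixes none of the inputs, so the revenue, split ratio, storage price, and number of Authors are all hypothetical parameters.

```python
from typing import NamedTuple


class Split(NamedTuple):
    storage_budget: float
    developer_budget: float
    free_gb_per_author: float


def split_revenue(revenue: float, storage_share: float,
                  price_per_gb_month: float, authors: int) -> Split:
    """Split monthly ad revenue between Author storage and the Developer.

    All inputs are hypothetical; none of them come from the article.
    """
    if not 0.0 <= storage_share <= 1.0:
        raise ValueError("storage_share must be between 0 and 1")
    storage_budget = revenue * storage_share
    developer_budget = revenue - storage_budget
    # Equal split of the storage budget across all Authors (an assumption).
    free_gb_per_author = storage_budget / price_per_gb_month / authors
    return Split(storage_budget, developer_budget, free_gb_per_author)
```

With purely illustrative numbers, `split_revenue(10_000, 0.6, 0.01, 1_000)` sends about 6,000 per month to storage and 4,000 to the Developer, which buys roughly 600 GB per Author at 0.01 per GB-month. The point is only that free storage scales with ad revenue and falling storage prices.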

Naturally, this model needs to be set up completely transparently on a distributed ledger. If this is the case, a normal contract between all participants might be sufficient at the beginning, since malicious parties could easily be sued (e.g., if money is misused instead of being invested into the infrastructure). However, this contract should include, right from the beginning, an option to change over time, for example based on a voting system. Otherwise, the model would be unable to adapt to future developments; for example, storage prices might become so cheap that it makes sense to use the money for other purposes. This and other aspects of the system, such as the quality of the provided storage or advertising, might be difficult to encode in smart contracts. Nevertheless, at a later stage, this setup should be replaced with fully automated smart contracts.
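The "option to change over time based on a voting system" can be sketched as a contract whose single parameter (the storage share of ad revenue) changes only when a supermajority of participants approves. This is an off-chain model of the logic, not a smart contract; the two-thirds threshold, the one-participant-one-vote rule, and the class and field names are hypothetical assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RevenueContract:
    """Hypothetical model of a contract whose terms can be changed by vote."""

    participants: set[str]
    storage_share: float = 0.6  # share of ad revenue funding Author storage
    proposed_share: Optional[float] = None
    votes: dict[str, bool] = field(default_factory=dict)

    def propose(self, new_share: float) -> None:
        if not 0.0 <= new_share <= 1.0:
            raise ValueError("share must be between 0 and 1")
        self.proposed_share = new_share
        self.votes.clear()  # every proposal starts with a fresh ballot

    def vote(self, participant: str, approve: bool) -> None:
        if participant not in self.participants:
            raise ValueError(f"{participant!r} is not a party to the contract")
        self.votes[participant] = approve

    def tally(self) -> bool:
        """Apply the proposal if at least two thirds of all participants approve."""
        if self.proposed_share is None:
            return False
        yes = sum(self.votes.values())
        # Integer comparison avoids floating-point error in the 2/3 threshold.
        if 3 * yes >= 2 * len(self.participants):
            self.storage_share = self.proposed_share
            self.proposed_share = None
            self.votes.clear()
            return True
        return False
```

If storage became much cheaper, for instance, participants could propose lowering `storage_share` and redirect the difference elsewhere; until enough of them approve, the current terms stay in force.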

This article presents our current research; it does not describe a finished product (visit Dweb.page for the current prototype). We believe that we can only achieve this vision if we are transparent right from the start, and we appreciate any feedback or contributions. Help us achieve this vision!

I initially posted this article on Medium.
