Blockchain

in #blokchain, 6 years ago

Blockchain-based Machine Learning Marketplaces
Machine learning models trained on data from blockchain-based marketplaces have the potential to create the world’s most powerful artificial intelligences. They combine two potent primitives: private machine learning, which allows training to be done on sensitive private data without revealing it, and blockchain-based incentives, which allow these systems to attract the best data and models to make them smarter. The result is open marketplaces where anyone can sell their data while keeping it private, and where developers can use incentives to attract the best data for their algorithms.
Constructing these systems is challenging and the requisite building blocks are still being created, but simple initial versions look like they are starting to become possible. I believe these marketplaces will transition us out of the current era of Web 2.0 data monopolies into a Web 3.0 era of open competition for data and algorithms, where both are directly monetized.
Origin
The base of this idea came in 2015 from talking with Richard of Numerai. Numerai is a hedge fund that sends encrypted market data to any data scientist who wants to compete to model the stock market. Numerai combines the best model submissions into a “metamodel”, trades that metamodel, and pays data scientists whose models perform well.
Having data scientists compete seemed like a powerful idea. So it got me thinking: can you create a fully decentralized version of this system that could be generalized to any problem? I believe the answer is yes.
Construction
As an example, let’s try creating a fully decentralized system for trading cryptocurrencies on decentralized exchanges. This is one of many potential constructions:
Data: Data providers stake data and make it available to modelers.
Model building: Modelers choose what data to use and create models. Training is done using a secure computation method, which allows models to be trained without revealing the underlying data. Models are staked as well.
Metamodel building: A metamodel is created by an algorithm that takes into account the stake on each model. Creating a metamodel is optional; you can imagine models that are used without being combined into a metamodel.
Using the metamodel: A smart contract takes the metamodel and trades programmatically through decentralized exchange mechanisms on-chain.
Distributing gains/losses: After some time period passes, trading produces a profit or loss. This profit or loss is divided up amongst contributors to the metamodel based on how much smarter they made it. Models which contributed negatively have some or all of their staked funds taken. Models then turn around and perform similar distributions and stake slashing for their data providers.
Verifiable computation: Computation for each step is performed either centrally, in a way that is verifiable and challengeable through a verification game like Truebit, or in a decentralized way using secure multiparty computation.
Hosting: Data and models are hosted either on IPFS or with nodes in a secure multiparty computation network, as on-chain storage would be too expensive.
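To make the metamodel and distribution steps concrete, here is a minimal off-chain sketch in Python. Everything in it is an assumption made for illustration: the stake-weighted average, the leave-one-out measure of "how much smarter" a model made the metamodel, the `slash_rate` parameter, and all names. The construction above does not prescribe any of these, and a real system would run the logic under verifiable or secure computation rather than in plain Python.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str
    stake: float       # tokens staked on the model (hypothetical unit)
    prediction: float  # predicted return for the next trading period


def metamodel_prediction(models):
    """Stake-weighted average of the individual predictions."""
    total_stake = sum(m.stake for m in models)
    return sum(m.stake * m.prediction for m in models) / total_stake


def contributions(models, realized):
    """Leave-one-out contribution of each model.

    A positive value means the metamodel's absolute error would have been
    larger without that model. Requires at least two models.
    """
    full_error = abs(metamodel_prediction(models) - realized)
    result = {}
    for m in models:
        others = [o for o in models if o is not m]
        error_without = abs(metamodel_prediction(others) - realized)
        result[m.name] = error_without - full_error
    return result


def settle(models, realized, profit, slash_rate=0.5):
    """Split a profit among positive contributors; slash negative ones.

    Positive contributors receive nothing when there is no profit.
    Negative contributors lose slash_rate of their stake.
    """
    contrib = contributions(models, realized)
    positive_total = sum(c for c in contrib.values() if c > 0)
    payouts = {}
    for m in models:
        c = contrib[m.name]
        if c > 0 and profit > 0:
            payouts[m.name] = profit * c / positive_total
        elif c < 0:
            payouts[m.name] = -slash_rate * m.stake
        else:
            payouts[m.name] = 0.0
    return payouts


models = [
    Model("a", stake=10.0, prediction=0.02),
    Model("b", stake=10.0, prediction=-0.04),
    Model("c", stake=20.0, prediction=0.01),
]
payouts = settle(models, realized=0.02, profit=100.0)
```

In this toy run, model "b" pulled the metamodel away from the realized return, so it is slashed, while "a" and "c" share the profit in proportion to their contributions. The same function could be applied one level down, with each model distributing its payout or slash to its data providers.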
What makes this system powerful?
Incentives to attract the best data globally: Incentives to attract data are the most potent part of the system, as data tends to be the limiting factor for most machine learning. In the same way Bitcoin created an emergent system with the most compute power in the world through open incentives, a properly engineered incentive structure for data would cause the best data in the world for your application to come to you. And it’s nearly impossible to shut down a system where data is coming from thousands or millions of sources.
Competition between algorithms: The system creates open competition between models and algorithms in places where it previously didn’t exist. Picture a decentralized Facebook with thousands of competing newsfeed algorithms.
Transparency in rewards: Data and model providers can see that they are getting fair value for what they’ve submitted, since all computation is verifiable, making them far more likely to participate.
Automation: Taking action on-chain and generating value directly in tokens creates an automated and trustless closed loop.
Network effects: Multi-sided network effects from users, data providers, and data scientists make the system self-reinforcing. The better it performs, the more capital it attracts, which means more potential payouts, which attracts more data providers and data scientists, who make the system smarter, which in turn attracts more capital, and back around again.
Privacy
In addition to the points above, a major feature is privacy. It 1) allows people to submit data that would otherwise be too private to share and 2) prevents the economic value of the data and models from leaking. If left unencrypted in the open, the data and models will be copied for free and used by others who have not contributed any work (the “free rider” problem).
A partial solution to the free rider problem is to privately sell data. Even if buyers choose to resell or release the data, its value decays with time. However, this approach restricts us to short duration use cases and still creates typical privacy concerns. As a result, the more complicated but powerful approach is to use a form of secure computation.
Secure computation
Secure computation methods allow models to train on data without revealing the data itself. There are three main forms of secure computation being used and researched today: homomorphic encryption (HE), secure multiparty computation (MPC), and zero-knowledge proofs (ZKPs). Multiparty computation is most commonly used for private machine learning at the moment, as homomorphic encryption tends to be too slow and it’s not obvious how to apply ZKPs to machine learning. Secure computation methods are on the bleeding edge of computer science research. They are often orders of magnitude slower than regular computation and represent the main bottleneck to the system, but have been improving in recent years.
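As a small illustration of the multiparty computation idea, the sketch below uses additive secret sharing to compute a sum of private integers. It is a toy under stated assumptions: the modulus `PRIME`, the three-party setup, and all function names are illustrative choices, the parties are simulated inside one process, and only addition is covered. Training a real model would also need secure multiplication and protection against misbehaving parties, which are omitted here.

```python
import secrets

PRIME = 2**61 - 1  # modulus for the shares; an illustrative choice


def share(value, n_parties):
    """Split a non-negative integer into n additive shares mod PRIME.

    Any n - 1 of the shares look uniformly random on their own; only the
    sum of all n shares reveals the value.
    """
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def reconstruct(shares):
    """Recombine shares by adding them mod PRIME."""
    return sum(shares) % PRIME


def secure_sum(private_values, n_parties=3):
    """Sum private values without any single party seeing an input.

    Each data owner sends one share to each party. Each party adds the
    shares it holds, and only those partial sums are combined.
    """
    all_shares = [share(v, n_parties) for v in private_values]
    # Column i holds every share that was sent to party i.
    partial_sums = [sum(column) % PRIME for column in zip(*all_shares)]
    return reconstruct(partial_sums)


total = secure_sum([12, 30, 7])
```

Because the shares are added locally and only the partial sums are combined, no party holding a single column learns any individual input, as long as the parties do not pool their shares.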
The Ultimate Recommender System
To illustrate the potential of private machine learning, imagine an app called “The Ultimate Recommender System”. It watches everything you do on your devices: your browsing history, everything you do in your apps, the pictures on your phone, location data, spending history, wearable sensors, text messages, cameras in your home, the camera on your future AR glasses. It then gives you recommendations: the next web site you should visit, article to read, song to listen to, or product to buy.
This recommender system would be extremely potent, more so than any of the existing data silos of Google, Facebook, or others could ever be, because it has a maximally longitudinal view of you and can learn from data that would otherwise be too private to consider sharing. Similar to the prior cryptocurrency trading system example, it would work by allowing a marketplace of models focused on different areas (e.g., web site recommendations, music) to compete for access to your encrypted data and recommend things to you, and perhaps even pay you for contributing your data or your attention to the recommendations generated.
Google’s federated learning and Apple’s differential privacy are one step in this private machine learning direction, but still require trust, don’t allow users to directly examine their security, and keep data siloed.
Current approaches
It’s very early. Few groups have anything working and most are trying to bite off one piece at a time.
A simple construction from Algorithmia Research places a bounty on a model that is accurate above a certain backtesting threshold:

Figure: Simple construction creating a bounty on a machine learning model, by Algorithmia Research.
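The general shape of such a bounty can be sketched in a few lines of Python. This is not Algorithmia Research's actual construction, whose details are not given here; the class `ModelBounty`, its fields, and the first-qualifying-submission rule are all hypothetical, and the escrowed reward is just a number rather than funds held by a smart contract.

```python
class ModelBounty:
    """Toy bounty: pays the first model whose accuracy on a held-out
    test set meets a threshold. All names here are hypothetical."""

    def __init__(self, reward, threshold, test_inputs, test_labels):
        self.reward = reward          # amount held in escrow
        self.threshold = threshold    # minimum accuracy, between 0 and 1
        self.test_inputs = test_inputs
        self.test_labels = test_labels
        self.winner = None

    def accuracy(self, predict):
        """Fraction of test inputs the submitted model labels correctly."""
        correct = sum(
            predict(x) == y
            for x, y in zip(self.test_inputs, self.test_labels)
        )
        return correct / len(self.test_labels)

    def submit(self, submitter, predict):
        """Evaluate a submission and return the amount paid out."""
        if self.winner is not None:
            return 0.0  # bounty already claimed
        if self.accuracy(predict) >= self.threshold:
            self.winner = submitter
            return self.reward
        return 0.0


# Toy task: predict whether an integer is odd.
bounty = ModelBounty(
    reward=10.0,
    threshold=0.9,
    test_inputs=[0, 1, 2, 3],
    test_labels=[0, 1, 0, 1],
)
paid_to_bob = bounty.submit("bob", lambda x: 0)        # always guesses even
paid_to_alice = bounty.submit("alice", lambda x: x % 2)
```

Because the payout depends only on a fixed test set, this rewards backtested accuracy, which is the limitation that staking on future performance is meant to address.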
Numerai currently takes things three steps further: it uses encrypted data (although not fully homomorphically), it combines crowdsourced models into a metamodel, and it rewards models, through a native Ethereum token called Numeraire, based on future performance (in this case, one week of stock trading) rather than backtesting. Data scientists must stake Numeraire as skin in the game, incentivizing performance on what will happen (future performance), not what has happened (backtested performance). However, it currently distributes data centrally, limiting what feels like the most important ingredient.
No one has created a successful blockchain-based marketplace for data yet. The Ocean is an early attempt to outline one.
Still others are starting by building secure compute networks. OpenMined is creating a multiparty compute network for training machine learning models on top of Unity that can run on any device, including game consoles (similar to Folding@home), then expanding to secure MPC. Enigma is taking a similar tack.
