In fact, reinforcement learning, a technique from artificial intelligence, could be used to determine the parameters of a new token or of an existing cryptoeconomic system. I don't know whether others have considered this, so I hope to open a discussion about better ways to determine the cryptoeconomic parameters of blockchain projects.
For blockchain projects, cryptoeconomic parameters (the equivalent of hyperparameters in the AI world) matter because they have a major impact on the future distribution of tokens.
However, most projects today seem to choose these parameters arbitrarily. For example, an ICO project may decide to retain 10%, 20%, 50%, or even 90% of its tokens, while contributing the rest to a "Development Fund", paying out an X% founder reward, and so on.
Similarly, as Casper FFG moves toward going live on Ethereum (Ethereum Improvement Proposal (EIP) 1011), a reduction of the block reward from 3 ETH to 0.8 ETH is being considered. In most projects, the founding team or community wants to optimize for some "ideal" long-term distribution of the token (e.g., how tokens are split among users, developers, speculators, and miners), but few projects have tried to rigorously quantify that distribution.
In this post, I sketch a high-level approach to this goal using reinforcement learning, with the aim of starting a discussion about "hyperparameter" optimization for cryptocurrency and blockchain projects.
Cryptoeconomics resembles a reinforcement learning (RL) problem
Reinforcement learning (RL) is a subfield of artificial intelligence. Rather than teaching and evaluating a model with training and test data, we define agents that can take various actions over time in a specific environment. Each action produces an agent-specific reward (or penalty), quantified by a reward function. Agents optimize their rewards over time, learning the "best" actions to maximize their long-term reward in the environment.
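This agent/environment/reward loop can be sketched with a minimal tabular Q-learning toy. Everything here (the "hold or sell" game, its payouts, and the learning constants) is invented for illustration, not a real protocol model:

```python
import random

random.seed(0)  # fixed seed so repeated runs learn the same policy

ACTIONS = ["hold", "sell"]

def step(state, action):
    """Toy transition: 'sell' ends the episode with a payout equal to the
    number of steps the token was held; 'hold' pays a trickle of 0.1."""
    if action == "sell":
        return state, float(state), True
    return state + 1, 0.1, state + 1 >= 10  # episode ends after 10 holds

def train(episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.1):
    q = {}  # Q-table: (state, action) -> estimated long-term reward
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # epsilon-greedy: mostly exploit the best known action,
            # occasionally explore a random one
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q.get((state, a), 0.0))
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q.get((nxt, a), 0.0) for a in ACTIONS)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
    return q

q = train()
```

After training, the agent learns to hold early and sell late, i.e., it discovers the long-term-reward-maximizing policy rather than the greedy one.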
This framework maps naturally onto many cryptocurrency and blockchain projects, because most have a limited set of agents (e.g., users, developers, miners, speculators, hackers) that interact with the network, take a limited set of actions, and receive rewards from those actions (in the form of tokens or project value).
Why choose RL?
I think cryptoeconomic systems are especially well suited to reinforcement learning: building a model of the environment may be simpler than in many other RL settings.
All interactions with a blockchain happen on-chain. Some off-chain behavior may need to be incorporated into the model (for example, collusion between validators in a dPoS system), but that is the exception, not the rule.
In addition, reinforcement learning has proven very successful at finding bugs in the environments it operates in. Most blockchain designs are fairly simple, at least compared to environments such as video games or the "real world." This suggests that RL may be effective at finding problems in blockchain system designs.
Finally, one of the challenges of reinforcement learning is defining a clear reward function.
Fortunately, most blockchain projects have an associated token with a measurable value, so for many agents the reward function can be defined in terms of accumulated tokens, with some caveats.
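For instance, a token-denominated reward function might net out the external costs an agent pays (one of the caveats). The field names and `token_price` input below are illustrative assumptions, not something the post specifies:

```python
from dataclasses import dataclass

@dataclass
class AgentState:
    tokens: float      # tokens held by the agent
    fiat_spent: float  # cumulative external cost (e.g. electricity, fees)

def reward(before: AgentState, after: AgentState, token_price: float) -> float:
    """Reward for one step: change in token holdings valued at the current
    price, minus external costs incurred during the step."""
    token_gain = (after.tokens - before.tokens) * token_price
    cost = after.fiat_spent - before.fiat_spent
    return token_gain - cost

# A miner-like agent earns 10 tokens at price 1.0 while spending 2.5 on costs
r = reward(AgentState(100.0, 0.0), AgentState(110.0, 2.5), token_price=1.0)
```

In a real model, `token_price` itself would depend on the agents' collective behavior, which is part of what makes the environment design hard.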
Edit: @cpfiffer pointed out in the comments:
Another consideration for RL is whether we care only about the equilibrium state (at time T) or about all of the intermediate steps (times t0, t1, ..., T). One can make a strong argument that the intermediate steps describe the evolution of the system.
How would this work?
The implementation details are beyond the scope of this post, but at a high level we could build an RL simulation with the following workflow:
1. Define the set of hyperparameters to optimize, the agents involved in the network, and the potential actions of each agent;
2. Define rewards for each agent, taking into account all of their potential interactions with the network;
3. Define an environment for the agents that models the network;
4. Add the agents to an instance of the environment with a specific set of hyperparameters (e.g., token supply, transaction costs, mining rewards), let them maximize their rewards over N time steps, and output the final state of the network, i.e., how many tokens each agent holds and how much value each contributed to or extracted from the network;
5. Rerun step 4 with different sets of hyperparameters, sweeping over many values for each hyperparameter;
6. Analyze the results and select the hyperparameters that produce the "best" final state of the network.
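The workflow above can be sketched as a grid search over simulated network outcomes. Everything below (the agent types, their behavior, the hyperparameter values, and the "evenness" metric for the best final state) is a made-up toy, not a real protocol model:

```python
import itertools
import random

def simulate(block_reward, txn_fee, n_steps=100, seed=0):
    """Step 4: run one instance of a toy network with fixed hyperparameters
    and return the final token balances of each agent."""
    rng = random.Random(seed)  # fixed seed so hyperparameter runs are comparable
    balances = {"miner": 0.0, "user": 50.0, "speculator": 50.0}
    for _ in range(n_steps):
        balances["miner"] += block_reward      # miner earns the block reward
        if rng.random() < 0.5:                 # user sometimes transacts
            balances["user"] -= txn_fee
            balances["miner"] += txn_fee
        if rng.random() < 0.3:                 # speculator sometimes buys
            balances["speculator"] -= 1.0
            balances["user"] += 1.0
    return balances

def evenness(balances):
    """Toy 'best final state' metric: higher means a more even distribution."""
    return -(max(balances.values()) - min(balances.values()))

# Steps 5-6: sweep a grid of (block_reward, txn_fee) values and pick the
# hyperparameters whose final state scores best under the chosen metric.
grid = itertools.product([0.1, 0.5, 1.0], [0.01, 0.1])
best = max(grid, key=lambda hp: evenness(simulate(*hp)))
```

A real version would replace `simulate` with learning agents (as in the Q-learning sketch earlier) and would use far richer metrics than balance spread, but the shape of the search is the same.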
Problems and caveats
Although RL may be the most promising method for determining the best hyperparameters for a network, it is not perfect. Here are some of the issues that may arise:
- High computational cost of simulation
- A high barrier to entry for building the model (in other words, you need to understand both RL/AI and blockchains)
- The challenge of defining reward functions and environments that mimic real-world behavior
- RL is non-deterministic, so two runs of the same simulation may produce different results.
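One common mitigation for that non-determinism is to fix random seeds and average outcomes over several runs, comparing distributions rather than single simulations. A sketch, where `noisy_outcome` is a hypothetical stand-in for a full simulation run:

```python
import random

def noisy_outcome(hyperparam, seed):
    """Stand-in for one full simulation run: the true value of the
    hyperparameter plus run-to-run noise."""
    rng = random.Random(seed)  # seeding makes each individual run repeatable
    return hyperparam + rng.gauss(0, 0.1)

def averaged_outcome(hyperparam, n_runs=10):
    """Average over several seeded runs to reduce run-to-run variance."""
    return sum(noisy_outcome(hyperparam, seed) for seed in range(n_runs)) / n_runs
```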
For a sober read on the current state of RL, see https://www.alexirpan.com/2018/02/14/rl-hard.html
Reinforcement learning could be used to better determine the hyperparameters of blockchain networks.
I have outlined a high-level approach for doing so, along with some of its potential pitfalls.
Importantly, reinforcement learning is just one tool to help us find optimal hyperparameters for these projects. Other methods may be more effective, and the output of an RL model should be treated as one of many inputs into the decision of which hyperparameters to use.