Scientific Forecasting of SBD Volume

in #steem, 10 years ago (edited)

I forecast the Steem Dollar volume to see what it will be in the future, and the results were shocking.

I selected the trade volume of the Steem Dollar measured in BTC, because this is the most interesting parameter. I could have measured the price in BTC or the volume in SBD itself, but those are less interesting. SBD is pegged to 1$ but quoted in BTC on this market, so its BTC price depends on both the peg and the BTC/USD rate. That gives 2 loose variables, which are hard to factor in when forecasting.

So I think the volume of the SBD is more interesting, since in my theory, if the SBD is traded well, it is more liquid and the 1$ peg is easier to maintain. If the SBD is not liquid, then the buy wall will have thin depth and the price could go down if a big whale were to sell his SBD. So in order to have a liquid market, with a tight spread and favorable prices, we need to have big trade volume. And this is what I have forecast.




Forecasting with ARIMA(p,d,q)

I have already played around with quantitative analysis (quant) tools in my past articles:
https://steemit.com/bitcoin/@profitgenerator/let-s-calculate-the-probability-of-bitcoin-going-to-1000usd-and-above-pt-2

And I have begun to love them again, so you will see many more scientific forecast articles from me. In this research, I have used ARIMA, to forecast the volume.

I grabbed the volume of the BTC_SBD market from the Poloniex API, in 5-minute periods. This is what the volume chart looks like from its inception, Tue Jul 19 04:30:00 2016 UTC, up to yesterday:

volumechart.png

The API can be called with:

https://poloniex.com/public?command=returnChartData&currencyPair=BTC_SBD&start=1&end=9999999999&period=300

But it returns the data in JSON format, so it has to be converted into CSV before we can use it. Once that was done, we started analyzing the data.
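For readers who want to reproduce the conversion step, here is a minimal sketch using only the Python standard library. It is not the script used for this article. The field names are an assumption about the shape of the returnChartData response (a JSON array of candle objects), and `candles_json_to_csv` is a hypothetical helper name. The sample payload is made up for illustration and no network call is made.

```python
import csv
import io
import json

# ASSUMPTION: field names of one candle object in the API response.
# Adjust this list if the actual payload differs.
FIELDS = ["date", "high", "low", "open", "close",
          "volume", "quoteVolume", "weightedAverage"]


def candles_json_to_csv(json_text, fields=FIELDS):
    """Convert a JSON array of candle objects into CSV text."""
    candles = json.loads(json_text)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for candle in candles:
        # Missing fields become empty cells; extra fields are dropped.
        writer.writerow({name: candle.get(name, "") for name in fields})
    return buf.getvalue()


# Made-up two-candle payload, 300 seconds apart.
sample = '[{"date": 1468902600, "volume": 0.5}, {"date": 1468902900, "volume": 0}]'
csv_text = candles_json_to_csv(sample, fields=["date", "volume"])
print(csv_text)
```

In practice the JSON text would come from saving the response of the URL above to a file and reading it back in.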

First things first: we need to establish whether there is a trend or not. Just by looking at the chart, it seems obvious that there is no trend and that the dataset is stationary. But that is subjective, so we need to measure it objectively.

We used an Augmented Dickey-Fuller (ADF) test and a Kwiatkowski–Phillips–Schmidt–Shin (KPSS) test, and both suggested that the data is stationary.
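To give a feel for what a unit root test computes, here is a rough sketch of the plain (non-augmented) Dickey-Fuller regression in pure Python. This is a simplification, not the ADF or KPSS implementation used for the results above: it has no lagged difference terms, and the function name and the simulated series are made up for illustration. Note that the two tests have opposite null hypotheses: ADF starts from "there is a unit root", while KPSS starts from "the series is stationary", which is why they are often run together.

```python
import math
import random


def dickey_fuller_t(series):
    """t-statistic of g in the regression  dy_t = a + g * y_{t-1} + e_t."""
    x = series[:-1]                                    # lagged level y_{t-1}
    d = [b - a for a, b in zip(series[:-1], series[1:])]  # first difference
    n = len(x)
    x_mean = sum(x) / n
    d_mean = sum(d) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    g = sum((xi - x_mean) * (di - d_mean) for xi, di in zip(x, d)) / sxx
    a = d_mean - g * x_mean
    rss = sum((di - a - g * xi) ** 2 for xi, di in zip(x, d))
    se_g = math.sqrt(rss / (n - 2) / sxx)              # standard error of g
    return g / se_g


# Simulated white noise stands in for a stationary series.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
t_noise = dickey_fuller_t(noise)

# Asymptotic 5% critical value for the constant-only Dickey-Fuller case.
CRITICAL_5PCT = -2.86
looks_stationary = t_noise < CRITICAL_5PCT
print(t_noise, looks_stationary)
```

A t-statistic below the critical value rejects the unit root, which is the outcome that supports d = 0. The statistic does not follow the usual Student t distribution, so ordinary t-tables must not be used here.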

This means that we don't need to difference the data, so d = 0, which means that we will have an ARMA(p,q) model, or ARIMA(p,0,q) essentially.

Then we estimate the p and q values with the partial autocorrelation function, and see that the optimal parameters range from 1 to 10.

PACF.png
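As an illustration of how partial autocorrelations can be computed, here is a small sketch of the Durbin-Levinson recursion in pure Python. The function names are made up for this example, and it is not the tool that produced the chart above. As a general rule of thumb, the PACF is usually read to suggest the AR order p, while the plain autocorrelation function is the usual guide for the MA order q.

```python
def autocorrelations(series, max_lag):
    """Sample autocorrelations rho_0..rho_max_lag (rho_0 is always 1)."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    c0 = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]


def pacf_from_acf(rho):
    """Durbin-Levinson recursion: PACF for lags 1..len(rho)-1."""
    pacf = []
    prev = []  # AR coefficients of the order k-1 fit
    for k in range(1, len(rho)):
        num = rho[k] - sum(prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        prev.append(phi_kk)
        pacf.append(phi_kk)
    return pacf


# Theoretical ACF of an AR(1) process with coefficient 0.5: rho_k = 0.5**k.
ar1_pacf = pacf_from_acf([1.0, 0.5, 0.25, 0.125])
print(ar1_pacf)  # only lag 1 is non-zero for a pure AR(1)
```

On real data one would call `pacf_from_acf(autocorrelations(volume, 10))` and compare each value against roughly ±1.96/sqrt(n) to judge which lags stand out.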

Then I had to go through all combinations, but I stopped the calculation at some point, since certain values took an awful lot of time to compute. Remember, quant analysis on Wall Street is done by supercomputers that go through all possible coefficient permutations. I don't have those kinds of resources, so my calculations are limited in accuracy.

I stopped at ARMA(2,10), which is still a pretty decent model for this dataset. Here are the coefficients, with their standard errors, if you want to calculate it yourself.

variable    coefficient    standard error
constant     0.343512      0.0908014
phi_1        1.59072       0.0491846
phi_2       −0.597998      0.0476801
theta_1     −1.28657       0.0494523
theta_2      0.252297      0.0357579
theta_3      0.129604      0.0158901
theta_4      0.0533278     0.0140486
theta_5     −0.0914508     0.0150106
theta_6     −0.0190010     0.0137436
theta_7      0.0256495     0.0134067
theta_8      0.0659363     0.0159579
theta_9     −0.0166028     0.0168858
theta_10    −0.0500762     0.0101774
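To show how these coefficients turn into a forecast, here is a minimal sketch of the ARMA recursion in pure Python. It rests on one loud assumption: the "constant" is treated as the process mean rather than a regression intercept. Reading it as an intercept would imply a long-run mean of roughly 47 BTC per period (0.343512 / (1 − 1.59072 + 0.597998)), which does not fit the volume range discussed in this article, so the mean reading seems more plausible. The function names are hypothetical, and residuals before the start of the sample are simply taken as zero.

```python
# Coefficients copied from the table above.
MU = 0.343512  # ASSUMPTION: "constant" is the process mean, not an intercept
PHI = [1.59072, -0.597998]
THETA = [-1.28657, 0.252297, 0.129604, 0.0533278, -0.0914508,
         -0.0190010, 0.0256495, 0.0659363, -0.0166028, -0.0500762]


def _lag_sum(coefs, values, t):
    """sum_i coefs[i] * values[t-1-i], skipping lags before the sample start."""
    return sum(c * values[t - 1 - i]
               for i, c in enumerate(coefs) if t - 1 - i >= 0)


def arma_forecast(history, steps, mu=MU, phi=PHI, theta=THETA):
    """Point forecasts from an ARMA model written in deviations from the mean."""
    dev = [y - mu for y in history]
    eps = []  # in-sample one-step-ahead residuals
    for t in range(len(dev)):
        eps.append(dev[t] - _lag_sum(phi, dev, t) - _lag_sum(theta, eps, t))
    forecasts = []
    for _ in range(steps):
        t = len(dev)
        step = _lag_sum(phi, dev, t) + _lag_sum(theta, eps, t)
        dev.append(step)
        eps.append(0.0)  # future shocks are replaced by their expectation, zero
        forecasts.append(mu + step)
    return forecasts


# Sanity checks: a series sitting at the mean stays there, and a pure
# AR(1) with coefficient 0.5 halves its deviation every step.
flat = arma_forecast([MU] * 20, steps=3)
ar1 = arma_forecast([0.0, 0.0, 2.0], steps=3, mu=0.0, phi=[0.5], theta=[])
print(flat, ar1)
```

Feeding the real 5-minute volume series in as `history` and asking for 5500 steps would give the point forecast only. The confidence band needs the forecast error variance as well, which this sketch leaves out.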

We see that this is a decent model, with a very low Mean Absolute Error of 0.39361, much lower than our BTC price forecast:

correl.png
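For reference, the Mean Absolute Error is simply the average size of the forecast misses, expressed in the same unit as the series itself (BTC of volume here). A minimal sketch, with made-up numbers rather than the actual data:

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all periods."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up volumes (BTC) and fitted values for four periods.
actual = [0.0, 0.5, 2.0, 0.0]
predicted = [0.25, 0.25, 1.0, 0.5]
mae = mean_absolute_error(actual, predicted)
print(mae)  # 0.5
```

Because the MAE carries the unit of the series, it is easiest to interpret next to the typical level of that same series.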

I have also observed that the volume is often 0 BTC. I don't know whether this is an error on Poloniex's side or whether the volume really was that low at certain points, but it's worth noting. Yet despite this, our model still gives us a very low error margin.

I have also observed many indications that a GARCH model could be used too. Unfortunately I cannot find my GARCH scripts, so I could not do that test, but given our low error margins, I think ARMA(2,10) is pretty decent.

This is our forecast for the next 5500 periods, which translates to about 19 days, where there is a 99% chance that the volume will be in the green box:

99 percent.png
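The 19-day figure follows directly from the 5-minute period length. A quick check of the arithmetic (the constant names are just for illustration):

```python
PERIOD_SECONDS = 300            # one candle, matching period=300 in the API call
FORECAST_PERIODS = 5500         # forecast horizon used in this article
SECONDS_PER_DAY = 24 * 60 * 60

horizon_days = FORECAST_PERIODS * PERIOD_SECONDS / SECONDS_PER_DAY
print(f"{horizon_days:.2f} days")  # 19.10 days
```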

And since we know the volume can't go negative, it will probably be between 0 and 4.9 BTC. Nothing surprising, but I thought it was worth analyzing to see how much effect those big spikes have on the average volume.

It turns out not much, so the big spikes are just random anomalies, and the average volume will probably stay within the boundaries mentioned above.


That's it for this analysis. I hope I can find my additional tools in the future and perform a deeper analysis. In my next article I will forecast the STEEM price itself. It will be exciting, stay tuned!

Disclaimer: The information provided on this page might be incorrect. I am not responsible if you lose money from the information you've read on this page! This is not investment advice, just my opinion and analysis.


Upvote & Follow Me: @profitgenerator


@profitgenerator

Adding the word scientific in front of it doesn't make it science. It is still heavy speculation, gambling. Any model that involves futures is not scientific. Even seismology or meteorology are not considered to have scientific models, even if they themselves are based on scientific principles. The two are vastly different.

For something to be scientific, it has to predict correctly over and over again without failure. If the model fails even once, the theory goes back to the drawing board.

The model is scientific no doubt, the science is called Econometrics, but there is always a margin of error and a confidence interval. You can see that the error is present here: both the coefficients and the model itself have an error margin, we haven't found the optimal parameters, and there is a discrepancy between the price and our forecasts.

So yes the model is not 100% accurate, but it's the best I could find.

Econometrics is the application of statistical methods to economic data. So basically it consists of making a mathematical model to predict the value of a variable that depends on certain parameters (the independent variables).

By itself econometrics is not science. The scientific method consists of:
1-Observation
2-Hypothesis (the model)
3-Experiment (in this case, whether the model accurately predicts the outcome within a certain statistical level of significance)
4-Theory (Confirmation of the hypothesis).

In this case you can say that you have made a hypothesis but nothing has been proven yet.

Ok I can accept that. But the applicable methodology is still pretty accurate in my opinion.

Wall Street boys earn billions of $ daily with HFT applications built on models like these.

So at least the methodology should be considered scientific, if not the model itself that is a result of the methodology. There are different markets that necessitate different models, but the methodology applied to get those models is always the same.

Unit root tests like the ADF and KPSS tests are pretty much standard, for example.

I agree. I am just saying that the model has to be proven. If the predicted values are accurate enough, then you have a model that explains the relationship between the variables to a certain degree. The degree of accuracy does not have to be anywhere near what is considered acceptable for experimental physics (to give an example), since when dealing with economic data there is just too much random noise that can't be accounted for.

Indeed, this is a time series, which is always a sample of the entire population, and new values constantly come in, adding extra entropy to the data and causing the existing models to be less and less accurate over time.

That is why really high level quants only deal with Neural Networks, that can learn automatically and optimize the parameters constantly. But then you also have to find the balance between over-fitting and under-fitting.

Consider another example. Newtonian physics predicts planetary motion very accurately, even if it is off by a few inches from the calculations we do today.

That model, though, is massively erroneous when trying to navigate through the stars, since the small error accumulates exponentially. The same thing applies to your model.

It can give a gray area, no doubt, but it is gray enough to call it speculation, not science.

Yes, but that is how science works.

A model that is scientific doesn't mean it's accurate, and a model that is accurate doesn't mean it's scientific.

My model is scientific because it is repeatable, falsifiable, objective, and adheres to the other principles of the scientific method.

Just as you can't call Newtonian physics BS, as it still makes airplanes get off the ground. It is not the best model, but it is still applicable in the modern world, and it works fine.

Exactly. It is all about context. It can work for airplanes, not for navigating satellites through the solar system.

See my point?

Btw, how is your model falsifiable? Please explain :)

It is falsifiable if some mathematician comes along and disproves a large part of statistics and probability theory.

It's really based on that, and even if my model doesn't predict it well, I might have just chosen the wrong parameters. Other than that the ARIMA is pretty reliable.

You can find tons of papers glorifying its accuracy in financial markets and other sciences:

https://arxiv.org/find/all/1/all:+ARIMA/0/1/0/all/0/1?skip=0&query_id=07bf8696050458d1

Also, note that showing that a set of variables is statistically correlated does not necessarily mean that the hypothesis is correct. It could be that the variables are correlated by chance.

It could be but what are the odds? Very low.

A Breusch–Pagan test can be used to determine that, but I don't have a script for that. I just assumed that the probability for that is very very low.
