Forecasting Adventures 1 - Analyzing Previous Forecast
I should have made this into a series, since over the last few articles I have been talking about forecasting prices, or for that matter any time series. So if you haven't caught up yet, you should read the previous articles:
- https://steemit.com/bitcoin/@profitgenerator/statistical-analysis-of-btc-usd
- https://steemit.com/bitcoin/@profitgenerator/diy-statistical-analysis-of-btc-usd
- https://steemit.com/mathematics/@profitgenerator/variability-in-a-heteroskedastic-market
- https://steemit.com/mathematics/@profitgenerator/diy-linear-forecasting-with-python
- https://steemit.com/bitcoin/@profitgenerator/diy-linear-forecasting-cryptocurrencies
- https://steemit.com/programming/@profitgenerator/diy-linear-forecasting-cryptocurrencies-with-rolling-1-step-forecast-and-full-backtest
- https://steemit.com/bitcoin/@profitgenerator/diy-full-linear-forecasting-tool-cryptocurrency-prediction
- https://steemit.com/bitcoin/@profitgenerator/2017-10-22-btc-usd-forecast
Where we are now: 3 days ago I made a linear prediction for the BTC/USD price on the 22nd of October. I have been busy for the last 2 days, but now I have time to continue this hobby. Linear forecasting is not the proper way to do things around here, but it's a simple beginner's tool, so why not talk about it.
I will be working with the old dataset for now, since that is what we made our old predictions on. It runs from 2010-08-18 00:00:00 (0.074) to 2017-10-20 00:00:00 (5979.45984), 1311 values in total, originating from Blockchain.info.
Comparing Results
The value forecasted in the previous article was 5986.521994511905 and the actual value is 5983.184550000001. Thus our error was 0.000557648524796 LN (the absolute difference between the natural logs of the two prices), which is much smaller than the average error of 0.03175527080004521 predicted for the 840 period optimum that we used. Okay so far so good!
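The LN error above can be reproduced with a few lines of Python. This is only a sketch: the function name `ln_error` is my own label here and not necessarily what the forecasting tool calls it, but the two prices are the ones quoted above.

```python
import math

def ln_error(forecast, actual):
    """Absolute difference of the natural logs of forecast and actual price."""
    return abs(math.log(forecast) - math.log(actual))

# Values quoted in the article: forecast vs. actual BTC/USD price.
err = ln_error(5986.521994511905, 5983.184550000001)
print(err)  # roughly 0.00055765
```

Working in logs makes the error relative, so an error at a price of 5000 can be compared with one made back when the price was 250.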
Compare Backtests against ARIMA(p,d,q) Models
So it was time to get out the real tools, most notably the ARIMA(p,d,q) model that is heavily used in time-series analysis. Unfortunately I don’t have my old Python scripts. I had them somewhere on my old hard disk, which I think I left in my old apartment when we moved into our new house with my wife a couple of years ago. So I had to use other tools to calculate it, which gave me limited parameters and no room to customize things. I did align the ARIMA to work on the same sample of the same size, starting from 2015-03-27 at 248.63, which corresponds to the 840 period of the other script.
I could have used simple statistical tools to get the parameters of the ARIMA, but I almost never use them since I find them very unreliable, and I don’t know why other statisticians use them. I simply check all combinations and then grab the one with the smallest errors. Otherwise you just introduce another uncertainty into the model, and why would you want to do that?
I have found that ARIMA(0,1,1) with constant is the best model for the BTC/USD data. The constant can be explained easily since there is a slight up-trend here, the data is not stationary so it needs 1 order of differencing, and the moving average term is there for smoothing.
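The "check everything and keep the best" approach can be sketched like this. Note the assumptions: `best_order` and `toy_score` are hypothetical names, and `toy_score` is a rigged stand-in so the example runs without data. In real use the scoring function would fit an ARIMA with the given order and return its backtest error.

```python
from itertools import product

def best_order(score, max_p=2, max_d=1, max_q=2):
    """Try every (p, d, q) up to the limits and return (error, order) of the best."""
    candidates = product(range(max_p + 1), range(max_d + 1), range(max_q + 1))
    scored = [(score(order), order) for order in candidates]
    return min(scored)  # smallest error wins

# Stand-in scorer, NOT a real backtest: it is rigged so (0, 1, 1) scores best.
# Replace it with a function that fits the model and returns its average error.
def toy_score(order):
    p, d, q = order
    return 0.03 + abs(p - 0) + abs(d - 1) + abs(q - 1)

best_error, order = best_order(toy_score)
print(order, best_error)
```

With these limits only 18 models are checked, so brute force is cheap. The search grows quickly once seasonal terms are added.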
According to Wikipedia:
An ARIMA(0,1,1) model without constant is a basic exponential smoothing model.
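So the forecast is kind of an exponentially weighted average of the past prices, where the most recent prices count the most, and the constant adds the up-trend on top of that. So this is the best model for it, the best naked model nonetheless, since we could add seasonality or exogenous parameters to it, but that is for another topic.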
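The Wikipedia statement can be checked with a small sketch. I am assuming the convention y[t] = y[t-1] + e[t] - theta * e[t-1] for the ARIMA(0,1,1) without constant, in which case the smoothing factor is alpha = 1 - theta. The function names are my own, and apart from the first price the sample numbers are made up purely for illustration.

```python
def arima011_forecasts(prices, theta, start):
    """One-step forecasts of ARIMA(0,1,1) without constant."""
    forecasts = [start]
    for y in prices:
        error = y - forecasts[-1]             # last one-step forecast error
        forecasts.append(y - theta * error)   # forecast for the next step
    return forecasts

def ses_forecasts(prices, alpha, start):
    """One-step forecasts of basic exponential smoothing."""
    forecasts = [start]
    for y in prices:
        forecasts.append(alpha * y + (1 - alpha) * forecasts[-1])
    return forecasts

# Illustrative prices only (not the real dataset).
prices = [248.63, 250.0, 245.5, 252.1, 260.4]
theta = 0.3  # arbitrary value for the demonstration

a = arima011_forecasts(prices, theta, prices[0])
b = ses_forecasts(prices, 1 - theta, prices[0])
same = all(abs(x - y) < 1e-9 for x, y in zip(a, b))
print(same)
```

Both functions produce the same forecasts, which is all the equivalence means. Adding a constant to the ARIMA version would simply add a fixed drift to every forecast.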
Theil’s U value for the ARIMA(0,1,1) model is 1.00353866461468, which means that it’s not exactly informative. A value above 1 means the forecast does worse than a naive guess, so the expectancy of the forecast is negative, i.e. we can’t make money from this forecast since the errors will be bigger than our edge.
We also have the errors for the ARIMA model in absolute, RMS and other formats, but I have actually calculated the LN format too, to be on par with my model. And guess what, the ARIMA error is bigger than the error for my linear forecasting method. Who would have thought that the cutting edge model used for forecasting performs worse than a simple linear tool? My simple model is better by 0.0000733308 LN error points, which was surprising.
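For reference, here is a minimal sketch of one common form of Theil’s U (often called U2), which compares the forecast errors against a naive "price stays the same" guess. I am assuming this is the form in use here; the article does not spell out the exact formula, and the sample prices are made up.

```python
import math

def theils_u(actual, forecast):
    """Theil's U2: forecast[i] is the prediction made for actual[i]."""
    num = 0.0
    den = 0.0
    for t in range(1, len(actual)):
        prev = actual[t - 1]
        num += ((forecast[t] - actual[t]) / prev) ** 2  # model's relative error
        den += ((actual[t] - prev) / prev) ** 2         # naive guess's relative error
    return math.sqrt(num / den)

# Illustrative prices only.
prices = [100.0, 102.0, 101.0, 105.0, 107.0]
naive = [prices[0]] + prices[:-1]   # always predict "same as the previous value"

u_naive = theils_u(prices, naive)
u_perfect = theils_u(prices, prices)
print(u_naive, u_perfect)
```

The naive guess scores exactly 1 and a perfect forecast scores 0, so anything above 1 is worse than not forecasting at all.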
Interestingly both the LN error and the RMS classical error measures are lower for my linear model.
Now the Theil’s U value for my linear forecast is 0.996454780871044, which means that it’s theoretically predictive and gives a 0.35% edge over randomness (1 minus 0.9965 is about 0.0035). Very very interesting.
Updated Software
So I have updated the software a little bit, now at version 2.2. I have removed the “Theoretical Loss” line since it’s misleading for novices, but I have added the Theil’s U calculation and, based on that, a new Predictive Edge line which shows how much edge the prediction theoretically has over a random guess. If it’s bigger than 0 then theoretically, over the long term, the prediction should be profitable, if the model is good of course. There are no guarantees, but in theory it should be like that.
I reworked the GUI a little bit. Well, it’s console software, so don’t expect shiny buttons, that’s not the point. It doesn’t distract you with useless stuff, it just gives you exactly what you need. Now it looks like this:
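As a rough sketch, the Predictive Edge can be read as 1 minus Theil’s U. That is my assumption about how the line is computed, based on the 0.35% figure quoted earlier, and the function name `predictive_edge` is hypothetical.

```python
def predictive_edge(theils_u):
    """Edge over a naive guess, assuming edge = 1 - Theil's U."""
    return 1.0 - theils_u

# Theil's U values quoted in the article.
linear_edge = predictive_edge(0.996454780871044)
arima_edge = predictive_edge(1.00353866461468)

print(f"linear: {linear_edge:.2%}")
print(f"ARIMA:  {arima_edge:.2%}")
```

Under this reading the linear model comes out at about +0.35% and the ARIMA(0,1,1) at about -0.35%.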
Download Software Here
Disclaimer: The information provided on this page or blog post might be incorrect, inaccurate or incomplete. I am not responsible if you lose money or other valuables using the information on this page or blog post! This page or blog post is not investment advice, just my opinion and analysis for educational or entertainment purposes.
