Data analysis on corona spread in China: 1/22-2/08
I found data on new confirmed cases of the coronavirus. The data on China seems to be big enough to do some basic data analysis, and predicting the evolution of the virus spread using regression seems meaningful. Here is the main result I got with a little bit of Python:
The red line is a quadratic polynomial that has been fitted to the data using regression. It is quite surprising that a quadratic function fits the data so well. We are on the part of the quadratic function where it keeps increasing, so hopefully the spread will behave more like a cubic function in the future. The bounds, the orange and green lines, predict how far the data will most likely be off from the red line. There is a neat technical method behind how I came up with them, see the technical section below.
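For readers who want to try the fit themselves, here is a minimal sketch of a quadratic regression with NumPy. It is not the original script: the function name `fit_quadratic` is made up for illustration, and the numbers are placeholder values generated from a known quadratic, not the real case counts.

```python
import numpy as np

def fit_quadratic(days, cases):
    """Fit cases ~ a*days**2 + b*days + c by least squares (hypothetical helper)."""
    coeffs = np.polyfit(days, cases, deg=2)  # coefficients, highest power first
    return np.poly1d(coeffs)                 # callable polynomial

# Placeholder data, NOT the real case counts: an exact quadratic over
# 18 days (1/22 through 2/08), so the fit should recover a, b, c.
days = np.arange(18)
cases = 10.0 * days**2 + 50.0 * days + 500.0

model = fit_quadratic(days, cases)
print(model.coeffs)   # fitted a, b, c
print(model(18))      # extrapolation to the next day
```

With real data you would replace `days` and `cases` with the day index and the confirmed case counts from the data set.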
If you want to use the graph or want the script let me know :o)
Now for the technical part. Given the quadratic function obtained through regression, you can compute the absolute error at each data point. After sorting the errors by size, they appear to behave like a linear function. Therefore, the average error is a good predictor for the error bounds. You can also pull the error study back to the domain of the quadratic function: on the part of the domain that corresponds to the range we are interested in, the quadratic function is strictly increasing and therefore invertible.
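The error study could be sketched roughly like this. It is my reading of the description above, not the original script: I assume the bounds are the fitted curve plus and minus the mean absolute error, the helper name `error_band` is hypothetical, and the data are placeholder values with an artificial alternating deviation.

```python
import numpy as np

def error_band(days, cases, deg=2):
    """Fit a polynomial and derive a band from the mean absolute error.

    Hypothetical helper; assumes the band is fit +/- mean absolute error.
    """
    model = np.poly1d(np.polyfit(days, cases, deg))
    abs_err = np.abs(cases - model(days))  # absolute error per data point
    sorted_err = np.sort(abs_err)          # ordered errors, to eyeball linearity
    mean_err = abs_err.mean()              # average error used as band width
    return model, sorted_err, mean_err

# Placeholder data, NOT the real case counts: a quadratic with an
# alternating +30 / -30 deviation so the fit is not exact.
days = np.arange(18)
deviation = 30.0 * np.where(days % 2 == 0, 1.0, -1.0)
cases = 10.0 * days**2 + 50.0 * days + 500.0 + deviation

model, sorted_err, mean_err = error_band(days, cases)
lower = model(days) - mean_err  # one bound
upper = model(days) + mean_err  # the other bound
print(mean_err)
```

Plotting `sorted_err` against its index is a quick way to check whether the ordered errors really look linear for a given data set.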
Data on Github: https://github.com/CryptoKass/ncov-data
I haven't promoted this in ages but let's give it a try again. There is a MathOwl shop which sells my artsy fartsy stuff. If you got some spare monies head over there. Many thanks to suesa and terrylovejoy for being my customers. Those peeps are hootiful.