The Real Structure of the Market
I have been working with markets for as long as I can remember, looking at them from economic, mathematical and other perspectives, and along the way I have more or less discovered the nature of the market, and how understanding it can help in analyzing and predicting it.
Efficient Market Hypothesis
Now the simplest and most popular theory is the EMH. It says that the market is completely random, meaning unpredictable: essentially just a random walk. By this logic, quants are wasting their time analyzing the market, since it is unpredictable by nature.
Now in my opinion, and in my research and experience, this is completely false. I'd correct it by saying that the market only tends towards efficiency; it is not efficient all the time, and it can have leaky moments where the price is quite predictable. Efficiency is only a convergence point in the future, one the market might never reach.
Besides, perfect efficiency doesn't exist, so there is always some subtle inefficiency that can be exploited to predict the market and make money.
Structure of the Market
The market is a heteroskedastic conditional-joint-probability distribution that is a function of time.
- Heteroskedastic means the variance is not constant: both the variance and the mean are variables, changing from one subset (of undetermined size) to the next, with no fixed upper bound on the variance.
- Joint-probability distribution in the sense that it's made up of smaller distributions, and conditional meaning there is some persistent relationship between them, even though the price is not autoregressive; I will explain later what I mean by this.
- And it's a function of time: in most probability distributions the order of observations doesn't matter, like in dice rolls, but in the market the order does matter.
So it's a time series. A non-stationary, non-seasonal, non-regressive time series.
- Non-stationary means the variance is unbounded and the mean is a variable, so the series has a unit root that can be analyzed. All trending series are non-stationary, but not all non-stationary series are trending. EUR/USD is non-stationary but can go both ways, while the stock market mostly trends up; both are non-stationary, but only the stock market is trending.
- It can't be seasonal: that would be a huge inefficiency that high-frequency traders would exploit instantly, so seasonality can be forgotten.
- It's also non-regressive, at least on the surface as measured by ACF and PACF tools, though there might be another, hidden relationship.
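As a rough illustration of the non-stationarity point, here is a minimal sketch (Python with NumPy, purely synthetic data; no real market feed is assumed). It simulates a geometric random walk as a stand-in for a price series and compares the behavior of the raw prices with their log differences:

```python
import numpy as np

rng = np.random.default_rng(0)
# Geometric random walk: a common stand-in for a price series.
price = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 2000)))
log_ret = np.diff(np.log(price))  # log-differencing strips the exponential drift

def halves(x):
    """Mean of the first and second half of a series."""
    return x[: len(x) // 2].mean(), x[len(x) // 2 :].mean()

p1, p2 = halves(price)    # half-means of a random walk typically drift apart
r1, r2 = halves(log_ret)  # return half-means stay close to zero: near-stationary
print(f"price half-means:  {p1:.1f} / {p2:.1f}")
print(f"return half-means: {r1:.5f} / {r2:.5f}")
```

The same split-the-sample check on real closing prices usually shows the mean wandering between subsets, which is exactly the "mean is a variable" property described above.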
Now let's analyze each point one by one:
Autocorrelation
It would be convenient if the market were autocorrelated, since then we could just guess the next value from the previous one, but it's not that simple. The ACF and PACF tools show only very tiny autocorrelation between datapoints on any market, even a volatile one like cryptocurrencies. This doesn't mean there is no relationship between the elements; it's just that the relationship is either below the significance threshold or well obfuscated.
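The "tiny autocorrelation" claim can be made concrete with a small sketch (Python with NumPy; white noise stands in for log returns here, since no market data ships with this post). It computes the sample ACF and the usual rough 95% significance band of ±1.96/√n:

```python
import numpy as np

def acf(x, nlags):
    """Sample autocorrelation for lags 1..nlags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, nlags + 1)])

rng = np.random.default_rng(1)
returns = rng.normal(0, 0.01, 5000)  # white noise standing in for log returns
rho = acf(returns, 10)
band = 1.96 / np.sqrt(len(returns))  # rough 95% significance band
print(f"max |ACF| over 10 lags: {np.abs(rho).max():.4f}")
print(f"95% band:               {band:.4f}")
```

Run on real log returns, the ACF values typically hug this band, which is what "below the threshold" looks like in practice.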
Think of the market as the output and the information as the input, obfuscated in plain sight; cracking it can lead to an edge, but it might be hard to crack.
For example, the motives of the large players that move the market might be obvious; they are just not quantitatively imbued into the price. You might need external regressors to bring that information into your information set.
Think of it like this: let's say I say the word "Johnny". This word is a time series of letters as I pronounce it out loud, since it only makes sense if the letters are in order and each letter depends on the ones before it. You can guess the word after the letter "h", and more confidently after "n", but it's still uncertain after "o": it might be "Job" or "Jockey".
The market is arranged the same way, except that we only see its obfuscated output, as if it had gone through a hash function. The hash of "Johnny" is 71680c71b244a1b88784e2890016fdb12617d35f20c01d02b25d6861304cff99.
Now try guessing what comes after "680". It's very hard, since that is exactly what a hashing function does: it obfuscates the input so that the output becomes almost random. Not entirely random, though, since not even cryptographically can you build a perfect black box.
So there will always be some faint relationship between the output characters of a hashing function, and it could be guessed if you have enough samples.
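The hashing analogy is easy to reproduce with Python's standard hashlib (SHA-256 is an assumption on my part, inferred from the digest in the text being 64 hex characters long):

```python
import hashlib

digest = hashlib.sha256(b"Johnny").hexdigest()
print(digest)       # 64 hex characters, deterministic for the same input
print(len(digest))  # 64

# A one-letter change scrambles the entire output (the avalanche effect).
digest2 = hashlib.sha256(b"Johnnz").hexdigest()
matches = sum(a == b for a, b in zip(digest, digest2))
print(matches)      # only chance-level positional matches remain
```

This is the sense in which the output "becomes almost random": the function is fully deterministic, yet nothing about the input survives in any obvious form.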
Now the market is like that, but not that difficult: the obfuscation system is much less complicated, and sometimes the information is all public, it just hasn't been interpreted yet.
I have found inefficiencies in the EUR/USD market, even though it is the most heavily traded market in the world. How is that possible when the quants at big hedge funds and banks are looking at it day and night?
Simple: the inefficiency is persistent but subtle, and exploiting it doesn't change it much, since it's a very liquid market. So it's nothing like cracking an encryption; the market is much easier to decipher.
So the autocorrelation might be below threshold when viewed through default ACF and PACF tools, but add a few regressors, even public ones, and the jungle disentangles itself: the relationship becomes much more obvious to any analyst.
Non-Stationarity
Non-stationarity is a big problem: it's nearly impossible to analyze non-stationary data directly. So you have to transform it to be stationary, or at the bare minimum linearly trending, since most of the time there is an exponential element. You could instead design the model for non-stationary data, but that would complicate it too much. One rule of quantitative analysis is that the simplest model is usually the most correct.
Thus you'd rather transform the price than your model. It's easier if the price is transformed and filtered by default, and the forecasts are then reconstructed afterwards, after the analysis.
Most markets have an exponential element in them, especially crypto markets, which grow almost exponentially, so log-based transformations are a good tool.
There are also more advanced tools, like Kalman filters, HP filters, the Fourier transform and others, that put the price into a more computable format. Again, we can do anything we want with the price under two conditions: the transformation function is injective, and we don't leak information from the future.
So the non-stationarity doesn't necessarily come from randomness, as in a random-walk model, but rather from exponential elements, which are predictable, or from some sort of trend or change of variance that moves in a predictable way. And that can easily be filtered or transformed so that we have an easier dataset to work with.
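The injectivity condition can be checked directly. A minimal sketch (Python with NumPy; the four prices are made up for illustration) of a log-difference transform together with its exact inverse, so that forecasts made on the transformed series can be mapped back to price space after the analysis:

```python
import numpy as np

def to_logdiff(price):
    """Log then first difference; keep the first log price as the anchor."""
    logp = np.log(price)
    return logp[0], np.diff(logp)

def from_logdiff(anchor, d):
    """Exact inverse: prepend zero, cumulative-sum, exponentiate."""
    return np.exp(anchor + np.concatenate([[0.0], np.cumsum(d)]))

price = np.array([100.0, 103.0, 101.5, 110.2])  # made-up prices for illustration
anchor, d = to_logdiff(price)
rebuilt = from_logdiff(anchor, d)
print(np.allclose(rebuilt, price))  # True
```

Differencing alone would lose the starting level; keeping the anchor is what makes the round trip lossless.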
Seasonality
The market can’t be seasonal. Although for example I have found high above treshold autocorrelation in BTC/USD at the 469th lag. Not on the original series but on it’s log difference. And it’s a positive correlation of 0.1498
whereas on lag 1 you only have -0.0736
. So that could be the end of a period or a subset, but it’s positive correlation so there is no mean reversion at this period. Seasonality is defined mostly as a periodicity at lower segments like at 10-15 periods which can easily be seen on the ACF chart. But here it’s not so much a periodicity but more like the end of 1 sub-population and the start of another.
Best Model Concept
The best model, in concept, has to take into account the properties described above. So it's probably not an autoregressive model, and it has to be applied to a transformed price where the noise and the exponential elements are removed or filtered out.
Normally it should not have a constant, since a constant signals that the model doesn't fit the price well, but a tiny constant can always be used as a correction.
It should have a moving average element as well, but one whose weights take the ACF plot into consideration: perhaps a moving average with ACF-based weights. The ACF has to be local, so that we don't leak future information into historical data, which means it must be recalculated at every datapoint with the parameters refitted each time.
I don't know; I actually tried to create an ACF-based indicator in the past, and just using the past 10 lags' autocorrelations as weights I didn't get good results. Perhaps it should have been a rank-correlation tool like Spearman, fitted per subset rather than per element, to leave room for errors.
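The ACF-weighted moving average idea can be sketched as follows (Python with NumPy; the weighting scheme, the 200-bar window, and the 10-lag depth are my own assumptions for illustration, not a tested trading tool). The ACF is recomputed from the trailing window only at every point, so no future data leaks into historical values:

```python
import numpy as np

def acf_weights(x, nlags):
    """Weights from the local ACF of a window (absolute values, normalized)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    rho = np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, nlags + 1)])
    w = np.abs(rho)
    return w / w.sum() if w.sum() > 0 else np.full(nlags, 1.0 / nlags)

def acf_ma(price, window=200, nlags=10):
    """Moving average whose weights are refit at every point from the
    trailing window only, so historical values see no future data."""
    price = np.asarray(price, dtype=float)
    out = np.full(len(price), np.nan)
    for t in range(window, len(price)):
        w = acf_weights(np.diff(np.log(price[t - window:t])), nlags)
        out[t] = np.dot(w[::-1], price[t - nlags:t])  # weight for lag k hits price[t-k]
    return out

rng = np.random.default_rng(5)
price = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 400)))  # synthetic price path
ma = acf_ma(price)
print(f"first defined value: {ma[200]:.2f}")
```

Since the weights are non-negative and sum to one, each output is a convex combination of the trailing prices; swapping the Pearson-style ACF for a Spearman rank correlation per subset, as suggested above, would be a one-line change inside acf_weights.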
Anyway, there are many leads that can be explored; if there is any inefficiency out there, statistical tools should be able to detect it.
But from experience, taking inspiration from regressors is much more accurate than using the price alone. Of course we only use past values, as always, to avoid leaking future information.
But the relationship between PRICE(k-1) and REGRESSOR(k-1) is always stronger than between PRICE(k-1) and PRICE(k-2) when determining PRICE(k) as the forecasted price.
As for regressors, there are plenty to choose from, most of them public: Blockchain.info stats for BTC/USD, the relationship between EUR/USD futures and the EUR/USD spot market, or the Dollar Index as a regressor for the CNY/JPY market, and so on. The rule of thumb when using other markets is that the big market is the regressor and the small market is the main data, since the small one is always less efficient.
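The big-market-leads-small-market rule can be illustrated with a toy lead-lag setup (Python with NumPy; the 0.5 coupling and the one-step delay are assumptions made purely for the sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000
big = rng.normal(0, 1, n)   # returns of the big (leading) market, the regressor
small = np.empty(n)         # the small market follows with a one-step delay
small[0] = rng.normal()
small[1:] = 0.5 * big[:-1] + rng.normal(0, 1, n - 1)

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

# Predicting the small market: the lagged regressor beats the lagged price.
c_reg = corr(small[1:], big[:-1])     # clearly positive by construction
c_auto = corr(small[1:], small[:-1])  # near zero: the price is not autoregressive
print(f"vs lagged regressor: {c_reg:.2f}")
print(f"vs lagged price:     {c_auto:.2f}")
```

This is the pattern described above: the series carries almost no usable autocorrelation of its own, yet a lagged external regressor recovers a strong, exploitable relationship.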
With real estate it's even better: there you can use public statistics like rent, employment, credit and other public data. Real estate is a very inefficient market, so it's easy to predict things there.