## Hurst II

On another post we explained how to calculate the Hurst exponent using two different methods.

Now we implement it.

To test the method we’ll use a time series from the Brazilian market: the mini future of the IBovespa index, hourly, from January to April, 2019. It’s available here.

This choice is quite arbitrary, but we know that this time series *is* mean reverting.

How do we know that? A simple ADF test:

```
import statsmodels.api as stat
import statsmodels.tsa.stattools as ts
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
win = pd.read_csv('WIN_H1_01_04.csv', sep=';')
plt.plot(win['CLOSE'])
results=ts.adfuller(win['CLOSE'], maxlag=1, regression='c', autolag=None)
print(results)
```

The results reads:

```
(-3.985491197671518, 0.0014884183818247635, 1, 794, {‚1%‘: -3.4386126789104074, ‚5%‘: -2.865186972298872, ’10%’: -2.5687119871327146})
```

That means, for the series to be mean reverting within 10% certainty, the first parameter should be more negative than `~ -2.67`

, and so forth for 5% and 1%.

So we know it is mean reverting and we cant test our implementations safely.

### Calculating the Hurst exponent

The strategy is the following: calculate `Var(t)`

for many `t`

s and take the log on the equation `Var(t) = c*t^{2H}`

. We should then have an affine equation and we are able to determine the linear coefficient, which is `2H`

.

So, to calculate the Hurst exponent the first thing to do is we have to generate a set `T`

of lags `t`

. They should be between two and the length of the time series `z`

. So we use numpy’s `arange`

which generates a set of lags between 1 and a fraction (which we choose) of the length of `z`

:

```
price = np.log(win.CLOSE)
T = np.arange(len(price)/50).astype(int)
```

The next step is to calculate the `Var(t)`

. To be extra didactic I’ll make an explicit calculation for a particular `t in T`

: first we have to calculate the following list (defining `price = z`

)

```
|z(t) - z(0)|
|z(t+1) - z(1)|
|z(t+2) - z(2)|
…
|z(t+N) - z(N)|
```

And for this set we calculate the variance. We repeat the process for every `t in T`

, thus obtaining a set `{Var(t)}_{t in T}`

.

So the best tool to do this is to use *panda’s* diff function (not numpy’s) with the argument `t in T`

.

So we define `y`

as follows, which is already the log of `Var(t)`

:

```
y = [np.log(price.diff(t).abs().var()) for t in T]
```

This is it. The rest of the code is merely the mechanics to get `H`

, but the reasoning is over.

```
y = np.array(y)
x = np.log(T)
x = x[np.isfinite(x)]
y = y[np.isfinite(y)]
result = np.polyfit(x,y,1)
H = result[0]/2
```

The result we get is `H ≈ 0.43`

, which agrees with what we know about the series: it is *quite* mean reverting.

The jupyter’s notebook of this little project can be found here.