Tuesday, August 17, 2010

R Implementation of the Ornstein-Uhlenbeck Formula



Orenstein-Uhlenbeck formula for modeling the mean reversion of a time series:




dz(t) = -theta(z(t) - u) + dW

where z(t) is the spead, u is the mean of the spread, and dW is some random noise.
The half life of the mean reversion can then be calculated as

ln(2)/theta.

The Matlab implementation:

prez = backshift(1, z); % z at previous time step
dz = z - prevz;
dz(1) = [];
prez(1) = [];
results = ols(dz, prevz-mean(prez));
theta = results.beta;
halflife = -log(2)/theta;)

R implementation:

prev_sprd <- c(sprd[2:length(sprd)], 0);
d_sprd <- sprd - prev_sprd;
prev_sprd_mean <- prev_sprd - mean(prev_sprd);
sprd.zoo <- merge(d_sprd, prev_sprd_mean);
sprd_t <- as.data.frame(sprd.zoo)
result <- lm(d_sprd ~ prev_sprd_mean, data = sprd_t, silent = TRUE);
half_life <- -log(2)/coef(result)[1];

5 comments:

  1. Patrick,

    Thanks so much for posting this formula. I've been trying to find it for some time to implement the example in Dr. Chan's book.

    However, when I run the code on my sprd for the GLD/GDX example 7.2, I'm coming up with a half-life of -61.65 instead of 10.003.

    Any thoughts would be greatly appreciated!

    ReplyDelete
  2. Good question. I'll investigate.

    ReplyDelete
  3. Thanks so much. I've been playing around with it in R a bunch, but I still can't get it to work. I'll let you know if I figure anything out.

    Thanks again!!!

    ReplyDelete
  4. Julia,

    Thanks for checking my code! I re-wrote a couple of things. Note that in R, if the equation is specified as result = lm(d_sprd ~ prev_sprd_mean), the ols solution is assumed to have a y-intercept. The result of coef(result)[1] is the y intercept rather than the beta which is is in coef(result)[2]. As an alternative, you can write the equation as (d_sprd ~ prev_sprd_mean + 0) to force a zero y-intercept, so the beta ends up in coef(result)[1].

    After changing the Yahoo historical prices URL to the following http://ichart.finance.yahoo.com/table.csv?s=GLD&a=04&b=23&c=2006&d=11&e=21&f=2007&g=d&ingore=.csv to get the GLD prices from 05/23/2006 to 12/21/2007 as reflected in Dr. Chan's example 7.2 and also doing the same for GDX, the R half life result is 8.82 days rather than -61.65 days which is an obvious mistake that I should have spotted. Again, thanks for checking. I'm very glad that I posted this example. I'll continue to investigate why MatLab gave 10.0003 days of half life and R gave 8.82 days of half life.

    As a side note, please note that the ichart.finance.yahoo.com parameter for starting month &a and ending month &d starts at 0 for January and ends at 11 for December. This is different from what is specified in some of the examples found on the Web such as this article here http://www.etraderzone.com/free-scripts/47-historical-quotes-yahoo.html.~

    ReplyDelete
  5. I realized that I may have buried the code in my last reply, so the half life implementation should be written either as

    result = lm(d_sprd ~ prev_sprd_mean, ...
    half_life = -log(2)/coef(result)[2]

    or

    result = lm(d_sprd ~ prev_sprd_mean + 0, ...
    half_life = -log(2)/coef(result)[1]

    ReplyDelete