## Correlation coefficient

### Definition

Given two random variables *X* and *Y*, Pearson’s correlation coefficient *ρ* is defined as the ratio between the covariance of the two variables and the product of their standard deviations:

where are the means and standard deviations of *X* and *Y*. The correlation coefficient is usually used as a measure of the strength of the linear relation between the two variables. Substituting the values of the covariance and standard deviations computed from sample time series gives the *sample correlation coefficient*, commonly denoted as *r*:

where

Alternatively, *r* can also be written as

### Testing for significance

To test for the significance of the estimated correlation coefficient against the null hypothesis that the true correlation is equal to 0, one can compute the statistic

which has a Student’s *t*-distribution in the null case (zero correlation) with *N-2* degrees of freedom. For instance in Matlab, you can compute the *p*-value using the function *tcdf*(). In particular, *p* = 2**tcdf*(-*abs*(*t*), *N*-2) will give you the *p*-value for a two-tail *t*-test for a given *t*.

Alternatively, one can also convert the correlation coefficient using Fisher transform, given by

approximately follows a normal distribution with mean and standard deviation . With this, a *z*-score can now be defined as

Under the null hypothesis that *r* = *r*_{0} and given the assumption that the sample pairs are independent and identically distributed, *z* follows a bivariate normal distribution. Thus an approximate *p*-value can be obtained from a normal probability table.

One can also use the Fisher transformation to test if two correlations *r*_{1} and *r*_{2} are significantly different by computing the z-score using the formula:

which is distributed approximately as *N*(0,1) when the null hypothesis is true. Here and are the number of samples used to compute and , respectively.

**Incremental algorithm to compute the correlation coefficient**

Define as follows: and The correlation coefficient can now be written as

To estimate the correlation coefficient incrementally, use the following algorithm:

Initialize n = 1:

for n = 2 to N, compute

Re-compute *r* using the above equation

end

## Leave a Reply