## Correlation

The correlation measures the strength of the coupling between the noise components of two processes, $X_{\text{noise}}(i)$ and $Y_{\text{noise}}(i)$. Using a bivariate time series sample, $\{t(i), x(i), y(i)\}_{i=1}^{n}$, this measure allows us to study the relationship between two climate variables, each described by its own climate equation (Eq. 1.2).

Pearson's correlation coefficient (Section 7.1) estimates the degree of the linear relationship. It is one of the most widely used statistical quantities in all branches of the natural sciences. Spearman's correlation coefficient (Section 7.2) estimates the degree of the monotonic relationship. Although clearly less often used, it offers robustness against violations of the Gaussian assumption, as the Monte Carlo experiments (Section 7.3) also show.
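The difference between a linear and a monotonic relationship can be illustrated with a minimal sketch. It assumes that Spearman's coefficient is computed as Pearson's coefficient of the ranks (with average ranks for ties); the function names and the toy data are hypothetical and only serve the illustration.

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """Ranks 1..n of the values in v; tied values share their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = 0.5 * (i + j) + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's coefficient, taken here as Pearson's coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy data (assumption): a strictly increasing but strongly nonlinear relation.
x = [0.5 * i for i in range(11)]
y = [math.exp(v) for v in x]
print("Pearson :", pearson(x, y))
print("Spearman:", spearman(x, y))
```

For these data the ranks of `x` and `y` coincide, so the rank-based measure reports a perfect monotonic relationship, whereas the linear measure stays clearly below one.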

Explorative climate data analyses should benefit strongly from correlation estimates that are supported by a CI and not only by a P-value of a test of the null hypothesis of no correlation. It is then possible to take several pairs of variables and rank the associations. One finding may be, for example, that global temperature changes are more strongly associated with variations of CO₂ than with those of solar activity (background material). The challenge of providing accurate CIs is met by pairwise bootstrap resampling (MBB or ARB), which takes into account the serial dependence structures of both climate processes.
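The idea of pairwise resampling can be sketched as follows. This is a simplified illustration and not the procedure developed later in the chapter: it assumes a fixed, user-supplied block length, a plain percentile interval, and evenly spaced data; the names `pairwise_mbb_ci`, `block_len` and `n_boot`, as well as the simulated AR(1) series, are hypothetical.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_mbb_ci(x, y, block_len, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for the correlation from a pairwise moving block bootstrap.

    Blocks of consecutive indices are drawn with replacement and applied to
    x and y jointly, so that pairs (x(i), y(i)) stay together and short-range
    serial dependence within a block is retained.
    """
    rng = random.Random(seed)
    n = len(x)
    reps = []
    for _ in range(n_boot):
        idx = []
        while len(idx) < n:
            start = rng.randint(0, n - block_len)  # inclusive bounds
            idx.extend(range(start, start + block_len))
        idx = idx[:n]  # trim the last block to the original sample size
        reps.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
    reps.sort()
    k = int(round(0.5 * alpha * n_boot))
    return reps[k], reps[n_boot - 1 - k]

# Simulated example (assumption): two AR(1) processes with coupled innovations.
rng = random.Random(42)
n, a, rho_innov = 200, 0.5, 0.7
x, y = [rng.gauss(0, 1)], [rng.gauss(0, 1)]
for _ in range(n - 1):
    ex = rng.gauss(0, 1)
    ey = rho_innov * ex + math.sqrt(1 - rho_innov ** 2) * rng.gauss(0, 1)
    x.append(a * x[-1] + math.sqrt(1 - a * a) * ex)
    y.append(a * y[-1] + math.sqrt(1 - a * a) * ey)

r_hat = pearson_r(x, y)
lo, hi = pairwise_mbb_ci(x, y, block_len=5)
print(f"r = {r_hat:.3f}, 95% percentile CI = [{lo:.3f}, {hi:.3f}]")
```

Resampling index blocks rather than single indices is what distinguishes this from an ordinary pairwise bootstrap; with `block_len=1` the sketch reduces to resampling individual pairs, which ignores persistence.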

A second, rarely mentioned challenge appears when the processes differ in their sampling times (Section 7.5). This book introduces two novel estimators, denoted as binned correlation and synchrony correlation. These are able to recover correlation information, and they outperform interpolation, under the conditions of (1) persistence in the system, which is realistic for climate, and (2) not too large spacings of the time series.

M. Mudelsee, Climate Time Series Analysis, Atmospheric and Oceanographic Sciences Library 42, DOI 10.1007/978-90-481-9482-7_7, © Springer Science+Business Media B.V. 2010

7.1 Pearson's correlation coefficient

Let us assume in this chapter, for simplicity of exposition, that the climate process, $X(i)$, has a constant trend function at level $\mu_X$, a constant variability, $S_X$, and no outlier component. In discrete time,

$$X(i) = X_{\text{trend}}(i) + X_{\text{out}}(i) + S(i) \cdot X_{\text{noise}}(i) = \mu_X + S_X \cdot X_{\text{noise}}(i). \tag{7.1}$$

Assume analogously for the second climate process, $Y(i)$, which is on the same time points, $T(i)$, as the first climate process,

$$Y(i) = \mu_Y + S_Y \cdot Y_{\text{noise}}(i). \tag{7.2}$$

The correlation coefficient is then defined as

$$\rho_{XY} = \frac{E\left[\{X(i) - \mu_X\} \cdot \{Y(i) - \mu_Y\}\right]}{S_X \cdot S_Y}. \tag{7.3}$$

The correlation measures the degree of the linear relationship between the variables $X$ and $Y$; $\rho_{XY}$ is between $-1$ ("anti-correlation") and $1$.

For convenience of presentation we introduce here the correlation operator,

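$$\text{CORR}\left[X(i), Y(i)\right] = \frac{\text{COV}\left[X(i), Y(i)\right]}{\left\{\text{VAR}\left[X(i)\right] \cdot \text{VAR}\left[Y(i)\right]\right\}^{1/2}}. \tag{7.4}$$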

The definition of the correlation coefficient is thus based on the assumption of time-constancy of $\text{CORR}\left[X(i), Y(i)\right] = \rho_{XY}$.

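Let $\{X(i), Y(i)\}_{i=1}^{n}$ be a bivariate sample (process level). Pearson's (1896) estimator of $\rho_{XY}$ is

$$r_{XY} = \frac{1}{n} \sum_{i=1}^{n} \left(\frac{X(i) - \bar{X}}{S_{n,X}}\right) \cdot \left(\frac{Y(i) - \bar{Y}}{S_{n,Y}}\right), \tag{7.5}$$

where

$$\bar{X} = \sum_{i=1}^{n} X(i) \Big/ n \tag{7.6}$$

and

$$\bar{Y} = \sum_{i=1}^{n} Y(i) \Big/ n \tag{7.7}$$

are the sample means and

$$S_{n,X} = \left\{\sum_{i=1}^{n} \left[X(i) - \bar{X}\right]^2 \Big/ n\right\}^{1/2} \tag{7.8}$$

and

$$S_{n,Y} = \left\{\sum_{i=1}^{n} \left[Y(i) - \bar{Y}\right]^2 \Big/ n\right\}^{1/2} \tag{7.9}$$

are the sample standard deviations calculated with the denominator $n$ (instead of $n - 1$). On the sample level, given a bivariate sample $\{x(i), y(i)\}_{i=1}^{n}$, plug in those values for $X(i)$ and $Y(i)$ in Eqs. (7.5), (7.6), (7.7), (7.8) and (7.9). The estimator $r_{xy}$ is called Pearson's correlation coefficient. Also $r_{xy}$ is between $-1$ and $1$.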
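On the sample level, the estimator of Eq. (7.5) amounts to a few lines of code. The following is a minimal sketch using only the standard library; the function name `pearson_r` and the small data set are hypothetical, and the standard deviations use the denominator $n$ as stated above.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient with sample means and with
    standard deviations that use the denominator n (not n - 1)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_nx = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / n)
    s_ny = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / n)
    # Average of the products of the standardized deviations
    return sum(((xi - x_bar) / s_nx) * ((yi - y_bar) / s_ny)
               for xi, yi in zip(x, y)) / n

# Illustrative values (assumption), not taken from any climate record
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
print(pearson_r(x, y))
```

Because the same denominator $n$ appears in the averaging and in both standard deviations, the result is guaranteed to lie between $-1$ and $1$.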

7.1.1 Remark: alternative correlation measures

It is of course possible to employ other estimators. For example, $S_{n-1}$ (Eq. 3.19) may replace $S_n$ for estimating $S_X$ or $S_Y$, leading to an (unfortunate) correlation estimator that can have values $< -1$ or $> 1$. Another option may be to subtract the sample medians (Galton 1888) and not the sample means (Eqs. 7.6 and 7.7). More complex examples arise when time-dependent trend functions are subtracted or time-dependent variability functions used for normalization. Such cases may be relevant for climate time series analysis. All those examples lead to other correlation measures than $\rho_{XY}$ and other correlation estimators than $r_{xy}$. Their properties and CI performance can in principle be studied in the same manner with Monte Carlo methods. Here we focus on $r_{xy}$, stationary trends and variabilities. Another measure (Spearman's) is analysed in Section 7.2.

7.1.2 Classical confidence intervals, non-persistent processes

Let $X(i)$ and $Y(i)$ both be stochastic processes without persistence or "memory." Let further $X(i)$ and $Y(i)$ both have a Gaussian distributional shape; their joint distribution is then denoted as bivariate normal or binormal distribution (Section 7.1.3.1). The PDF of Pearson's correlation coefficient is then (Fisher 1915):

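$$f(r_{XY}) = \frac{\left(1 - \rho_{XY}^2\right)^{(n-1)/2} \left(1 - r_{XY}^2\right)^{(n-4)/2}}{\sqrt{\pi}\; \Gamma\left[(n-1)/2\right] \Gamma\left[(n-2)/2\right]} \sum_{j=0}^{\infty} \frac{\left\{\Gamma\left[(n-1+j)/2\right]\right\}^2}{j!} \left(2 \rho_{XY}\, r_{XY}\right)^j. \tag{7.10}$$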
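Fisher's density can be evaluated numerically. The sketch below assumes the infinite-series form in which the density is commonly written in the statistical literature (a prefactor in $1 - \rho_{XY}^2$ and $1 - r_{XY}^2$ times a power series in $2 \rho_{XY} r_{XY}$) and works with logarithms of the gamma function to avoid overflow. The function name `fisher_pdf`, the truncation settings and the test values are hypothetical.

```python
import math

def fisher_pdf(r, rho, n, max_terms=500, tol=1e-14):
    """Density of Pearson's r for a binormal sample of size n > 2 without
    persistence, evaluated by truncating the series representation."""
    if not -1.0 < r < 1.0:
        return 0.0
    log_pref = (0.5 * (n - 1) * math.log1p(-rho * rho)
                + 0.5 * (n - 4) * math.log1p(-r * r)
                - 0.5 * math.log(math.pi)
                - math.lgamma(0.5 * (n - 1))
                - math.lgamma(0.5 * (n - 2)))
    z = 2.0 * rho * r
    total = 0.0
    for j in range(max_terms):
        if j > 0 and z == 0.0:
            break  # only the j = 0 term is non-zero
        log_mag = 2.0 * math.lgamma(0.5 * (n - 1 + j)) - math.lgamma(j + 1.0)
        if j > 0:
            log_mag += j * math.log(abs(z))
        sign = -1.0 if (z < 0.0 and j % 2 == 1) else 1.0
        term = sign * math.exp(log_pref + log_mag)
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def integrate_pdf(rho, n, steps=4000):
    """Midpoint-rule integral of the density over (-1, 1)."""
    h = 2.0 / steps
    return sum(fisher_pdf(-1.0 + (k + 0.5) * h, rho, n) for k in range(steps)) * h

print("integral for rho = 0.5, n = 10:", integrate_pdf(0.5, 10))
print("density at r = 0 for rho = 0, n = 10:", fisher_pdf(0.0, 0.0, 10))
```

Two simple plausibility checks are that the density integrates to one and that, for $\rho_{XY} = 0$, only the first series term remains, which yields the familiar symmetric null distribution of the correlation coefficient.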

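Numerous discussions on, and much work in the implementation of, this celebrated formula exist in statistical science. Hotelling (1953) gave approximations for the moments of $r_{XY}$. In particular,

$$\text{bias}_{r_{XY}} \simeq -\left(1 - \rho_{XY}^2\right) \frac{\rho_{XY}}{2n} \tag{7.11}$$

and

$$\text{se}_{r_{XY}} \simeq \left(1 - \rho_{XY}^2\right) \left[\frac{1}{n^{1/2}} + \frac{11\, \rho_{XY}^2}{4 n^{3/2}}\right]. \tag{7.12}$$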
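Bias and standard error of the correlation estimator can also be checked empirically. The following Monte Carlo sketch simulates binormal samples without persistence and compares the simulated moments with the leading-order terms only, $-\rho_{XY} (1 - \rho_{XY}^2)/(2n)$ for the bias and $(1 - \rho_{XY}^2)/n^{1/2}$ for the standard error. The function names, the seed and the settings ($\rho_{XY} = 0.5$, $n = 20$, 10,000 simulations) are hypothetical choices.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_r_moments(rho, n, n_sim=10000, seed=1):
    """Monte Carlo bias and standard error of r for binormal white noise."""
    rng = random.Random(seed)
    c = math.sqrt(1.0 - rho * rho)
    rs = []
    for _ in range(n_sim):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # y shares the fraction rho of x; both have unit variance
        y = [rho * xi + c * rng.gauss(0.0, 1.0) for xi in x]
        rs.append(pearson_r(x, y))
    mean_r = sum(rs) / n_sim
    se = math.sqrt(sum((r - mean_r) ** 2 for r in rs) / (n_sim - 1))
    return mean_r - rho, se

rho, n = 0.5, 20
bias_mc, se_mc = simulate_r_moments(rho, n)
bias_lead = -rho * (1.0 - rho * rho) / (2.0 * n)  # leading-order term only
se_lead = (1.0 - rho * rho) / math.sqrt(n)        # leading-order term only
print(f"bias: simulated {bias_mc:.4f}, leading order {bias_lead:.4f}")
print(f"se:   simulated {se_mc:.4f}, leading order {se_lead:.4f}")
```

For positive $\rho_{XY}$ the simulated bias should come out negative, that is, the estimator is on average slightly too small in magnitude, and the discrepancy between simulation and the leading-order terms should shrink as $n$ grows.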
