## Pxy x PX y Py y Py2

SX SY

pX and pY is the mean, SX and SY the variance of the univariate processes X(i) and Y(i), respectively; pXY = pE is the correlation coefficient. The PDF is "slanted" for pxy = 0 (Fig. 7.9). See Priestley (1981: Section 2.12.9 therein), Patel and Read (1996: Chapter 9 therein) and Kotz et al. (2000: Chapter 46 therein) for more details on the binormal distribution.

The bivariate lognormal distribution is in the more general case, with shape parameters oX and oy and scale parameters bX and bY, given by

X(i) = exp ox ■ E^o, 1)(i) + ln(bx) , Y(i) = exp oy -EN(0, 1)(i)+ln(by) , see also Section 3.9. It has the correlation (Mostafa and Mahmoud 1964)

exp (ox ■ oy ■ Pe) - 1 I ^ [exp (oX) - 1] [exp (oY) - 1]

Figure 7.9. Binormal probability density function: contour lines and marginal distributions. Parameters: pX = 5,SX = 1,pY = 4,SY = 3 and (a) pXY = 0 or (b) pXY = 0.6.

2345678 2345678

Figure 7.9. Binormal probability density function: contour lines and marginal distributions. Parameters: pX = 5,SX = 1,pY = 4,SY = 3 and (a) pXY = 0 or (b) pXY = 0.6.

and the PDF

x exp

The bivariate AR(1) process for uneven time spacing is given by

X (7.53) + EN(0, 1—exp{-2[T(i)—T(i—1)]/tx})(i), i = 2, . . . ,n,

+ EN(0, 1—exp{—2[T(i)—T(i— 1)]/TY})(i), i = 2, . . . ,n, where the white-noise innovation terms are correlated as

CORR

CORR

= (1 - exp {- [T(i) - T(i - 1)] ■ (1/tx + 1/ty)} )

x - exp {-2 [T(i) - T(i - 1)] /ty }j P£, i = 2,...,n,

CORR

CORR

CORR

= 0, i = 2, . . . , n. This process is strictly stationary. Its properties are

E [X(i)]= E [Y(i)]=0, VAR [X (i)] = VAR [Y (i)] = 1

Bias and standard error of Pearson's correlation coefficient for distributions of X(i) or Y(i) that deviate from the Gaussian shape can theoretically be approximated using the parameters (cumulants of higher order) describing the deviations. Lengthy approximation formulas were given by Gayen (1951) and Nakagawa and Niki (1992). The relevance of the formulas for practical climatological purposes seems to be limited because of the considerable uncertainties in the estimation of those cumulants from data sets limited in size.

An alternative to Fisher's transformation is (Hotelling 1953)

For small n, Hotelling's zH is in distribution closer to a Gaussian shape than Fisher's z (Rodriguez 1982).

Spearman's rank correlation coefficient is reviewed by Pirie (1988). Fisher's z-transformation and usage of the normal distribution is not the only method for constructing classical, approximate CIs for rs. Kraemer (1974) suggested an alternative transformation and usage of Student's t distribution. Otten (1973) gave for the null case ps = 0 the exact PDF of rs for n = 13 to 16. Franklin (1988) examined the convergence of the exact null distribution of rs to normality for n = 9 to 18. As regards the originator of rs, Pearson (1924: p. 393 therein) thinks that there is "sufficient evidence that Galton dealt with the correlation of ranks before he even reached the correlation of variates, and the claim that it is a contribution of the psychologists [i.e., Spearman] some thirty or forty years later to the conception of correlation does not seem to me valid."

The grade correlation coefficient between two continuous variables X and Y is (Gibbons and Chakraborti 2003: Section 11 therein)

Herein, Fx(x) and Fy(y) are the (marginal) distribution functions and f (x, y) is the bivariate PDF. The case of a binormal PDF (Eq. 7.49) with correlation coefficient pXY = pE can be analytically solved (Pearson 1907):

The case of a bivariate lognormal PDF (Eq. 7.52), where pxy is related to pE via Eq. (7.51), was solved (Table 7.8) by means of sim ulations. Consider the normal distribution function (Eq. 3.49), denoted as FN(x). Consider further the lognormal distribution function, FLN(x) = f^ /LN(x')dx'. Herein, /LN(x) is the lognormal PDF (Eq. 3.61). Then, Fln(x) = FN(ln(x)) for x > 0.

Table 7.8. Grade correlation coefficient, bivariate lognormal distribution. ps was determined from its definition (Eq. 7.59) by drawing random bivariate numbers from the density (Eq. 7.52) and calculating the average and its standard error over nsim = 1,000,000,000 simulations. Lognormal parameters: = = 0.0 and <7X = 0Y = 1.0.

Table 7.8. Grade correlation coefficient, bivariate lognormal distribution. ps was determined from its definition (Eq. 7.59) by drawing random bivariate numbers from the density (Eq. 7.52) and calculating the average and its standard error over nsim = 1,000,000,000 simulations. Lognormal parameters: = = 0.0 and <7X = 0Y = 1.0.

 pS Accuracyb of ps p£ 0.3 < 10-4 0.3129 0.8 < 10-4 0.8135

a Average over nsim simulations. b Standard error over nsim simulations.

a Average over nsim simulations. b Standard error over nsim simulations.

The point biserial correlation coefficient can be used as an estimator of the degree of the linear relationship between a continuous variable, X(i), and a dichotomous (binary) variable, Y(i). A field for climatological applications is the analysis of outliers or climate extremes (Chapter 6), where, for example, Y(i) = 0 means the absence and Y(i) = 1 the occurrence of an extreme at time T(i). Let (on the sample level) p denote the proportion of y(i) values equal to 0; q = 1 — p; S0 and Xi be the mean x(i) value with y(i) = 0 and 1, respectively; and sn,X be the sample analogue of the standard deviation estimator (Eq. 7.8). The point biserial correlation coefficient is then defined (Kraemer 1982) as rpb = (pq)1/2 (Si — So) /sn,x . (7.61)

It readily follows that rpb = rXY. It may be shown (Tate 1954) that if (1) pXY = 0 and (2) the standard deviation of X(i) is independent of whether Y(i) equals 1 or 0, the statistic tpb = (n — 2)1/2rpb (1 — rPb)-1/2 (7.62)

is distributed as Student's t with n — 2 degrees of freedom (Section 3.9). This statistic was used by Mudelsee et al. (2004) to study whether a relation exists between atmospheric variables (sea-level pressure, geopo-tential height) and the occurrence of Elbe floods for the interval from 1658 to 1999. Because of the persistence of the processes that generated the atmospheric time series, Eq. (7.62) was adapted by replacing n with the effective data size (determined as 0.85 n to 0.90 n). Other clima-tological examples of usage of rpb are the following. Ruiz and Vargas (1998) study the relation between an atmospheric variable (vorticity) and the occurrence of large rainfall at South American stations, interval 1983-1987, and Giaiotti and Stel (2001) relate thunderstorm occurrence to geopotential height in northeast Italy, interval 1998-1999. A caveat that applies to the interpretation of the results from both studies is that persistence was ignored in the analyses. Bootstrap CIs for rpb were studied by Sievers (1996), who found good coverage performance of calibrated CIs already for small data sizes (n = 10).

Kendall's tau, employed in Section 4.4 (p. 168) for trend testing, can also be used (and this was historically earlier) as a correlation measure. For trend testing, we count the number of interchanges to bring {X(¿)}™=i into the same (monotone) order as {¿}™=i; for correlation estimation, we have to bring {X(i)}™=1 into the same order as {Y(i)}™=1. Hamed (2009a) presented adaptions of the statistical test of H0: "zero correlation" to take into account serial dependences (short- and long-term).

The Monte Carlo performance of bootstrap CIs for correlation coefficients was studied by the following. Hall et al. (1989) found that one loop of calibration of the percentile CI for rxY brings a dramatic increase in coverage accuracy for bivariate lognormal white noise and small data sizes (n = 8,10,12). Sievers (1996) confirmed this finding for n = 19 and eight types of the distributional shape of the white noise. Above studies used ordinary bootstrap resampling because of the absence of serial dependence. Mudelsee (2003) analysed bootstrap BCa CIs for rXY on bivariate Gaussian and lognormal AR(1) processes with n between 10 and 1000. He used pairwise-MBB resampling and concluded that acceptable levels of coverage accuracy can be achieved but that the serial dependence reduces the effective data size to a considerable degree. Two caveats against this study are that block length selection was done in an ad-hoc manner (Eq. 7.30) and that the studied process was not identical to the strictly stationary model of Eq. (7.53). The observation that nonzero persistence has detrimental effects (larger bias and RMSE) of correlation estimators was also quantified by Park and Lee (2001), who analysed rs on bivariate Gaussian AR(1) processes with n = 137. These authors tried several resampling methods in combination with a brute-force block length selection (in terms of RMSE of the standard deviation of rs). One of their conclusions is that pairwise-ARB resampling performed better than a pairwise version of the nonparamet-ric SB resampling (Section 3.8). Papers from the psychology literature report about coverage performances of bootstrap CIs for quantities that are related to rxY and are of relevance to that branch of investigation, namely (1) correlation coefficients that account for range restriction or censoring of one variable (Chan and Chan 2004) and (2) the difference of correlation coefficients in overlapping data sequences (Zou 2007).

The Monte Carlo performance of bootstrap hypothesis tests about correlation coefficients was studied by the following. Martin (2007) considered Ho: "pXY = pXY" with nonzero pXY. This constitutes an important test case, not only for the climate sciences, because it does not consider the "straw man" Ho: "pXY = 0," but instead a more realistic Ho. Such a test may supply a quality of information similar to that of a CI (Section 3.6), with additional information regarding the test power. To resample under the null of nonzero pxy , a "rotated" version of the array of the original bivariate sample, (y(i)}™=1 versus (x(i)}™=1, is used (Beasley et al. 2007: Fig. 1 therein). The two cited papers study the empirical significances and powers for bivariate white noise with a range of data sizes (from 10 up to 100), various distributional shapes and several pXY values (-0.5,0,0.4,0.8). Belaire-Franch and Contreras-Bayarri (2002) performed an analogous experiment of the empirical test significances and powers, employing AR(1) and MA(1) models of serial dependence and using SB resampling. Summarizing the results of the above mentioned Monte Carlo experiments, we conclude that testing realistic null hypotheses about pXY can be accurately performed using bootstrap resampling. Regarding the test of Ho: "pXY = 0," Ebisuzaki

(1997) studied the frequency-domain bootstrap (Section 5.3) and the classical approach (via n^) using even time spacing and bivariate AR(1) and AR(2) models with n = 8, 16, 32 and 64. He found the bootstrap variant to produce acceptably small deviations between nominal and empirical rejection rates, not only for the AR(1) but also the AR(2) model. Larger deviations occurred for small n and the AR(2) parameter approaching with a2 ^ -1 the boundary of the stationarity regime (Fig. 2.4). Ebisuzaki (1997) ascribed this deficit to the poor properties of the periodogram as spectrum estimator (Chapter 5). Pyper and Peterman

(1998) did not study the bootstrap but rather the classical approach (via np) to take into account serial dependence. They explored various persistence models (AR(1), AR(2) and ARIMA), sample sizes (between 15 and 50), autocorrelation estimators and the effects of smoothing prior to the correlation estimation. Prior smoothing constitutes a special case of an alternative correlation measure (Section 7.1.1). One of their conclusions is that prior smoothing may reduce considerably the effective data size and lead to a reduced power of the statistical test, see also the paragraph here on the Sun-climate relationship.

Binned and synchrony correlation coefficients seem to be novel estimation tools applicable to the case of unequal timescales. Davison and Hinkley (1997: Example 3.12 therein) consider an example from a closely related case, where some values (of, say, Y(i)) are missing. They consider the imputation of the missing values "to obtain a suitable bi-variate F [estimate of the distribution function], next estimate d [i.e., pXY] with the usual sample correlation t(F) [i.e., rXY], and then resample appropriately" (Davison and Hinkley 1997: p. 90 therein). An imputation method is to make a regression of X(i) on Y(i) (Chapter 8) using the bivariate subsample without the missing values. The assumption here is that the values are missing at random (Rubin 1976), that means, for example, that no range restriction or censoring occurred. Algorithms for imputing missing data for general estimation purposes were presented by Dempster et al. (1977) and Efron (1994). With regard to the temporal spacing, we have a focus on the general case of irregularity and not on the case of missing observations from an evenly spaced grid. This is why we almost exclusively do not consider imputation. In the bivariate setting, however, imputation may be an interesting estimation alternative. We note that when X and Y have no common time points, imputation is not straightforward to implement, whereas binned and synchrony correlation are so and may (if persistence exists and the time points of X and Y are well "mixed") help to recover information about the underlying correlation.

Timescale uncertainty was also identified by Haam and Huybers (2010) as a problem affecting the estimation of the relationship between two processes. These authors selected the covariance measure, assumed even spacing and allowed only one of the two processes to be influenced by timescale errors. Furthermore, the timescale errors were assumed to take discrete values only. For this simplified setting, they obtained analytical and numerical results on the distribution of the maximum of the covariance. This was in turn used as a measure of the significance of the empirical covariance. Finally, Haam and Huybers (2010) used this test to study the relation between variations of ¿18O in a stalagmite (with timescale errors) and atmospheric radiocarbon content during the Holocene. They were unable to reject the null hypothesis of zero covariance. This result should be assessed with some caution because prior to the analysis the series had been interpolated to achieve even spacing.

The smoothed bootstrap consists of adding an amount of (normally distributed) noise to resampled values, x*(i) and y*(i). The idea (Efron 1982) is to circumvent the discrete distribution of the bootstrap samples, which may lead for quantities such as the sample median to a bad performance (Davison and Hinkley 1997: Section 3.4 therein). A Monte Carlo study (Silverman and Young 1987) of the RMSE of the sample standard deviation of rYY and z demonstrated the superiority of smoothing, especially for z and small data sizes (n < 50). Young (1988) gave a rule for adjusting the amount of smoothing. It may be that the coverage of bootstrap CIs for correlation estimators could benefit from smoothing. However, more theoretical knowledge on the application of the smoothed bootstrap to time series from serially dependent bivariate processes would be helpful.

Climatological applications of bootstrap CIs for Pearson's rxY include the following. Kumar et al. (1999) used MBB resampling and percentile CIs to study the "weakening relationship between the Indian monsoon and ENSO" during the interval from 1856 to 1997. They took a running window of length 21 years and determined the correlation using the points of all-Indian summer monsoon rainfall (June to September average) and equatorial Pacific sea-surface temperature anomalies (June to August average) within the window. The obtained correlation confidence band is pointwise. Such "running correlations" are often used in explorative climatology, despite the absence of a theoretical framework for nonconstant CORR [X(i),Y(i)]. Girardin et al. (2006a) used pairwise-MBB resampling and BCa CIs (PearsonT software, Section 7.7) to find a highly significant correlation between (a transformation of) the Pacific sea-surface temperature and west-east atmospheric flow over Canada during the past approximately 150 years. Boessenkool et al. (2007) used the same method to relate the (proxy-derived) speed of the water flow near the ocean bottom at the Iceland-Scotland ridge with the NAO index, interval 1885-2004. They found that a positive index (stronger sea-level pressure gradient) had reduced the water flow; the value of rYY = -0.42 with 95% CI [-0.60; -0.20] serves to quantify the amount of covariation. This finding has implications for our knowledge about the meridional overturning of the Atlantic in response to climate change—a currently debated point in scientific discussions. Prior to correlation estimation, the NAO time series had been pre-processed by filtering and interpolating the record (unequal times) to the time points of the flow record (d = 2.2 a); the alternative method would have been the binning procedure (Section 7.5.1). Rothlisberger et al. (2008) also used PearsonT to study the coupling between variations of Antarctic temperature and sea ice extent in the ice-age climate over the past 800 ka. The proxy information, about temperature from ¿D and about sea ice from the flux of seasalt Na (Fig. 1.5), comes from EPICA Dome C, that is, the ice core with the currently longest time span of climate information. The finding was that during mild climate stages the cor relation is strong, while during cold glacial conditions it is weaker (but still significant). Mudelsee (2003), introducing PearsonT, re-assessed the Sun-monsoon relation on Holocene timescales documented by ¿18O variations measured in a stalagmite from Oman (Neff et al. 2001). He chose the interpolation (unequal times) instead of the more appropriate binned or synchrony methods. He showed that the tuning of the (t(i)} of the monsoon proxy changed a nonsignificant correlation into a significant value, emphasizing, however, that the size of the time shifts of the tuning was smaller than the dating errors.

CT C

CO CO

i—I—I—I—I—I—I—I—I—I—I—I—I—r

1850 1900 1950 2000

co co

CD CD

cp cp

CD CD

1—I—I—I—I—I—I—I—I—I—I—I—I—I—I

12 11 10 9 Cycle length (a)

Figure 7.10. Solar cycle length and northern hemisphere land surface-air temperature anomalies, 1866-1985. a Time series (smoothed) of cycle length (open symbols) and temperature anomaly (filled symbols); b scatterplot between cycle length and temperature anomalies; c lag-1 scatterplots, standardized cycle length (filled symbols) and standardized cycle-length residuals (open symbols); d lag-1 scatterplots, standardized temperature anomalies (filled symbols) and standardized temperature-anomaly residuals (open symbols). (The time series are digitized values from Friis-Christensen and Lassen (1991: Fig. 2 therein).)

Climatological applications of hypothesis tests with autocorrelation adjustment include the following. Rothman (2002) examined the correlation between X(i): strontium isotopic ratio and Y(j): isotopic fractionation between total organic carbon and sedimentary carbonates over the past 500 Ma. Both variables (nY = 48, nY = 46) were measured on independent samples of marine sedimentary rocks and have therefore independent timescales. The objective of the correlation analysis was to derive a proxy for variations of atmospheric CO2 concentration over such long geological periods. The author interpolated the X(i) values to the Ty(j) times and calculated Spearman's rank correlation coefficient as rS = -0.4. He used the frequency-domain bootstrap (Section 5.3) to take autocorrelation effects into account (Rothman 2001) and determined a one-sided P-value of 0.17. The accuracy of the P-value may be influenced by the following factors: (1) prior smoothing of X(i) had been applied, (2) possibly a second interpolation (to equidistance) for calculating the periodogram (Chapter 5) was necessary, (3) the data size is limited and (4) the unequal timescales and the interpolation may have introduced a negative bias into rS (Section 7.5). The net effect is not clear: while factor (1) would tend to let P increase, factor (4) would let P decrease and, hence, raise the level of confidence. Pyper and Pe-terman (1998) used their approach via n^ to test H0: "pYY = 0" for bivariate samples of the survival rate of different stocks of salmon from a bay in Alaska (time interval from 1957 to 1989, annual resolution). Edwards and Richardson (2004) study with the same approach the relation between the interannual variation in the timing of the seasonal cycle for various functional groups (e.g., diatoms or dinoflagellates) and sea-surface temperature, time interval 1958-2002. They find significant correlations, which underline the impact of climate change on marine pelagic phenology.

Causality is not the same as correlation. However, that philosophical concept (Chapter 1) of the association between an action (variable X) and a reaction (variable Y) should require the time arrow and be related to the theme of this chapter. Overviews of this relation have been published in statistics (Barnard 1982; Glymour 1998) and physics literature (Palus and Vejmelka 2007). One quantitative formulation of the concept of causality comes from information theory and uses the idea of predictability: "We say that Y(i) is causing X(i) if we are better able to predict X(i) using all available information than if the information apart from Y(i) had been used" (Granger 1969: p. 428 therein). Inference about this "Granger causality" requires the analysis of time series, {t(i), x(i), y(i)}n= 1, and may employ statistical models, linear or nonlinear, possibly with a time-lag parameter (Granger and Lin 1994; Stern and Kaufmann 2000; Triacca 2007), see also Chapter 8. Climatological examples are the following. A bivariate linear regression model was fitted to temperature time series from the northern (X(i)) and the southern (Y(i)) hemisphere, covering the interval from 1865 to 1964 (Kaufmann and Stern 1997; Stern and Kaufmann 1999). From the estimated time lag between the two variables, the authors concluded the existence of a south-to-north causal order "generated by anthropogenic activities that increase the concentration of greenhouse gases globally, but which increase the concentration and effects of sulphate aerosols mainly in the Northern Hemisphere" (Kaufmann and Stern 1997: p. 42 therein). This conclusion was criticized as inconclusive by Triacca (2001), who preferred the direct demonstration of Granger causality of CO2 changes on temperature changes; that, however, had been done by Tol and de Vos (1998) using a linear regression model with a prescribed lag. This simple model type has also been utilized for demonstrating an ocean feedback (daily wintertime sea-surface temperature) on the NAO, performed (Mosedale et al. 2006) using a 50-year long simulation from the climate model HadCM3 (Fig. 1.9). More advanced, nonlinear descriptions result from employing the mutual information (Fraser and Swinney 1986; Granger and Lin 1994),

Assuming that the logarithm is taken to the base of two, /XY quantifies how many bits of information about X can be predicted on the basis of a sample of Y. The concept of mutual information has been extended to higher dimensions and related to properties of chaotic systems (Prichard and Theiler 1995). One such extension, which is called generalized redundancy, was employed (Diks and Mudelsee 2000) to study causal relations between variables of the Plio- to Pleistocene climate. The P-value of the test of the null hypothesis "zero information content" (no Granger causality) was determined using SB resampling (Diks and DeGoede 2001). One finding (Diks and Mudelsee 2000), from interpolated series, was that changes of ¿18O (a proxy for ice volume) do Granger cause changes of ¿13C (a proxy for the strength of formation of North Atlantic Deep Water), and that this coupling did increase towards the late Pleistocene. Other information-theoretic measures can be applied when three variables, X, Y and Z, are available; an analysis of data covering the past 400 years found that solar activity variations seem to "account for a smaller-scale behavior of global temperatures than greenhouse gases" (Verdes 2005: p. 026222-7 therein). A recent review of causality detection using information-theoretic methods (Hlavackova-Schindler et al. 2007) gives more examples from climatology. Barnard (1982: p. 387 therein) notes also that "causation does not necessarily imply correlation as the latter is usually measured." He gives the simple nonlinear model,

with X(i) uniformly distributed over [—1;+1], where also Y(i) varies between —1 and +1; this model has pXY = 0. The design of suitable dependence measures for nonlinear processes, alternatives to Pearson's or Spearman's linear measures, has something of an art. Granger et al. (2004: pp. 651-652 therein) propose that a measure should have the following properties:

1. It is well defined for both continuous and discrete variables.

2. It is normalized to zero if X and Y are independent, and lies between 0 and +1.

3. The modulus of the measure is equal to unity (or a maximum) if there is a measurable exact (nonlinear) relationship, Y = m(X) say, between the random variables.

4. It is equal to or has a simple relationship with the (linear) correlation coefficient in the case of a bivariate normal distribution.

5. It is metric, i.e., it is a true measure of 'distance' and not just of divergence.

6. The measure is invariant under continuous and strictly increasing transformations This is useful since X and Y are independent if and only if ^(X) and ^(Y) are independent. Invariance is important since otherwise clever or inadvertent transformations would produce different levels of dependence."

Granger et al. (2004) studied several dependence measures for many nonlinear models by means of Monte Carlo simulations.

7.7 Technical issues

The variance of Spearman's rank correlation coefficient is for binormal processes (David and Mallows 1961: Eq. (Z) therein)

1 36

x n3 (-0.42863279pYY + 0.08354697pYY + 0.04257246pYY + 0.01687474pYY + 0.00664071pXY + 0.00270655pXY + n2 (0.1551301pYY - 0.057362293pYY - 0.18443407pYY

- 0.02271732pYY + 0.00757524pXY + 0.01329883pXY + n(0.36837259pYY + 0.44738882pYY - 0.08427574pYY

- 0.27929901pYY - 0.19943375pXY - 0.1386106pY2Y + 0.07179677pYY + 0.06467162pYY + 0.21015257pYY + 0.28589798pYY + 0.31704425pXY + 0.07923733pXY

PearsonT (Mudelsee 2003) is a Fortran 90 program for calculating rYY with BCa CI from pairwise-MBB resampling. The software is available at the web site for this book.