[1 aX 1 aY12

This model requires the autocorrelation parameters ax and aY to have the same sign.

The bivariate AR(1) process for even spacing (Eq. 7.25) is strictly stationary. Its properties are

The bivariate AR(1) process for uneven time spacing is obtained in the usual manner: replace aX by exp{—[T(i) — T(i — 1)]/tx} and aY by exp{—[T(i) — T(i — 1)]/ty}. This leads to heteroscedastic innovation terms, as already noticed in the univariate case. The model is given in the background material (Eq. 7.53).

To summarize, the simple bivariate Gaussian AR(1) model, written for even (Eq. 7.25) or uneven spacing, has three parameters. Two describe the persistence properties of the processes X(i) and Y(i), one describes the correlation between both. The more general case in form of means unequal to zero and variances unequal to unity, is less relevant in the context of this chapter because the correlation estimation (Eq. 7.5) eliminates such effects.

Interesting is, however, the general formulation of the bivariate AR(1) model, where the variable at a time, say, X(i), depends not only on its own immediate past, X(i — 1), but also on the past of the second variable, Y(i — 1). This general model has more than three parameters, and it can give rise to "identifiability" problems (Priestley 1981: Section 9.4 therein). These difficulties how to uniquely determine the number of parameters and their values do certainly not decrease when considering uneven instead of even spacing, see the univariate embedding problem (Section 2.1.2.1). We therefore ignore the general formulation and avoid the identifiability and embedding problems when describing parametric bootstrap resampling (Sections 7.1.5.2 and 7.2.3.2) and designing Monte Carlo experiments (Section 7.3). A further conclusion is that the possible existence, and theoretical applicability to climatology, of the general bivariate AR(1) model with more than three parameters supports the selection of the nonparametric bootstrap resampling algorithms (Sections 7.1.5.1 and 7.2.3.1).

7.1.4 Classical confidence intervals, persistent processes

If we continue from before the excursion on bivariate stochastic processes and drop not only the Gaussian assumption about the distributional shape of X(i) and Y(i), but also leave away the assumption that

Step 1

Bivariate time series

{t(i),x(i),y(i)}?=1

Step 2

Pearson's rxy (Eq. 7.5)

Step 3

Fisher's z-transformation

z

= tanh 1 (rxy )

Step 4

Estimated, bias-corrected

persistence time, process X(i),

TX

using mean-detrended

time series, {x(i) — x}™^

Step 5

Analogously, process Y (i)

T^

Step 6

Estimated, bias-corrected

equivalent autocorrelation

coefficient, process X(i)

â'x = exp (- d/rX)

Step 7

Analogously, process Y (i)

ay = exp (—d/Ty)

Step 8

Effective data size,

n

p

obtained by plugging in

aX for ax and ay for ay

in Eq. (2.38)

Step 9

Approximate, classical

normal CI for rxy,

CIrXY ,1-2a —

obtained from

tanh + z(a) • (n'p

re-transforming z

tanh z — z(a) • (n'p

Algorithm 7.1. Construction of classical confidence intervals for Pearson's correlation coefficient, bivariate AR(1) model. Steps 3 and 9: z is Fisher's transformed correlation, z(a) is the percentage point of the normal distribution.

these processes are persistence-free white noise, we approach the reality for the majority of processes occurring in the climate system. A classical CI for rxY, which is approximate, is obtained readily by invoking the effective data size (Chapter 2). The complete procedure is given (Al gorithm 7.1) for a bivariate AR(1) process on an unevenly spaced time grid.

7.1.5 Bootstrap confidence intervals

For processes with persistence, possibly of more complex form than AR(1), and distributional shapes more complex than Gaussian, that means, for the majority of climate processes, the classical CI for rXY is not exact but approximate. The accuracy of it is expected to depend on how strongly the properties of the process deviate from the assumed properties. This brings naturally the bootstrap into play. Two algorithms are analysed that resample data pairs, (x(i),y(i)), namely the pairwise-MBB and the pairwise-ARB algorithm. Both resampling types serve to construct bootstrap CIs.

7.1.5.1 Pairwise-moving block bootstrap

The pairwise-MBB (Algorithm 7.2), already introduced for regression (Section 4.1.7.1), extends the ability of the MBB to preserve persistence of a single process, X(i), over the length of a block, to the bivariate setting. Because also Y(i) can exhibit the memory phenomenon, expressed by the persistence time, ty, block length selection may be more difficult in the bivariate than in the univariate setting. Mudelsee (2003) suggested the block length selector lopt = 4 max (tx, ty). (7.30)

In practice, the (bias-corrected) persistence-time estimates ?X and 7y are plugged in. Although Monte Carlo experiments (Mudelsee 2003) revealed acceptable coverage performance of BCa CIs for rXY, it should be worth testing other block length selectors. This is also in line with the "optimal estimation" strategy (Section 6.2.7).

The second selector is, thus, based on combining the bias-corrected equivalent autororrelation coefficients, aX and aY, in a new expression, aXY

and employing the univariate selector (Eq. 3.28). This yields lopt = nint

Another technical measure is the z-transformation. First, a bootstrap CI is constructed for z and then the CI bounds are re-transformed to

Step 1

Bivariate time series

{i(i),x(i),y(i)}n=1

Step 2

Pearson's rXY (Eq. 7.5)

Step 3

Fisher's z-transformation

z = tanh-1 (txy )

Step 4

Estimated, bias-corrected

persistence time, process X(i),

using mean-detrended

time series, {x(i) — x}r=1

Step 5

Analogously, process Y (i)

Step 6

Select block length

l

Step 7

Apply MBB with I

{x*b(i)ir=i = {x(f (i))i:=i

(Algorithm 3.1) to x values

(b, counter)

Step 8

Overtake bootstrap index

f (i)

for resampled y values

{y*b(i)}Li = {y(f (i))i:=i

Step 9

Resample

{x*b(i), y*b(i)}!=i

Step 10

Bootstrap replications,

Pearson's rXY and Fisher's z

t*Xy , z*b = tanh_i (tXV)

Step 11

Go to Step 7 until b = B (usually B = 2000)

replications exist

{ztb}B=i

Step 12

Calculate CI (Section 3.4)

for Fisher's z

CIz,i-2a = [zi; zuj

Step 13

Re-transform lower and upper endpoints to obtain

pairwise-MBB CI for rXY

CIrXY,i-2a = |^tanh (zi);tanh(zu)j

Algorithm 7.2. Construction of bootstrap confidence intervals for Pearson's correlation coefficient, pairwise-MBB resampling. Step 8: By overtaking the random bootstrap index f(i) e {1,...,n} from x-resampling for y-resampling, (x(j),y(j)) pairs are resampled.

obtain a CI for txy. The idea (Hall et al. 1989) is to enhance CI construction by supplying replications (z*) that are in shape closer to a normal distribution than the alternative (rxy).

7.1.5.2 Pairwise-autoregressive bootstrap

The pairwise-ARB (Algorithm 7.3) resamples pairs (x(i), y(i)) by overtaking the random index from x-resampling for y-resampling. Also this algorithm employs the z-transformation.

The coverage performance of CIs from the pairwise-ARB and the pairwise-MBB algorithms are explored by means of Monte Carlo experiments (Section 7.3).

7.2 Spearman's rank correlation coefficient

Consider instead of process X(i) its rank, R(i) = rank[X(i)]. For example, if from all X(i), it is X(7) that has the lowest value, then R(7) = 1. Let analogously S(i) = rank[Y(i)]. The rank correlation coefficient is then given by where pR = E[R(i)] and ps = E[S(i)]. That means, the rank correlation coefficient between X and Y is equal to the correlation coefficient between the rank of X and the rank of Y. The rank correlation measures the degree of the monotone relationship between X and Y ; also ps is between —1 and 1.

Strictly speaking, Eq. (7.33) applies only to discrete random variables X(i) and Y(i) because continuous variables cannot be ranked (Gibbons and Chakraborti 2003). However, for continuous variables it is possible to define ps as the grade correlation coefficient (background material). This distinction between rank and grade correlation coefficient is of theoretical importance (Kruskal 1958) but of limited practical relevance in the context of this book.

Spearman's (1904, 1906) estimator of ps uses a bivariate sample as follows:

where R and S are the sample means, and Sn,R and Sn,S are the sample standard deviations calculated with the denominator n.

Assume the absence of ties in R(i) and S(i). This means a negligible loss of generality for continuous climate variables. The case of a binary variable, which is relevant for analysing the climate extremes component, ps

E [{R(i) — pr}-{S(i) — ps}] {VAR [R(i)] ■ VAR [S(i)]}1/2 '

Step

1

Bivariate time series

{i(i),x(i),y(i)}?=i

Step

2

Pearson's rXY (Eq. 7.5)

Step

3

Fisher's z-transformation

z = tanh-1 (txy )

Step

4

Estimated, bias-corrected

persistence time, process X(i),

SX

using mean-detrended

time series, {¿(i) — x}™-:l

Step

5

Analogously, process Y (i)

fY

Step

6

Climate equation

residuals, process X(i)

{qX (i)}n=l = {[x(i) — x] /sn,X }n=i

Step

7

Analogously, process Y(i)

{qY (i)}n=l = {[y(i) — y] /sn, Y }n=i

Step

8

Abbreviation, process X (i)

a'x (i) = exp{ — [t(i) — t(i — 1)]/fX },

i = 2,. .. , n

Step

9

Analogously, process Y(i)

S'y (i) = exp{ — [t(i) — t(i — 1)]/fY },

i = 2,. .. , n

Step

10

White-noise residuals,

ex (i) = [qx (i) — SX (i) • qx (i — 1)]

process X (i)

x {1 — [SX (i)]2 }-i/2,

i = 2,. .. , n

Step

11

Analogously, process Y(i)

«y (i) = [qY (i) — S'y (i) • qY (i — 1)]

x {1 — [S'y(i)]2}-i/2 ,

i = 2,. .. , n

Step

12

Centering, process X(i)

ex(i) = ex(i) — E"=2 ex(i)/(n — 1)

Step

13

Analogously, process Y(i)

ëY (i) = «Y (i) — En=2 «Y (i)/(n — 1)

Algorithm 7.3. Construction of bootstrap confidence intervals for Pearson's correlation coefficient, pairwise-ARB resampling. Steps 6 and 7 employ the sample standard deviations (sample level), sn,X = {X!n=i [x(i) — x]2 /n }i/2 and sn,Y = {X]"=i [y(i) — I/]2 /n}1/2, calculated with the denominator n.

Step 14

Draw ex (j), j = 2,...,n,

with replacement from

{ex (i)}?=2

(b, counter), process X(i)

Step 15

Bootstrap index, f (j),

f (j) e {1,...,n} ,j = 2,... ,n

process X (i), defined via

eX (j) = ex (f (j))

Step 16

Resampled climate equation

9x (1) drawn from {qx (i)}™=1,

residuals, process X(i)

+ {1 - [SX(i)]2}1/2 • eXb(i), i = 2,... ,n

Step 17

Bootstrap index, f (j = 1),

f(1) e{1,...,n}

process X (i), defined via

qXb(1) = qx (f (1))

Step 18

Overtake random bootstrap index,

process Y (i)

{eYb(i)}n=2 = {?Y (f (i))}?=2

Step 19

Resampled climate equation

qYb(1) = qY (f (1)),

residuals, process Y(i)

+ {1 - [SY(i)]2}1/2 • eYb(i), i = 2,... ,n

Step 20

Resampled data,

process X (i)

x*b(i) = x + Sn,x • qX(i), i = 1,. .. ,n

Step 21

Analogously, process Y(i)

y*b(i) = y + sn,Y • qYb(i), i = 1,... ,n

Step 22

Resample

{x * b(i), y * b(i)}n=i

Step 23

Bootstrap replications, Pearson's rXY and

Fisher's z

rXbY, z *b = tanh-1 (rXbY)

Algorithm 7.3. Construction of bootstrap confidence intervals for Pearson's correlation coefficient, pairwise-ARB resampling (continued). Step 18: By overtaking the random bootstrap index f(i) e {1,... ,n} from x-resampling for y-resampling, (x(j),y(j)) pairs are resampled.

Step 24

Go to Step 14 until b = B (usually B = 2000)

replications exist

Step 25

Calculate CI (Section 3.4)

for Fisher's z

CIz,l-2a = Zuj

Step 26

Re-transform lower and upper endpoints to obtain

pairwise-ARB CI for rXY

CIrXY ,i-2a = |tanh(zi );tanh(zu)j

Algorithm 7.3. Construction of bootstrap confidence intervals for Pearson's correlation coefficient, pairwise-ARB resampling (continued).

Algorithm 7.3. Construction of bootstrap confidence intervals for Pearson's correlation coefficient, pairwise-ARB resampling (continued).

is treated in Section 7.6. Since the set of values R(i) and S(i) can take, is known, n n

R = E R(i)/n = = E S(i)/n = (n + 1)/2 (7.35) i=1 i=1

This facilitates the computation of rs (Gibbons and Chakraborti 2003: Section 11.3 therein):

The estimator rs is called Spearman's rank correlation coefficient. Also rs is between -1 and 1.

7.2.1 Classical confidence intervals, non-persistent processes

Let X(i) and Y(i) both be a stochastic process without persistence. Consider the "null case" that the true rank correlation, ps, equals zero.

Then each combination of rank R(i) e {1,...,n} and rank S(i) G {1,..., n} has the same probability. The distribution of rs (Eq. 7.37) is independent of the distributions of X(i) and Y(i); hence rs is called a distribution-free statistic. The null distribution of rs can in theory be exactly deduced by means of combinatorics. In practice (finite computing power), this is done only if n is not larger than, say, 15, and approximations are used for larger data sizes (van de Wiel and Di Buc-chianico 2001; Gibbons and Chakraborti 2003). The null distribution of rs is essential for performing statistical tests of H0: "ps = 0."

Consider the "non-null case" that ps is not specified to be zero. This is of relevance within the context of this chapter, CI construction. In principle, the PDF of Spearman's correlation coefficient can be derived exactly via calculating the unequal probabilities of the (n!)2 pairs of R(i) and S(i), given a bivariate distribution of the ranks (Henze 1979). This method is for typical data sizes n in climatology not feasible. The practice employs therefore approximations.

Fieller et al. (1957) suggested use of the z-transformation: the quantity zs = tanh-1 (rs) (7.38)

approaches with increasing n normal distributional shape and has sezS - 1.03 (n - 3)-1/2 . (7.39)

The approximation for the expectation of zS is less accurate and makes the assumption of binormally distributed (X(i),Y(i)) with correlation coefficient pXY. Then (Fieller et al. 1957)

E [zs] - tanh-1 (fs) + fs VAR [rs] / (1 - f2)2 , (7.40)

where (Moran 1948) 6

fs = , , -n [sin-1 (pXY) + (n - 2) sin-1 (pxv/2)] (7.41) (n + 1) n and VAR [rs] is approximated by David and Mallows' (1961) formula, given in the technical issues (Eq. 7.65). On the sample level, plug in rXY for pXY. A classical CI for zs follows from using E [zs], sezs and a percentage point of the normal distribution; and a classical CI for rs follows from re-transforming the bounds of CIZs,1-2a. Note that the binormal assumption is not strong (Fieller et al. 1957) because the ranks, R(i) and S(i), are robust and apply to a wider class of bivariate distributions of X(i) and Y(i).

7.2.2 Classical confidence intervals, persistent processes

A realistic CI for rS should take persistence into account. The construction algorithm for a bivariate AR(1) process on an unevenly spaced time grid (Algorithm 7.4) is analogous to the case of Pearson's rXY. It employs the effective data size. Also CIrs,1-2Q, is approximate.

Step 1 Step 2 Step 3 Step 4

Step 5 Step 6

Spearman's rs (Eq. 7.37) Fisher's z-transformation zs = tanh-1 (rs) Perform Steps 4-8 of the construction algorithm of classical normal CI for rXY (Algorithm 7.1) Pearson's rXY (Eq. 7.5) Plug in rXY for pXY and

Step 7 Plug in rXY for pXY and n'p for n in Eq. (7.65) Step 8 Plug in rs and VAR [rs]

in Eq. (7.40) Step 9 Approximate, classical normal CI for rs, obtained from re-transforming zs

tanh [£[zs] + z(a) • 1.03 (n'p - 3) 1/2] tanh [£[zs] - z(a) • 1.03 (n'p - 3)—1/2]

Algorithm 7.4. Construction of classical confidence intervals for Spearman's rank correlation coefficient, bivariate AR(1) models. Steps 3 and 9: zs is Fisher's transformed rank correlation, z(a) is the percentage point of the normal distribution.

7.2.3 Bootstrap confidence intervals

The motivation for considering bootstrap CIs for rs is similar to the case of rXY: possibly more complex persistence forms than AR(1) and distributional shapes than bivariate Gaussian. The robustness of ranking methods with respect to violations of the distributional assumption (Kendall and Gibbons 1990; Gibbons and Chakraborti 2003) may weaken the motivation. On the other hand, the inaccuracy of the formula for the expectation of zs (Eq. 7.40) strengthens it. The two bootstrap algorithms, pairwise-MBB and pairwise-ARB, are for Spearman's rs completely analogous to the case of Pearson's rXY.

7.2.3.1 Pairwise-moving block bootstrap

The pairwise-MBB algorithm for constructing a bootstrap CIrs,1-2Q, is displayed in shortened form (Algorithm 7.5). Also for Spearman's rank correlation we apply Fisher's z-transformation of rs to bring the distribution of the replications (zg) in shape closer to a normal distribution. Also the two block length selectors (Eqs. 7.30 and 7.32) are overtaken.

Step 1

Bivariate time series

{i(i),x(i),»(i)>n=i

Step 2

Spearman's rs (Eq. 7.37)

Step 3

Fisher's z-transformation

zs = tanh-1 (rs)

Step 4

Estimated, bias-corrected

persistence times,

block length selection

l

Step 5

Resample, pairwise-MBB with I

{x*b(i),y*h(i)yn_i (b, counter)

Step 6

Bootstrap replications

rSb, zSh = tanh-1 (rSh)

Step 7

Go to Step 5 until b = B (usually B = 2000)

replications exist

KlBLi

Step 8

Calculate CI for Fisher's zs

CIZS ,1-2a = [zs,i; Zs,uj

Step 9

Re-transformation

CIrS,i-2a = [tanh (zs,i) ; tanh (zs,u)]

Algorithm 7.5. Construction of bootstrap confidence intervals for Spearman's rank correlation coefficient, pairwise-MBB resampling (cf. Algorithm 7.2).

Algorithm 7.5. Construction of bootstrap confidence intervals for Spearman's rank correlation coefficient, pairwise-MBB resampling (cf. Algorithm 7.2).

7.2.3.2 Pairwise-autoregressive bootstrap

The pairwise-ARB algorithm for constructing a bootstrap CIrs,1-2Q, is displayed in shortened form (Algorithm 7.6). Also here we apply Fisher's z-transformation of rs.

Step 1

Bivariate time series

{i(i),x(i),»(i)}n=i

Step 2

Spearman's rs (Eq. 7.37)

Step 3

Fisher's z-transformation

zs = tanh-1 (rs)

Step 4

Estimated, bias-corrected

persistence times,

?X

climate equation

residuals,

{qx (i)}n=i, {qY (i)}n=i

abbreviations,

«X (i) «Y (i)

white-noise residuals,

ex(i), ey(i)

centering

ex (i),ey (i)

Steps 14-22 therein); b, counter

{x*b(i),y*b(i)}n=i

Step 6

Bootstrap replications

rSb, zSb = tanh-1 (rSb)

Step 7

Go to Step 5 until b = B (usually B = 2000)

replications exist

{zsb}B=i

Step 8

Calculate CI for Fisher's zs

CIzS ,i-2a = [zs,i; Zs,uj

Step 9

Re-transformation

CIrS,i-2a = [tanh (zs,i); tanh (zs,u)]

Algorithm 7.6. Construction of bootstrap confidence intervals for Spearman's rank correlation coefficient, pairwise-ARB resampling (cf. Algorithm 7.3).

7.3 Monte Carlo experiments

The performance of CIs for rXY and rS was analysed by means of Monte Carlo simulations. The experiments focused on identifying CI

Table 7.1. Monte Carlo experiment, Spearman's correlation coefficient with Fisher's z-transformation for bivariate lognormal AR(1) processes. nsim = 47,500 random samples were generated from the binormal AR(1) process, {X(i),Y(i)}™=1, after Eqs. (7.53) and (7.54) with tx = 1,ry = 2 and pE given by Table 7.8. The lognormal shape was generated by taking exp [X (i)] and exp [Y (i)]. The start was set to t(1) = 1; the time spacing, d(i), was drawn from a gamma distribution (Eq. 2.48) with order parameter 16, that means, a distribution with a coefficient of variation equal to (16)-1/2 = 0.25, and subsequently scaled to d = 1. Two CI types for pS were constructed, classical and bootstrap. The classical CI (Algorithm 7.4) used the effective data size, n'p. The bootstrap CI (Algorithm 7.5) used pairwise-MBB resampling, block length selection after Eqs. (7.31) and (7.32), Student's t (v = 2n — 5) and BCa interval types and B = 2000. Confidence level is 95%.

n Yag Nominal

True rank correlation, pS 0.3 0.8

CI type CI type

Bootstrap

Classical

Bootstrap

Classical

Student's t

BCa

Student's t

BCa

10

0.824

0.671

0.693

0.946

0.842

0.833

0.950

20

0.867

0.800

0.764

0.977

0.900

0.842

0.950

50

0.912

0.893

0.858

0.944

0.929

0.729

0.950

100

0.957

0.922

0.823

0.930

0.920

0.645

0.950

200

0.964

0.928

0.718

0.935

0.930

0.471

0.950

500

0.964

0.927

0.608

0.946

0.945

0.228

0.950

1000

0.943

0.942

0.293

0.943

0.941

0.098

0.950

a Standard error of jrg is nominally 0.001.

a Standard error of jrg is nominally 0.001.

types, resampling schemes and correlation measures that perform well, in terms of coverage accuracy, in situations typical for climate time series, namely in the presence of

■ non-Gaussian distributional shapes and

■ nonzero, possibly different persistence times of processes X(i) and

The two major findings are the following.

Table 7.2. Monte Carlo experiment, Spearman's correlation coefficient with Fisher's z-transformation for bivariate lognormal AR(1) processes: influence of block length selection. The number of Monte Carlo simulations, the properties of {T(i),X(i),Y(i)}?=i and the construction of bootstrap CIs are identical to those in the first experiment (Table 7.1), with the exception that here block length is selected after Eq. (7.30) instead of Eqs. (7.31) and (7.32).

n YaS Nominal

True rank correlation, pS

Bootstrap CI type Bootstrap CI type

Bootstrap CI type Bootstrap CI type

Student's t

BCa

Student's t

BCa

10

0.599

0.380

0.756

0.559

0.950

20

0.723

0.592

0.837

0.712

0.950

50

0.894

0.849

0.929

0.904

0.950

100

0.954

0.915

0.924

0.909

0.950

200

0.964

0.927

0.934

0.928

0.950

500

0.963

0.926

0.947

0.946

0.950

1000

0.940

0.939

0.943

0.942

0.950

a Standard error of Yrs is nominally 0.001.

a Standard error of Yrs is nominally 0.001.

1. Spearman's rank correlation coefficient performed clearly better than Pearson's correlation coefficient. The latter's use is advisable only in situations where the Gaussian assumption is likely to be fulfilled or where computing power allows to calibrate the CIs.

2. Classical CIs failed completely in the presence of non-Gaussian distributions.

Spearman's rs in combination with pairwise-MBB resampling produced acceptable coverage accuracies (deviations from the nominal level of less than, say, five percentage points) for the bivariate lognormal AR(1) process with unequal, nonzero persistence times (Table 7.1, p. 303). It seems that data size requirements for achieving such coverages are slightly less demanding for Student's t CIs (n ^ 100) than for BCa CIs (n ^ 200).

The choice of the block length selector had minor influence. This is seen by comparing results from using the selector after Eqs. (7.31) and (7.32) in Table 7.1, with those from using Eq. (7.30) in Table 7.2.

Table 7.3. Monte Carlo experiment, Spearman's correlation coefficient without Fisher's z-transformation for bivariate lognormal AR(1) processes. The number of Monte Carlo simulations and the properties of {T(i),X(i),Y(i)}"=1 are identical to those in the first experiment (Table 7.1). Also the construction of bootstrap CIs used pairwise-MBB resampling, block length selection after Eqs. (7.31) and (7.32), Student's t and BCa interval types, B = 2000 and a confidence level of 95%. The difference is that here no Fisher's z-transformation and no re-transformation (Algorithm 7.5, Steps 3 and 9, respectively) are performed and the bootstrap replications consist not of zgb, but of rgb.

n YaS Nominal

True rank correlation, pS 0.3 0.8

Bootstrap CI type Bootstrap CI type

Student's t

BCa

Student's t

BCa

10

0.831

0.821

0.905

0.840

0.950

20

0.877

0.891

0.938

0.900

0.950

50

0.893

0.904

0.942

0.929

0.950

100

0.916

0.921

0.935

0.919

0.950

200

0.926

0.928

0.940

0.930

0.950

500

0.926

0.927

0.942

0.945

0.950

1000

0.941

0.942

0.946

0.941

0.950

a Standard error of Yrg is nominally 0.001.

a Standard error of Yrg is nominally 0.001.

Fisher's z-transformation of rs had been advocated by Fieller et al. (1957). However, it seems not to have a major effect on bootstrap CI coverage accuracies. This is seen by comparing Tables 7.1 and 7.3. An early simulation study (Kraemer 1974) with data sizes n = 10 and 20 already concluded that using the z-transformation in combination with the normal approximation of the distribution of zs is less accurate than using instead Student's t distribution.

Pearson's rXY did not produce acceptable coverage accuracies for bivariate lognormal processes—neither with uncalibrated classical CIs nor with uncalibrated CIs from pairwise-MBB resampling (Table 7.4). The reason is likely that the distribution of rXY is, despite the z-transformation, skewed itself when the distributions of the input data, X(i) and Y(i), are skewed. The second experiment with rXY, in which {X(i), Y(i)} were binormally distributed (Table 7.5), exhibited reason-

Table 7.4. Monte Carlo experiment, Pearson's correlation coefficient with Fisher's z-transformation for bivariate lognormal AR(1) processes. The number of Monte Carlo simulations and the properties of {T(i),X(i),Y(i)}"=1 are identical to those in the first experiment (Table 7.1), with the exception that pE is here given via Eq. (7.24). The construction of CIs followed Algorithms 7.1 and 7.2.

n YaXY Nominal

True correlation, pxY

n YaXY Nominal

True correlation, pxY

CI type

CI type

Bootstrap

Classical

Bootstrap

Classical

Student's t

BCa

Student's t

BCa

10

0.820

0.701

0.748

0.864

0.778

0.875

0.950

20

0.876

0.808

0.805

0.904

0.859

0.807

0.950

50

0.932

0.875

0.848

0.898

0.864

0.737

0.950

100

0.939

0.866

0.836

0.895

0.856

0.684

0.950

200

0.941

0.879

0.781

0.897

0.853

0.633

0.950

500

0.907

0.876

0.767

0.899

0.846

0.554

0.950

1000

0.911

0.885

0.730

0.913

0.866

0.551

0.950

a Standard error of yrxY is nominally 0.001.

a Standard error of yrxY is nominally 0.001.

ably good coverage accuracies for n ^ 100. In the second experiment, bootstrap Student's t CIs for rXY performed slightly better than bootstrap BCa (too low coverage) or classical CIs (too high coverage).

The dependence of the coverage results on the true values, pXY or ps, which were prescribed as 0.3 and 0.8, seems weak.

The better performance of rs in comparison with txy is rooted in its robustness, which in turn stems from the use of the ranks of the values instead of the values themselves (Fieller et al. 1957). The Monte Carlo simulations reveal that the robustness influences positively also the property of coverage accuracy.

Pairwise-ARB resampling in combination with Student's t CIs performed less good than pairwise-MBB resampling (results not shown), but the coverage error (some percentage points) may be acceptable in climate sciences. Pairwise-ARB resampling in combination with BCa CIs, on the other hand, produced clearly too large coverage errors.

Table 7.5. Monte Carlo experiment, Pearson's correlation coefficient with Fisher's z-transformation for binormal AR(1) processes. The number of Monte Carlo simulations, the properties of {T(i)}™=1 and the construction of CIs (Algorithm 7.2) are identical to those in the previous rXY experiment (Table 7.4). The process {X(i), Y(i)}™=1 is binormal AR(1) after Eqs. (7.53) and (7.54) with tx = 1,ty =2 and pE given by Eq. (7.57).

n YaXY Nominal

True correlation, pXY

CI type

CI type

Bootstrap

Classical

Bootstrap

Classical

Student's t

BCa

Student's t

BCa

10

0.701

0.566

0.752

0.840

0.735

0.973

0.950

20

0.721

0.629

0.812

0.904

0.840

0.974

0.950

50

0.890

0.861

0.931

0.886

0.859

0.974

0.950

100

0.934

0.893

0.959

0.910

0.900

0.972

0.950

200

0.961

0.922

0.953

0.930

0.925

0.968

0.950

500

0.949

0.932

0.953

0.935

0.931

0.968

0.950

1000

0.937

0.934

0.953

0.942

0.941

0.968

0.950

a Standard error of YrxY 's nominally 0.001.

a Standard error of YrxY 's nominally 0.001.

Calibrating the CI increased the coverage performance dramatically (Table 7.6), especially for the problematic case of Pearson's rXY and bi-variate lognormal AR(1) processes. Those accurate results demonstrate that CI lengths (Table 7.7) of correlation estimates are rather large if

■ sample sizes are small,

■ persistence exists and

■ true correlation coefficients are small in size.

Table 7.6. Monte Carlo experiment, Pearson's and Spearman's correlation coefficients with Fisher's z-transformation for bivariate lognormal AR(1) processes: calibrated CI coverage performance. The number of Monte Carlo simulations and the properties of {T(i),X(i),Y(i)}™=1 are identical to those in the first experiment (Table 7.1), with pE given by Eq. (7.24) and Table 7.8, respectively. Calibrated Student's t CIs were constructed after Eq. (3.47) using two loops of pairwise-MBB resampling with block length selection after Eqs. (7.31) and (7.32). The first loop (bootstrap of samples) used B = 2000 resamplings, the second loop (bootstrap of resamples) used 1000 resamplings. In the second loop, the block length was not re-estimated but overtaken from the first loop. The spacing of the A values for the calibration (Eq. 3.45) is 0.001.

n

YrXY

YrS

Nominal

True correlation, pXY

True rank correlation, pS

0.3

0.8

0.3

0.8

10

0.917

0.836

0.912

0.959

0.950

20

0.959

0.937

0.960

0.939

0.950

50

0.964

0.947

0.944

0.929

0.950

100

0.969

0.947

0.951

0.945

0.950

200

0.966

0.946

0.970

0.948

0.950

a Standard errors of YrxY and Yrs are nominally 0.001.

a Standard errors of YrxY and Yrs are nominally 0.001.

Table 7.7. Monte Carlo experiment, Pearson's and Spearman's correlation coefficients with Fisher's z-transformation for bivariate lognormal AR(1) processes: average calibrated CI length. The number of Monte Carlo simulations is nsim — 47,500. See Table 7.6 for further details.

True correlation, pxy True rank correlation, pS

0.3

0.8

0.3

0.8

10

1.691

1.028

1.743

1.744

20

1.713

0.506

1.789

0.834

50

1.593

0.291

1.709

0.263

100

1.203

0.229

1.367

0.167

200

0.929

0.184

0.890

0.135

a Average value over nsim simulations.

a Average value over nsim simulations.

7.4 Example: Elbe runoff variations

Dresden is a station on the river Elbe, from which long measured runoff records are available (Fig. 6.3). We study how strongly the random component in Dresden runoff variations correlate with variations in records from other stations on the river, namely Decin (70 km upstream) and Barby (240 km downstream). The raw data are shown in Fig. 7.1.

5000-1

1

4000-

CO

3000-

3

2000-

a

1000-

0-

5000-1

4000-

w

3000-

CO

2000-

a

1000-

0-

5000-

4000-

CO

3000-

3

2000-

a

1000-

1900

1950 Year

Figure 7.1. Elbe runoff 1899-1990, time series. The raw data are from stations (a) Decin (Czech Republic), (b) Dresden and (c) Barby (both Germany); they cover the interval from 1 November 1899 to 31 October 1990 in daily resolution, without missing values. (The Czech name of the river is Labe. Data from Global Runoff Data Centre, Koblenz, Germany.)

1900

1950 Year

2000

Figure 7.1. Elbe runoff 1899-1990, time series. The raw data are from stations (a) Decin (Czech Republic), (b) Dresden and (c) Barby (both Germany); they cover the interval from 1 November 1899 to 31 October 1990 in daily resolution, without missing values. (The Czech name of the river is Labe. Data from Global Runoff Data Centre, Koblenz, Germany.)

To extract the random component, we remove the annual cycle from the raw time series. To correct for effects of nonzero travel times of the water (Engel et al. 2002), we bin the daily series into monthly mean records. The resulting data size (n = 1092) is large enough to let us expect a high accuracy of the CIs for the correlation measures.

i iO CO

150010005000

1000-

500-

-500 0 500 1000 1500 AQpresden (m3 S-1)

1000-

500-

Figure 7.2. Elbe runoff 1899-1990, correlations. a Decin versus Dresden; b Barby versus Dresden. Prior to correlation estimation, (1) the annual cycles were removed from the raw data (Fig. 7.1) by subtracting the day-wise long-term averages and (2) monthly resolved records (denoted as AQ) constructed by binning. Each record has a sample size of n = 1092. (See text for rXY and rs values.)

The resulting correlation values with 95% CI are as follows. Decin versus Dresden (Fig. 7.2a), rXY = 0.995 [0.993; 0.997] and rs = 0.991 [0.986; 0.995]; Barby versus Dresden (Fig. 7.2b), rxY = 0.964 [0.954; 0.972] and rs = 0.955 [0.942; 0.965]. These are calibrated Student's t CIs, obtained using Fisher's transformation, pairwise-MBB resampling (first loop, B = 2000 resamplings; second loop, 1000 resamplings) and a A-spacing of 0.001. The selected block lengths (first loop, overtaken for second loop) after Eqs. (7.31) and (7.32) are (Fig. 7.2a) l = 13 and (Fig. 7.2b) l = 14.

The significantly higher correlation values of Decin-Dresden compared with Barby-Dresden bivariate runoff variations can in terms of hydrology be interpreted to reflect the growing catchment area of the river Elbe. While the increase between Decin and Dresden is moderate (from 51,104 to 53,096 km2), it is larger when going further downstream, to Barby (94,060 km2). That means, between Dresden and Barby clearly more "random innovations" in the form of confluencing tributaries "disturb" the water flow than between Decin and Dresden. This examples demonstrates also the superiority of CIs over mere correlation testing.

7.5 Unequal timescales

Consider the situation that the set of time points for the X values is not identical to the set of time points for the Y values. This is ubiquitous in paleoclimatology, where we study the relation between variations of one variable, measured on one dated climate archive, and a second variable, from a second archive that is independent of the first archive. The challenge imposed by these unequal timescales roots in the fact that Pearson's or Spearman's recipes for estimating the degree of the relation between the fluctuations of both variables cannot be readily applied.

The method of adaption to the case of unequal timescales that is conventionally used in climatology, is to interpolate both time series to a common time grid and then apply the usual estimation procedure of rXY or rs. (We denote these interpolation correlation estimators as rXY and rs.) One danger with that method is that the freedom of how to select the time grid translates into an arbitrariness regarding the size of the interpolated sample and, in turn, a risk of a biased error determination. The other danger stems from the serial dependences caused by the interpolations, which have to be taken correctly into account.

The adaption method developed in statistical science (Hocking and Smith 1968) focuses on missing observations, which is a special case of unequal timescales. There, a number of common time points exist, which allows inference of the covariance because information on the "mixing" exists. The other points, for which data exist from either X or Y, but not from both, are used for inference of the means and standard deviations. This research, summarized by Kotz et al. (2000: pp. 298-305 therein), is, however, of limited relevance for climatological applications.

1. If the unequal timescales do not at all have common time points, which may occur with paleoclimate samples, the correlation estimation is prohibited because no "mixing information" can be used.

2. Instationarities of the first or second moment may bias the estimation. In particular, heteroscedasticity may lead to underestimated standard deviations and, hence, to absolute correlation values greater than unity (Kotz et al. 2000).

3. The assumptions made in the statistical literature, namely multivari-ate normal distributional shape and serial independence, are typically violated in climatological applications. The properties of the suggested estimators seem not to be known for such a more realistic setting.

This book suggests therefore a seemingly novel estimation approach for climate data samples, which is denoted as binned correlation. It rests on the concept that the nonzero persistence times (Chapter 2), seemingly a genuine property of climate time series, allow to recover the "mixing information" even when the two timescales differ. The condition is that the time spacing is not much larger than the persistence times. Then enough common data points fall within a time bin and knowledge can be acquired on the covariance. We give a second estimation procedure, denoted as synchrony correlation, based on selecting only those X-Y pairs that consist of values close to each other in time.

7.5.1 Binned correlation

To consider the binned correlation (Fig. 7.3), let the two time series have sample size nX and nY, and let the data be given on the process level as

respectively. Let the persistence time be denoted as tx and ty, respectively. The first step is to divide the time interval from the leftmost of the time points (denoted as Tmin) to the rightmost of the time points (denoted as Tmax) into bins of a constant length, f. Three selection rules for f are given in the following paragraph. If (Tmax — Tmin)/f has a remainder, let the rightmost bin be smaller than f. The second step is to evaluate whether a time bin contains both more than zero X points and more than zero Y points. For example, in Fig. 7.3 the first bin, [Tmin; Tmin + f], contains three X points and one Y point. If the evaluation is positive, then form the average (denoted as X(k)) of the X points within the bin and the average (denoted as Y(k)) of the Y points within the bin, and assign as time value (denoted as T(k)) the central time for the bin. In Fig. 7.3, T(1) = Tmin + (1/2) ■ f and t(2) = Tmin + (5/2) ■ f (there are no Y points contained in the second time bin). The resulting time series is

This binned sample has size n. The binned Pearson's correlation coefficient, denoted as fXY, is calculated as rXY on the binned time series (Eq. 7.43).

The bin width f is selected such that it is permissible to compare X and Y values within the same bin. This means that the selection takes into account the persistence times of both processes. Simple rules are f = tx + ty (7.44)

Figure 7.3. Binning for correlation estimation in the presence of unequal timescales.

Monte Carlo experiments, similar to those presented in Section 7.5.3, show the superiority (in terms of RMSE^^) of a third rule, based on the average spacings, dx = [Tx(nx) - Tx(1)]/(nx - 1) , dy = [Ty (ny) - Ty(1)]/(ny - 1) , (7.46)

dxY = [TTmax - Tmi^ /(nx + ny - 1) , the bias-corrected equivalent autocorrelation coefficients, exp (-dx/ix) , exp (-dy /tY ) , (7.47)

Selection of r determines the sample size, n, of the binned series and the statistical properties of rxy. In the case of unequal timescales, the existence of (climate-induced) persistence may have a beneficial effect. If no persistence would be in the (climate) system, and no common time points of X and Y exist, then n would be equal to zero and no information on the correlation between X and Y could be recovered.

7.5.2 Synchrony correlation

The synchrony correlation estimation (Algorithm 7.7) starts with selecting the pair (X(imin),Y(jmin)), for which the absolute time difference, |TX(imin) - Ty (jmin)|, is minimal. The algorithm takes only a percentage, P ■ 100%, of the maximum possible number of X-Y pairs; this maximum number equals min(nx,ny), and the pairs have the smallest absolute time differences. The correlation estimation is then made by using those n^ "synchrony pairs" and calculating Pearson's or Spearman's correlation coefficient. We denote the synchrony Pearson's correlation coefficient as r:xy^10o%.

The synchrony correlation estimation avoids making a step away from the original data as interpolation does. The synchrony procedure avoids also the averaging step as the binned correlation procedure (Section 7.5.1) does. The statistical properties of ^xy^100% as an estimator of pxy are therefore potentially better (e.g., smaller RMSE^XY) than those of rxy or rxy; analogous expectations can be raised for Spearman's correlation coefficient.

tx ay txy

Step 1

Processes

{Tx (i),X (i)}?*, {Ty (j),Y (jj

Step 2

Initialize counter of "synchrony pairs"

CXY = 1

Step 3

Prescribe number of synchrony pairs

nk = NINT • min (nx, «Y)]

Step 4

Absolute time differences

AT(i,j) = |Tx (i) - Ty(j)|

Step 5

Determine combination,

(imin , jmin) ,

for which

AT(imin , jmin)

is minimal

to the set of synchrony pairs,

(X (imin ),Y (jmin ))

remove points

(Tx (imin) , X (imin))

0 0

Post a comment