H

is the autocorrelation estimator and n

R(h) = ^ Xnoise(i) ■ Xnoise(i - h), h = 0, 1, 2, (2.18) Figure 2.4. Regions of asymptotic stationarity for the AR(2) process (Eq. 2.14) (shaded). The region for complex roots (dark shaded), which allows quasi-periodic behaviour, lies below the parabolic, a2 < —a^/4. Figure 2.5. Realization of an AR(2) process (Eq. 2.14); n = 200, ai = 1.0, a2 = —0.4 and E(i) = EN(0, 1)(i). The first 5000 data were discarded to approach asymptotic sta-tionarity. The graph exhibits a quasi-cyclical behaviour with an approximate period of 9.5 time units.

Figure 2.5. Realization of an AR(2) process (Eq. 2.14); n = 200, ai = 1.0, a2 = —0.4 and E(i) = EN(0, 1)(i). The first 5000 data were discarded to approach asymptotic sta-tionarity. The graph exhibits a quasi-cyclical behaviour with an approximate period of 9.5 time units.

is the autocovariance estimator. Estimation bias occurs also in case of the AR(2) model, approximations were given by Tj0stheim and Paulsen (1983).

2.3 Mixed autoregressive moving average model

We assume even spacing and write the discrete-time Gaussian mixed autoregressive moving average or ARMA(p, q) noise model,

Xnoise(i) = ' Xnoise(i - 1) +-----+ dp ' Xnoise(i - p)

+ bo ■ En(o, a2) (i) + + bq ■ En(o, a2) (i - q), (2.19) i = max(p, q) + 1,..., n.

Note that the model is given only for a subset of i = 1,... ,n. Similar conditions as in the preceding sections may be formulated for {a1,..., ap; b0,..., bq} to ensure stationarity. The AR(1) model (Section 2.1), AR(2) model (Section 2.2), the autoregressive model of general order (AR(p))— these are special cases of the ARMA(p, q) model (q = 0). The moving average process of general order (MA(q)) arises from the ARMA(p, q) model with p = 0; it is not considered further in this book. Estimation techniques for the ARMA(p, q) model are mentioned in the background material (Section 2.6).

As regards the context of this book, the problem with the discrete-time ARMA(p, q) model under uneven time spacing is that no embedding in a continuous-time process can be proven. Indeed, already for a discrete-time, real-valued Gaussian AR(1) process with a < 0 it was shown (Chan and Tong 1987) that no embedding in a continuous-time, real-valued Gaussian AR(1) process exists. No embedding of Xnoise(i) means no foundation on physics. Suppose, for example, that physical laws governing the climate system to be analysed yield an ARMA(p1, q1) continuous-time noise model. Even with a "perfect" estimation (estimation bias and variance both zero), it would then not be possible to determine the model parameters {a1,..., api; b0,..., bqi} uniquely from an unevenly spaced sample time series {t(i), xnoise(i)}™=1. For climate time series, a perfect time estimation would also be required, which is not usually possible. A further complication arises from "model aliasing." Bartlett (1946) showed that an evenly sampled continuous-time AR(p) process becomes a discrete-time ARMA(p,p - 1) process (the "alias"), which has implications already for the AR(2) model (Section 2.2). However, for a certain type of uneven spacing, namely "missing observations" (Fig. 1.15e), where {i(i)}™=1 is a subset of {i(j)}m=1 with d(j) = const. and m - n ^ m, the embedding problem vanishes and estimation techniques exist (Section 2.6).

The majority of sampled climate time series, at least within this book, exhibits uneven, irregular spacing (Fig. 1.15), for which only the simple AR(1) model ensures the embedding property. Fortunately, this is no serious limitation as climatic theory shows that climatic noise is to a first order of approximation well described by the AR(1) process (Section 2.5).

2.4 Other models

The discrete-time Gaussian ARMA(p, q) process (Eq. 2.19) composes Xnoise(i) as a linear combination of past Xnoise(j), j < i, and innovations, EN(0,CT2)(j). We briefly review other processes that can be seen as extensions of the ARMA(p, q) process. These processes might provide somewhat more realistic models for Xnoise(i). However, usage of many of these models seems to be restricted to evenly spaced time series (perhaps with missing values) because of the embedding problem (Section 2.1.2.1) and lack of statistical theory.

2.4.1 Long-memory processes

The AR(1) process has an exponentially ("fast") decaying autocorrelation function (Fig. 2.2). Also the ARMA(p, q) process has a similar bound (Brockwell and Davis 1991), where C > 0 and 0 < r < 1. A long-memory process is a stationary process (loosely speaking, with time-constant statistical properties such as mean and standard deviation) for which where C = 0 and H < 0.5. This decrease is slower than in the case of ARMA(p, q), hence it is said to exhibit long-range serial dependence or long memory.

Examples of long-memory processes are

1. fractional Gaussian noise and

2. fractional autoregressive integrated moving average models, denoted as ARFIMA(p, 5, q).

The relation between the ARFIMA(p, 5, q) and ARMA(p, q) models is as follows. 5 defines (Section 2.6) a fractional difference operator, (1 — B), where |5| < 0.5 and B is the backshift operator. The backshift operator shifts one step back in time, for example, BXnoise(i) = Xnoise(i — 1). The ARFIMA(p, 5, q) model is then an ARMA(p, q) model (Eq. 2.19), where Xnoise(j) is replaced by (1 — B) Xnoise(j). For the trivial case 5 = 0, the ARFIMA(p, 5, q) model reduces to the ARMA(p, q) model, which already shows the embedding problem (Section 2.3). (One can

define a nonstationary ARIMA model by allowing 5 = 1, 2, ) The ARFIMA(p, 5, q) model has H = 5 (Brockwell and Davis 1991). Although continuous-time ARFIMA(p, 5, q) models have been developed (Comte and Renault 1996), the embedding for 5 = 0 seems not yet to have been analysed.

In the special case of the ARFIMA(0, 5, 0) model it has been shown (Hwang 2000) that for uneven spacing the estimation of 5 is biased, with the bias depending on the spacing. It appears that in the general case the theory of long-memory processes for unevenly spaced data is not well developed enough to be applied in the context of the present book. Section 2.6 gives more details on estimation of long-memory models for evenly spaced data, while Sections 2.5.2 and 2.5.3 present examples where long- and short-range models are fitted to climate series.

2.4.2 Nonlinear and non-Gaussian models

Stationary nonlinear models allow a richer structure to be given to the noise process, Xnoise(i). Of particular interest for climatology is the class of threshold autoregressive models (Tong and Lim 1980). Let the real line R be partitioned into l non-overlapping, closed segments, R = Ri U R2 U ■ ■ ■ U Rj. The discrete-time Gaussian self-exciting threshold autoregressive process of order (l; k,..., k) or SETAR(l; k,..., k) process, where k is repeated l times, is given by k

Xnoise(i) = a0m) + Y^ ajm) ■ Xnoise(i - j) + EN(0, ^(m)) (i), (2.22) j=1

conditional on Xnoise(i — j) £ Rm; m = 1, 2,..., l. As an example, Fig. 2.6 shows a realization of the SETAR(2; 1,1) process,

X { +2.0 + 0.8Xnoise(i — 1) + En(0, 1) (i) if Xnoise(i — 1) < 0, noise(i) \ —1.0 + 0.4Xnoise(i — 1) + En(0, 2)(i) if Xnoise(i — 1) > 0,

which may be a model of random fluctuations between two climate regimes with different mean values and persistence times. Also quasi-cyclical behaviour can be reproduced by threshold models. In practical applications the number of regimes, l, is usually limited to a few. Estimation is carried out iteratively: guess of l, maximum likelihood estimation of parameters, calculation of a goodness-of-fit measure such as AIC or a normalized version (Tong and Yeung 1991), analysis of residuals and autocorrelation functions, improved guess of l, etc. Continuous-time threshold autoregressive models have been formulated (Tong and Yeung 1991) but it seems that the embedding problem for unevenly spaced Figure 2.6. Realization of a SETAR(2; 1,1) process (Eq. 2.23); n = 200. (The first 5000 data were discarded.)

time series has not yet been analysed. This would mainly concern the SETAR(2; 1,1) case.

Many more types of nonlinear persistence models can be perceived (Section 2.6). It may for some climate data even be useful to consider in Eq. (1.2) the process S(i) ■ Xnoise(i) as belonging to the class of stochastic volatility models, for which S(i) depends on past Xnoise(i — j), j > 0, and/or past S (i—j). This process of time-varying variability could model "burst" phenomena such as earthquakes and serve also as a formulation of the outlier process in Eq. (1.2). One common problem, however, with complex, nonlinear time series models, is the embedding of the discrete-time process in continuous time (Section 2.1.2.1). We have to concede that complex, unevenly observed climatic processes may not be accessible to a meaningful parametric estimation.

Also non-Gaussian random innovation terms can be used to construct ARMA(p, q) models. In such cases, however, formulas for the estimation bias are hardly available. One possibility is to introduce a transformation,

where Xoise(i) is a Gaussian process, and to infer f from a probability density estimation (Section 1.6) using {xnoise(i)}™=1.

2.5 Climate theory

A dynamical view of the climate system gives motivation that climatic persistence may to a first order of approximation be written as an AR(1) process. This was shown by an influential paper entitled "Stochastic climate models" (Hasselmann 1976).

The dynamical view seems to be challenged by a series of papers claiming a "universal power law" in temperature records. This law indicates long memory, not short as the AR(1) models suggests. Long memory of temperature fluctuations over timescales from days to decades should seriously impact the development of climate theory. It would further have enormous practical consequences as weather forecasts for intervals considerably longer than what is currently feasible (a few days) would become principally possible.

The re-analysis of some crucial data here (Section 2.5.2) makes it hard to accept the "universal power law." It should nevertheless be kept in mind that the AR(1) model need not be a good higher-order description of X(T). However, after allowing nonlinear trends and outlier processes (Eq. 1.2), the AR(1) model is likely not a bad candidate to describe

2.5.1 Stochastic climate models

The derivation of the AR(1) model of climate persistence is based on three assumptions.

Assumption 1 is timescale separability. The climate system as a whole (p. 3) is composed of a slowly varying component ("climate" in original sense), representing oceans, biosphere and cryosphere, and a fast varying component ("weather"), representing the atmosphere.

The differential equations governing the climate evolution may then be written as where tx » ty , X and Y are the slowly and fast varying components (vectors of possibly high dimension), respectively, and F and G are some nonlinear system operators. ty is of the order of a few days, tx of several months to years and more (Hasselmann 1976).

It was previously thought that the influence of Y on X, of weather on climate, could be accounted for by simply averaging, dX (T)

where the modified climate system operator, F*, is the time average of F(X, Y).

Since the work of Hasselmann (1976) it is accepted that the weather noise cannot so easily be cancelled out. Consider 0 < T < tx. Then dX(T) ~ F(X(0) Y(T)) dT ~ F (X (0),Y (T )), (2.28)

= W (T), where W is a stochastic (Wiener) process. Discretization yields

Assumption 2. The unknown weather components Y(T) add up to yield after the central limit theorem (Priestley 1981: Section 2.14 therein) a Gaussian purely random process EN(0, .2)(T).

Now let T > tx. Then the time-dependence of F(X(T),Y(T)) has to be taken into account. Since the climate system trajectories have to be bounded, we must invoke a negative feedback mechanism. The simplest model for that is given by

Assumption 3. The negative feedback is proportional to the climate variable, X(T), yielding F(X(T), Y(T)) = — 0 • X(T) + W(T).

Assumption 3 makes Eq. (2.29), which is a nonstationary random walk process (Section 2.6), to a stationary AR(1) process,

with 0 < a = 1 — 0 < 1. (Strictly speaking, this is an "asymptotically stationary" AR(1) process, see background material.) This explanation of Hasselmann's (1976) derivation of climate's AR(1) model is from von Storch and Zwiers (1999: Section 10.4.3 therein). Another account of the 1976 paper and its influence on climatology is given by Arnold (2001).

The suitability of the AR(1) noise model depends on how well Assumptions 1, 2 and 3 are fulfilled. Assumption 1 (timescale separability) is generally thought to be well fulfilled. The root cause is that the atmosphere has a much smaller density and heat capacity than most of the rest of the climate system, allowing weather processes to run faster. One caveat may be that some climatically relevant biological processes, such as algae growth (Lovelock and Kump 1994), may act also on short timescales. The validity of Assumption 2 is difficult to prove. It might well be that some weather influences do not add but rather multiply with each other, producing non-Gaussian distributional shape (Section 1.2; Sura et al. (2005)). But Assumption 2 can be relaxed to recognize this, leading to non-Gaussian AR(1) models. Assumption 3 is certainly not exactly fulfilled but may be a good first-order approximation. More sophisticated feedback mechanisms would lead to higherorder ARMA(p, q) or nonlinear models. An interesting case would be a nonlinear dynamical climate system with several local attractors and the occurrence probability within the attracting regions depending on the weather noise (Hasselmann 1999; Arnold 2001). The present knowledge about feedback processes is, however, too limited to permit the theoretical derivation of the precise model form. A further point is that external climate forcing mechanisms (e.g., volcanic eruptions) have to be included for achieving a full set of climate equations. The size of such forcings is currently not well understood (Section 8.4).

The dynamical equations in this section are for the evolution of high-dimensional climate variables, X(T), and not just for the noise part of one variable, Xnoise(T). In Eq. (1.1), we composed a climate variable of trend, outliers and noise. By allowing nonlinear trends and outlier processes, effects of violations of the assumptions made above are reduced. This lends credence to the AR(1) noise model.

2.5.2 Long memory of temperature fluctuations?

Peng et al. (1994) introduced Detrended Fluctuation Analysis (DFA) to measure persistence in DNA sequences. Peng et al. (1995) elaborated DFA in more detail and applied it to heartbeat time series. Koscielny-Bunde et al. (1996) introduced DFA to climate time series analysis and found a "universal power law governing atmospheric variability."

DFA uses a time series {t(i),x(i)}™=1 with constant spacing d > 0 to calculate the so-called profile, i y(i) = Y x(j), i — 1,...,n. (2.31)

The profile is divided into non-overlapping, contiguous segments of length l (multiple of d), discarding the mod(n, l/d) last points. The y(i) series is detrended by segment-wise fitting and subtracting polynomials of order 0, 1, 2, etc. Most commonly used are mean and linear detrending. Koscielny-Bunde et al. (1998a,b) studied the influence of different de-trending types, also other than polynomial detrending. The fluctuation function, F(l), is the standard deviation of detrended y(i) within a segment, averaged over all segments. F(l) is usually plotted on a double-logarithmic scale because power laws, F(l) rc la, appear in such plots as a straight line, with slope a.

For a polynomial of order 0 and x(i) from the Gaussian AR(1) model in Eq. (2.1), it readily follows that F(l) a: l1/2. For data with long-range dependence (Eq. 2.21), the power law F(l) a lH+1/2 results, that means, a = H + 1/2 (Talkner and Weber 2000). Thus, DFA can be seen as a method to estimate long-range dependence.

Koscielny-Bunde et al. (1996, 1998a,b) analysed daily temperature series covering typically the past 100 a using DFA and found a w 0.65 for many records. The claimed universal long-range power-law dependence would have serious theoretical and practical consequences. Govin-dan et al. (2002) went further and analysed temperature output from a number of AOGCMs, the currently most sophisticated mathematical tools for climate simulation. Since the AOGCMs did not produce a single, universal value but rather a scatter of a values, it was concluded that the models were not able to provide realistic climate forecasts. In particular, the predicted size of global warming would be overestimated. Later, Fraedrich and Blender (2003) used DFA to analyse monthly temperature series. They disputed the existence of the universal power law and suggested the following: a = 1 for oceanic data, a = 0.5 for inner continental data and a = 0.65 for data from the transition regions. This led to an exchange of arguments (Bunde et al. 2004; Fraedrich and Blender 2004), where in particular the results from the temperature record from Siberian station Krasnojarsk were assessed controversially. It appears that in reply to a criticism by Ritson (2004), the originators partly stepped back (Vyushin et al. 2004) from the claimed universality to a position with two memory laws, one for the ocean, the other for the continents.

Here we re-analyse the Krasnojarsk temperature record and also one series of North Atlantic air temperatures to assess whether the power-law exponents are similar or not. It is also asked whether the power laws are actually good models of the serial dependence in the temperature data and whether simple AR(1) models are not already sufficient. A simulation study helps to quantify the uncertainty of the result coming from sampling variations. We use the same gridded raw data set as Fraedrich and Blender (2003); also the technique for removing the annual cycle, the orders of the DFA polynomials (0, 1) and the scaling range of l = 1-15 a for a determination are identical to what these authors employed. Two points may limit comparability of results. First, we restrict ourselves to those time intervals where the monthly series have no gaps. Fraedrich and Blender (2003) took longer series that start in 1900, without explaining how they adapted their methodology to the case of missing data. Second, Fraedrich and Blender (2003) gave the coordinates 30°W, 50°N for the North Atlantic, but the raw data for the grid cells around this point start clearly later than 1900. We take a grid cell from somewhat more south that starts earlier.

The results of the DFA are as follows (Fig. 2.7). The F(l) curves show increases that resemble on first sight a power law. The a estimates in Table 2.1 were determined by fitting the power-law regression model to the F(l) points inside the selected scaling range 1-15 a. Note that fits of a linear regression model to logarithmically transformed l and F values would likely lead to a biased estimation, because Gaussian additive (measurement) noise of X(i) and F(i) would be lost by the transformation (Section 2.6). The resulting values exhibit a variation with the DFA detrending type that is considerably larger than the standard errors for a, indicating systematic estimation errors. The a estimations via DFA deviate also considerably from the values obtained via ARFIMA fits (Table 2.1).

 Station (grid point) Mean detrending Linear detrending ARFIMA(M, 0) Krasnojarsk 0.65 ± 0.02 0.79 ± 0.02 0.61 (50-55°N, 90-95°E) North Atlantic 0.58 ± 0.01 0.68 ± 0.01 0.73 (35-40° N, 25-30° W)

DFA errors are standard errors from unweighted least-squares regression (see Chapter 4). ARFIMA models were fitted using Whittle's approximate maximum likelihood technique (Beran 1994: Chapter 5 therein) and a determined via the relation a = & + 1/2.

DFA errors are standard errors from unweighted least-squares regression (see Chapter 4). ARFIMA models were fitted using Whittle's approximate maximum likelihood technique (Beran 1994: Chapter 5 therein) and a determined via the relation a = & + 1/2.

A close inspection of the F(l) curves (Fig. 2.7) reveals marked deviations from straight line in the double-logarithmic plots, especially for larger l. Such a behaviour might be referred to as "crossover" (Peng et al. 1995) and different scaling regions with different a values could be further investigated. Philosophy of science, however, says that this is a problematic step because it violates the principle of parsimony. A simpler explanation of the F(l) curves is that the class of power-law models is not ideally suited to describe the data.

The DFA study therefore explores also how the simplest persistence model, AR(1), is suited to describe the data. The AR(1) model (Eq. 2.1) was fitted using the estimator Eq. (2.4) and bias correction (Eq. 2.45), yielding a' = 0.23 (Siberia) and a' = 0.44 (North Atlantic). For Figure 2.7. Detrended Fluctuation Analysis for temperature records (Fig. 1.10) from Siberia (a) and North Atlantic (b). Shown as filled (open) symbols are fluctuation functions, F, against segment sizes, l [months] = 4,... , n/4, for the mean-detrended (linearly detrended) DFA variants. Also shown as shaded areas are the 90% confidence bands from simulation experiments based on AR(1) model fits (dark, mean-detrended; light, linearly detrended); the median (50%) simulation results are drawn as white lines. Simulation results are plotted for the range l = 1-15 a, for which power laws and the related question after the suitability of short- and long-memory models for the data are discussed in the main text.

Figure 2.7. Detrended Fluctuation Analysis for temperature records (Fig. 1.10) from Siberia (a) and North Atlantic (b). Shown as filled (open) symbols are fluctuation functions, F, against segment sizes, l [months] = 4,... , n/4, for the mean-detrended (linearly detrended) DFA variants. Also shown as shaded areas are the 90% confidence bands from simulation experiments based on AR(1) model fits (dark, mean-detrended; light, linearly detrended); the median (50%) simulation results are drawn as white lines. Simulation results are plotted for the range l = 1-15 a, for which power laws and the related question after the suitability of short- and long-memory models for the data are discussed in the main text.

both cases, n<si^ft — 10,000 AR(1) time series were generated (identical means, variances and autocorrelations as the data) and two DFA variants applied (mean detrending and linear detrending). The central 90% of simulated F(l) at each point l in the scaling region are shown as shaded area (Fig. 2.7). This is a percentile confidence band, that is, a set of percentile confidence intervals (Chapter 3) for F over a range of l values. The confidence bands contain large portions of the F(l) curves from the data. This indicates that DFA is not an ideal method to discriminate between power-law and AR(1) models. The median simulation result illustrates this point, where the AR(1) model produces an almost perfect straight line, which could be mis-interpreted as power-law behaviour (Fig. 2.7). However, systematic deviations exist between AR(1) and power-law models in the DFA plots for larger l. These could indicate some significant long-memory behaviour. This finding is supported by the ARFIMA fits, which have lower AICC values (Eq. 2.46) than simple AR(1) fits.

As regards the dispute about the universality of the power law in temperature series, on basis of the AR(1) and ARFIMA estimations (Table 2.1), we conclude that the oceanic data have a stronger memory than the land data. Because the difficulties associated with DFA in interpreting the double-logarithmic plots and selecting the suitable detrending method, ARFIMA models with their elaborated estimation techniques (Beran 1994) are to be preferred for quantifying long memory. This could also be the reason why DFA is almost completely absent from the statistical literature.

Although the evidence for long-memory dependence in temperature time series seems yet not strong, more records should be analysed with ARFIMA estimation for achieving a better overview. However, analysing aggregated spatial averages, such as northern hemisphere temperature (Rybski et al. 2006), is likely unsuited for this purpose because the aggregation of short-memory AR(1) processes with distributed autocorrelation parameters yields a long-memory process (Granger 1980): the long memory may be a spurious effect of the aggregation. This has been noted also by Mills (2007). Aggregation is likely not a problem for the data analysed here, which come from one station (Siberia) and less than or equal to four stations (North Atlantic), respectively. Hemispheric averages may, however, result from processing several thousand individual records (Chapter 8).

2.5.3 Long memory of river runoff

Hurst found in an influential paper (Hurst 1951) evidence for long memory in runoff records from the Nile. The long-memory property of runoff time series has subsequently been confirmed for a number of rivers; see, for example, Mandelbrot and Wallis (1969), Hosking (1984), Mesa and Poveda (1993), Montanari et al. (1997), Montanari (2003), Pelletier and Turcotte (1997), Koutsoyiannis (2002), Bunde et al. (2005) and Koscielny-Bunde et al. (2006). Hurst's finding inspired the development of the theory of long-memory processes (Section 2.4.1) and of their estimation (Hosking 1984). Up to now, no widely accepted physical explanation of the "Hurst phenomenon" of long memory, that means on the basis of the physical-hydrological system properties, has been found (Koutsoyiannis 2005a,b).

The paper by Mudelsee (2007) presents an explanation, which suggests that a river network aggregates short-memory precipitation and converts it into long-memory runoff. River basins (Fig. 2.8) form a network of tributaries, confluences and reservoirs that has been geometrically described as a fractal object (Rodriguez-Iturbe and Rinaldo 1997). Consider a single, hydrologically homogeneous area Aj, that is, a reservoir with a linear input-output rule described by a dimensionless positive constant kj. If the input to the reservoir, given by precipitation minus evaporation, is a purely random process, then it has been shown (Klemes 1978) that the output, Xj(i), is an AR(1) process (Section 2.1) with autocorrelation parameter aj = 1/(kj + 1). This further implies that the runoff at a point in a river is not from a single reservoir but a cascade (Klemes 1974) of reservoirs, one feeding the next (Fig. 2.8):

j=i where Xj (i) are mutually independent AR(1) processes with autocorrelation parameter aj. It has been shown previously that if the Xj (i) have identical means (zero), identical standard deviations (unity) and the aj are either beta-distributed (Granger 1980) or uniformly distributed (Linden 1999), then for m ^ to the aggregated process Xj (i) is a longmemory process. Monte Carlo simulations (Mudelsee 2007) reveal that the estimated long-memory parameter 5 increases with m and that saturation of 5 sets in already for m & 100. This leads to the suggestion (Mudelsee 2007) that long memory in river runoff results from spatial aggregation of many short-memory reservoir contributions.

To test the aggregation hypothesis, Mudelsee (2007) studied the longmemory parameter 5 of fitted ARFIMA(1, 5,0) models in dependence on the basin size, A. The idea of the 5(A) estimation is that with increasing A also the number m of contributions Xj (i) grows. Thereby should also 5 increase, from zero (m = 1) to a saturation level below Figure 2.8. River network. Runoff at a point (e.g., 4) is the spatial aggregation of runoff from upstream (to the left in the picture). Shown is also a hypothetical spatial unit j with area Aj. The basin size, A, for a point is given by the sum of the areas Aj upstream. (From Mudelsee (2007), with permission from the publisher.)

Figure 2.8. River network. Runoff at a point (e.g., 4) is the spatial aggregation of runoff from upstream (to the left in the picture). Shown is also a hypothetical spatial unit j with area Aj. The basin size, A, for a point is given by the sum of the areas Aj upstream. (From Mudelsee (2007), with permission from the publisher.)

0.5 (m large). The fact that the distribution of the aj for real rivers is difficult to derive empirically or analytically, can be ignored at this low level of sophistication of the hypothesis. The resulting 5(A) curve for one of the longest available runoff records, from the river Weser (Germany) (Fig. 2.9), basically confirms the aggregation explanation of the Hurst phenomenon.

Mudelsee (2007) estimated 5(A) curves also for other rivers, finding similar 5(A) increases (Elbe, Rhine, Colorado and Nile) but also in one case (Mississippi) a 5(A) decrease. This paper discusses the validity of the various assumptions made by the aggregation hypothesis. One particular criticism is that the linear input-output release rule may be violated for very large reservoirs. Another obstacle is the requirement of very long time series (above, say, 70 years) for obtaining sufficient accuracy. A major criticism is that the aggregation of AR(1) processes is not an ARFIMA process (Linden 1999), and that the result (Fig. 2.9) may therefore be affected by estimation bias. This paper (Mudelsee 2007) further finds little evidence for long memory in precipitation records from the same regions as the river basins. It thus appears appropriate to reserve the concept of the "Hurst phenomenon" for hydrological time series, and not for climate time series in general.

10 20 30

Figure 2.9. Long-memory parameter in dependence on basin size, river Weser. The S estimates (dots) are shown with bootstrap standard errors (Doornik and Ooms 2003). The time series are monthly runoff values from January 1857 to April 2002. S was estimated using an ARFIMA(1, S, 0) model and maximum likelihood (Doornik and Ooms 2003). Prior to the estimations, the data were logarithmically transformed, the annual cycle removed by subtracting the day-wise long-term averages and the linear trends removed. This ARFIMA model had for all four river stations better AIC values than the ARFIMA(0, S, 0) model. (After Mudelsee 2007.)

2.6 Background material

Textbooks on the theory and estimation of ARMA(p, q) processes were written by Priestley (1981), Brockwell and Davis (1991, 1996), Box et al. (1994) and Chatfield (2004). Nonlinear time series models are covered by Priestley (1988), Tong (1990) and Fan and Yao (2003). The latter two books have the notable aim to bridge the gap between statistics and nonlinear dynamics, see also Tong (1992, 1995). Longmemory processes are the topic of Beran (1994, 1997), Doukhan et al. (2003) and Robinson (2003). ARFIMA(p, q) processes are reviewed by Beran (1998) and fractional Gaussian noise processes by Mandelbrot (1983). Tables of series and other formulas can be found in the books by Abramowitz and Stegun (1965) and Gradshteyn and Ryzhik (2000).

The AR(1) model standard formulation is (Priestley 1981: Section 3.5.2 therein)

where E(i) is a stationary purely random process with mean and standard deviation In the general case this noise model is not stationary. For = 0 and |a| < 1, E [Xnoise(i)] is not constant but approaches with time a saturation value of ^e/(1 — a). For = 0 and |a| < 1, VAR [Xnoise(i)] is not constant but approaches with time a saturation value of ^2/(1 — a2). The case = 0 and |a| = 1 results in a random

walk process (p. 60). The standard formulation in Eq. (2.33) describes an "asymptotically stationary" process, whereas Eq. (2.1) describes a strictly stationary process. In practical applications such as random number generation (Section 2.7), the standard model (Eq. 2.33) can be used if the transient sequence of numbers (say, the first 5000) is discarded.

The covariance between two random variables, X and Y, is

The effective data size is reduced for a persistent process. This is shown (Fig. 2.10a) for the case of mean estimation of an AR(1) process, where even for moderate values (n ^ 50 and a ^ 0.5), n^ is considerably smaller than n. The data size reduction is quantified by Eq. (2.7). A simplified version based on the decorrelation time (Eq. 2.8) underestimates n^ by less than 5% for n ^ 50 and a ^ 0.5, as shown by Fig. 2.10b. Even for moderate values of n (^ 50) and a (^ 0.5), the influence of persistence on n^ can be considerable. Eq. (2.3) is valid only for the mean estimator. The effective data size for variance estimation of an AR(1) process, n'2

(Bayley and Hammersley 1946), is given by

a2n 1

1 a2

Likewise, the effective data size for correlation estimation between two processes X(i) and Y(i) with autocorrelation functions pX (i) and pY (i), np = n n1

(von Storch and Zwiers 1999), is in the case of two AR(1) processes with persistence parameters aX and aY given by np = n < 1 + - Figure 2.10. Effective data size, mean estimation of an AR(1) process. a Dependence of n^ on n after Eq. (2.7), for various persistence values a (0.2, 0.5, 0.9 and 0.99). b Comparison of the exact expression (Eq. 2.7) with a simplified version based on the decorrelation time (Eq. 2.8).

Data size, n

Figure 2.10. Effective data size, mean estimation of an AR(1) process. a Dependence of n^ on n after Eq. (2.7), for various persistence values a (0.2, 0.5, 0.9 and 0.99). b Comparison of the exact expression (Eq. 2.7) with a simplified version based on the decorrelation time (Eq. 2.8).

Early papers in climatology on effective data size and the influence of persistence on estimation variance include Matalas and Langbein (1962), Leith (1973), Laurmann and Gates (1977), Thiebaux and Zwiers (1984), Trenberth (1984a,b) and Zwiers and von Storch (1995).

Various approximate bias (and variance) formulas have been published for estimators of the autocorrelation parameter a in evenly spaced AR(1) models (Eqs. 2.1 and 2.33). Marriott and Pope (1954) analysed a (Eq. 2.4) and gave

White (1961) gave an approximation of higher order in terms of powers of (1/n):

E (a) ~ (1 - 2/n + 4/n2 - 2/n3) a + (2/n2) a3 + (2/n2) a5. (2.40)

One drawback of these approximations is that they are not accurate for large a. For a ^ 1, also a ^ 1 (Eq. 2.4) and the bias, E (a) — a, approaches zero. This behaviour is not contained in Eqs. (2.39) or (2.40). It is, however, contained in the bias formula of Mudelsee (2001a):

E (a) ~ [1 — 2/(n — 1)] a + [2/(n — 1)2] (a — a2n-1) /(1 — a2) . (2.41)

Mudelsee (2001a) showed that this approximation is more accurate than Eq. (2.40) for a ^ 0.88. The estimation variance of a is to a low approximation order (Bartlett 1946)

and to a higher order (White 1961)

VAR (a) ~ (1/n — 1/n2 + 5/n3) — (1/n — 9/n2 + 53/n3) a2 — (12/n3) a4.

Higher-order approximations of the first four moments of a are given by Shenton and Johnson (1965). From a practical point of view, it is more realistic to assume that the mean of Xnoise(i) is unknown and has to be subtracted. In case of the AR(1) process with unknown mean, the analogue of the estimator in Eq. (2.4) is where xnoise = ^™=i xnoise(i)/n is the sample mean. The approximate expectation of this estimator is (Kendall 1954)

Monte Carlo simulations (Fig. 2.11) indicate that this approximation can be used for bias correction in situations with uneven spacing and moderate autocorrelation (a less than, say, 0.9). The bias-corrected autocorrelation coefficient, a', is obtained from a by inserting a' for a on the right-hand side and a for E(a) on the left-hand side in one of the equations describing the bias, say Eq. (2.45), and solving this equation for a'. Approximations for the bias of least-squares and Yule-Walker estimators of AR(p) processes with known/unknown mean have been given for p < 6 by Shaman and Stine (1988). Sample mean subtraction is a special case of detrending. It follows that if we obtain {xnoise(i)}™=1 from the data {x(i)}rn=1 by removing an estimated trend function, {xtrend(i)}n=1 (Eq. 1.2), in principle we have to replace xnoise in

For trends more complex than a constant function, bias properties of such estimators seem, however, to be analytically untractable.

0 0