## Introduction

Superiority of quantitative methods over qualitative

"Weather is important but hard to predict"—lay people and scientists alike will agree. The complexity of that system limits the knowledge about it and therefore its predictability even over a few days. It is complex because many variables within the Earth's atmosphere, such as temperature, barometric pressure, wind velocity, humidity, clouds and precipitation, are interacting, and they do so nonlinearly. Extending the view to longer timescales, that is, the climate system in its original sense (the World Meteorological Organization defines a timescale boundary between weather and climate of 30 years), and also to larger spatial and further processual scales considered to influence climate (Earth's surface, cryosphere, Sun, etc.), does not reduce complexity. This book loosely adopts the term "climate" to refer to this extended view, which shall also include "paleoclimate" as the climate within the geologic past.

Man observes nature and climate to learn, or extract information, and to predict. Since the climate system is complex and not all variables can be observed at arbitrary spatial and temporal range and resolution, our knowledge is, and shall be, restricted and uncertainty is introduced. In such a situation, we need the statistical language to acquire quantitative information. For that, we take the axiomatic approach by assuming that to an uncertain event ("it rains tomorrow" or "before 20,000 years the tropics were more than 5°C colder than at present") a probability (real number between 0 and 1) can be assigned (Kolmogoroff 1933). Statistics then deciphers/infers events and probabilities from data. This is an

M. Mudelsee, Climate Time Series Analysis, Atmospheric and 3

Oceanographic Sciences Library 42, DOI 10.1007/978-90-481-9482-7_1, © Springer Science+Business Media B.V. 2010

assumption like others in the business: three-dimensional space, time arrow and causality, mathematical axioms (Kant 1781; Polanyi 1958; Kandel 2006). The book also follows the optimistic path of Popper (1935): small and accurately known ranges of uncertainty about the climate system enable more precise climate hypotheses to be tested, leading to enhanced knowledge and scientific progress. Also if one shares Kuhn's (1970) view, paradigm shifts in climatology have better success chances if they are substantiated by more accurate knowledge. It is the aim of this book to provide methods for obtaining accurate information from complex time series data.

Climate evolves in time, and a stochastic process (a time-dependent random variable representing a climate variable with not exactly known value) and time series (the observed or sampled process) are central to statistical climate analysis. We shall use a wide definition of trend and decompose a stochastic process, X, as follows:

X(T) = Xtrend(T) + Xout(T) + S(T) ■ Xnoise(T), (1.1)

where T is continuous time, Xtrend(T) is the trend process, Xout(T) is the outlier process, S(T) is a variability function scaling Xnoise(T), the noise process. The trend is seen to include all systematic or deterministic, long-term processes such as a linear increase, a step change or a seasonal signal. The trend is described by parameters, for example, the rate of an increase. Outliers are events with an extremely large absolute value and are usually rare. The noise process is assumed to be weakly stationary with zero mean and autocorrelation. Giving Xnoise(T) standard deviation unity enables introduction of S(T) to honour climate's definition as not only the mean but also the variability of the state of the atmosphere and other compartments (Brückner 1890; Hann 1901; Koppen 1923). A version of Eq. (1.1) is written for discrete time, T(i), as

X(i) = Xtrend(i) + Xout(i) + S(i) • Xnoise(i), (1.2)

using the abbreviation X(i) = X(T(i)), etc. However, for unevenly spaced T(i) this is a problematic step because of a possibly non-unique relation between Xnoise(T) and Xnoise(i), see Section 2.1.2.1. The observed, discrete time series from process X(i) is the set of size n of paired values t(i) and x(i), compactly written as {t(i), x(i)}™=1. To restate, the aim of this book is to provide methods for obtaining quantitative estimates of parameters of Xtrend(T), Xout(T), S (T) and Xnoise(T) using the observed time series data {t(i),x(i)}™=1.

A problem in climate analysis is that the observation process superimposes on the climatic process. Xnoise(T) may show not only climatic but also measurement noise. Outliers can be produced by power loss in the recording instrument. Non-climatic trends result, for example, from changing the recording situation. An example is temperature measurements made in a town that are influenced by urbanization (meaning an increased heat-storage capacity). However, measurement noise can in principle be reduced by using better instruments, and outliers and trends owing to the observation system can be removed from the data— climatologists denote such observation trend free data as homogeneous.

A further problem in real-world climatology is that also the time values have to be estimated, by dating (Section 1.1). Dating errors are expected to add to the noise and make the result more uncertain.

Consider a second climate variable, Y(T), composed as X(T) in Eq. (1.1) of trend, outliers, variability and noise. The interesting new point is the dependence between X(T) and Y(T). Take as example the relation between concentration of CO2 in the atmosphere and the global surface temperature. In analogy to univariate X, we write {X(T), Y(T)}, {T(i),X(i),Y(i)} and {t(i),x(i),y(i)}n= 1 for such bivariate processes and time series. This book describes methods only for uni- and bivari-ate time series. Possible extensions to higher dimensions are mentioned in Chapter 9.

{t(i),x(i),y(i)}n=1 need not result from the natural climate system but may also be the output from a mathematical climate model. Such models attempt to rebuild the climate system by connecting climate variables with governing mathematical-physical equations. Owing to the limited performance of computers and the uncertain knowledge about climatic processes, climate models are necessarily limited in spatial, pro-cessual and temporal resolution (McAvaney et al. 2001; Randall et al. 2007). On the other hand, climate models offer the possibility to perform and repeat climate experiments (say, studying the influence of doubled concentrations of CO2 in the atmosphere on precipitation in dependence on different model implementations of the role of clouds).

1.1 Climate archives, variables and dating

Climate archives "contain" the time series. The measured variables (x(i),y(i)) either are of direct interest, as in case of precipitation and temperature, or they bear indirect information (indicator or proxy variables). The estimated times (t(i)), in geosciences often called timescale, are obtained either by direct, absolute dating methods or indirectly by comparison with another, dated time series. Table 1.1 gives an overview about climate archives and absolute dating methods. Table 1.2 informs about climate variables and their proxies studied in this book. More details are provided in Figs. 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9 and

Table 1.1. Main types of climate archives, covered time ranges and absolute dating methods.

Table 1.1. Main types of climate archives, covered time ranges and absolute dating methods.

Dark shading means "frequently used," light shading means "occasionally used." Pl., Pliocene; b.p., "before the present." Background material (Section 1.6) gives details and references on geological epochs (also before Pliocene), archives and dating.

1.10, where some of the time series analysed in this book are presented, and in the background material (Section 1.6).

### 1.2 Noise and statistical distribution

The noise, Xnoise(T), has been written in Eq. (1.1) as a zero-mean and unit-standard deviation process, leaving freedom as regards its other second and higher-order statistical moments, which define its distributional shape and also its spectral and persistence properties (next section). The probability density function (PDF), f (x), defines a+5

a putting our incomplete knowledge in quantitative form.

For analysing, by means of explorative tools, the shape of f(x) using time series data {t(i), x(i)}™=1, it is important to estimate and remove the trend from the data. An unremoved trend would deliver a false, broadened picture of f(x). Trend removal has been done for constructing Fig. 1.11, which shows histograms as estimates of the distributions of Xnoise(T) for various climate time series. The estimation of trends is

 Climate archive Location Time range (a) Proxy variable Resolution (a) Climate variable Marine sediment core Eastern 106 S18O, benthic 103 Ice volume, equatorial foraminifera bottom water Pacific temperature Ice core Antarctica 105 CO2, air bubbles 103 CO2, atmosphere SD, ice 102 Air temperature Greenland 105 SO4 content, ice 100 Volcanic activity Ca content, ice 100 Aeolian dust, wind Dust content, ice 100 Aeolian dust, wind Conductivity,1 100 Soluble material, ice wind Na content, ice 100 Seasalt, wind Tree-rings Worldwide 104 A14C, wood 100 Solar irradiance, ocean circulation Lake sediment core Boston area 103 Varve thickness 100 Windb Speleothem Southern Oman 104 S18O, carbonate 101 Monsoon rainfall Documents Weikinn 103 100 Floods, river Elbe source texts Climate model Hadley Centre, 102 100 River runoff HadCM3 Direct measurements Siberia, 102 10-1 Surface North Atlantic temperature

Time range refers to the length of a record, resolution to the order of the average time spacing (see Section 1.4). "Proxy variable" denotes what was actually measured on which material. "Climate variable" refers to the climatic variations recorded by the variations in the proxy variable. The ability of a proxy variable to indicate a climate variable depends on the characteristic timescales (between resolution and time range). For example, ¿18O variations in benthic foraminifera over timescales of only a few decades do not record icevolume variations (which are slower). The Weikinn source texts are given by Weikinn (1958, 1960, 1961, 1963, 2000, 2002). a Electrical conductivity of the melted water.

b Extremely thick varves (graded beds) indicate extremely high wind speed (hurricane).

Time range refers to the length of a record, resolution to the order of the average time spacing (see Section 1.4). "Proxy variable" denotes what was actually measured on which material. "Climate variable" refers to the climatic variations recorded by the variations in the proxy variable. The ability of a proxy variable to indicate a climate variable depends on the characteristic timescales (between resolution and time range). For example, ¿18O variations in benthic foraminifera over timescales of only a few decades do not record icevolume variations (which are slower). The Weikinn source texts are given by Weikinn (1958, 1960, 1961, 1963, 2000, 2002). a Electrical conductivity of the melted water.

b Extremely thick varves (graded beds) indicate extremely high wind speed (hurricane).

one of the primary tasks in climate time series analysis and described in Chapter 4. In Fig. 1.11, outliers, sitting at the tail of the distribution, are tentatively marked. The variability, S(T), has only been normalized in those panels in Fig. 1.11 where it is not time-constant.

As the histogram estimates of the PDFs reveal, some distributions (Fig. 1.11b, i, j) exhibit a fairly symmetrical shape, resembling a Gaussian

Figure 1.1. Documentary data: floods of the river Elbe during winter over the past 1000 years. x, the flood magnitude, is in three classes (1, minor; 2, strong; 3, exceptionally strong). Hydrological winter is from November to April. Data for t < 1850 were extracted from Curt Weikinn's compilation (Weikinn 1958, 1960, 1961, 1963, 2000, 2002) of source texts on hydrography in Europe; accuracy of flood dates is ~ 1 month. Data for t > 1850 were inferred from daily measurements of water stage and runoff (volume per time interval) at Elbe station Dresden (Global Runoff Data Centre, Koblenz, Germany) via a calibration of magnitude versus water stage/runoff (Mudelsee et al. 2003). Because floods can last up to several weeks, only the peaks in stage/runoff were used to ensure independence of the data. Total number of points is 211. Data sparseness for t ^ 1500 is likely caused by document loss (inhomogeneity). One climatological question associated with the data is whether floods occur at a constant rate or there is instead a trend. (Data from Mudelsee et al. 2003.)

1000 1200 1400 1600 1800 2000

Year

Figure 1.1. Documentary data: floods of the river Elbe during winter over the past 1000 years. x, the flood magnitude, is in three classes (1, minor; 2, strong; 3, exceptionally strong). Hydrological winter is from November to April. Data for t < 1850 were extracted from Curt Weikinn's compilation (Weikinn 1958, 1960, 1961, 1963, 2000, 2002) of source texts on hydrography in Europe; accuracy of flood dates is ~ 1 month. Data for t > 1850 were inferred from daily measurements of water stage and runoff (volume per time interval) at Elbe station Dresden (Global Runoff Data Centre, Koblenz, Germany) via a calibration of magnitude versus water stage/runoff (Mudelsee et al. 2003). Because floods can last up to several weeks, only the peaks in stage/runoff were used to ensure independence of the data. Total number of points is 211. Data sparseness for t ^ 1500 is likely caused by document loss (inhomogeneity). One climatological question associated with the data is whether floods occur at a constant rate or there is instead a trend. (Data from Mudelsee et al. 2003.)

(Fig. 3.1). Other PDFs (Fig. 1.11c-h, k), however, have more or less strongly right-skewed shape. Possibly Fig. 1.11d (Vostok ¿D) reflects a bimodal distribution.

Table 1.3 informs about the size of the variability, S(T), in relation to the uncertainty associated with the pure measurement for the time series analysed here. S(T) reflects the variability of the climate around its trend (Eq. 1.1), the limited proxy quality when no directly measured variables are available and, finally, measurement error. As is evident from the data shown, the measurement error is often comparably small in climatology. It is in many studies that use proxy variables one of the major tasks to quantify the proxy error. For example, if ¿18O in shells of benthic foraminifera from deep-sea sediment cores is used as proxy for global ice volume, bottom-water temperature fluctuations make up nearly 1/3 of S(T), see Table 1.3.

A relation proxy variable-climate variable established under laboratory conditions is not perfect but shows errors, quantifiable through regression (Chapter 8). Assuming that such a relation holds true also in

Figure 1.2. Marine sediment core data: ¿18O record from Ocean Drilling Program (ODP) site 846 (eastern equatorial Pacific) within 2-4 Ma. The core was drilled from a ship through ~ 3300 m water into the ocean floor, it has a length of ~ 460 m and a diameter of ~ 35 cm. The oxygen isotope record (Shackleton et al. 1995b) was measured on the calcareous shells of benthic foraminifera, mainly C. wuellerstorfi and Uvigerina spp., using a mass spectrometer. Values are given in delta notation: ¿18O = [(18O/16O)sampie/(18O/16O)pDB - 1] • 1000%, where (18O/16O) is the number ratio of oxygen isotopes 18O and 16O and PDB is "Pee Dee Belemnite" standard. A value of 0.64%o was added to all ¿18O values from C. wuellerstorfi to correct for a species-dependent offset (Shackleton and Hall 1984). The depth scale was transformed into a timescale in several steps (Shackleton et al. 1995a). First, biostratigraphic positions, that is, core depths documenting first or last appearances of marine organisms, provided a rough time frame. (Unlike many other marine sediment cores, site ODP 846 lacks a magnetostratigraphy, that is, recorded events of reversals of the Earth's magnetic field, which might had improved the temporal accuracy at this step.) Second, a proxy record of sediment density was measured using a gamma-ray attenuation porosity evaluation (GRAPE) tool. Third, the ODP 846 GRAPE record was tuned (Section 1.6) to the combined GRAPE record from ODP sites 849, 850 and 851. This stacked GRAPE record had in turn been previously tuned to the time series of solar insolation at 65°N (Berger and Loutre 1991), calculated using standard procedures from astronomy. The reason behind this cross-tuning procedure is the observation (Hays et al. 1976) that Earth's climatic variations in the order of tens of thousands to several hundreds of thousands of years are influenced by solar insolation variations. Since the sedimentation rate in the geographic region of site ODP 846 varies with climate (Shackleton et al. 1995a), one cannot assume a constant accumulation of the marine archive. Hence, the dates of sediment samples between the biostratigraphic fixpoints cannot be obtained by interpolation and the GRAPE density records had to be used to obtain an absolute timescale by tuning. Note that time runs "in paleoclimatic manner" from the right to the left. In the same fashion, the ¿18O scale is inverted such that glacial conditions (large ice volume, low bottom water temperature or large ¿18O values) are indicated by the bottom part and interglacial conditions by the top part of the plot. The number of data points, n, within the shown interval is 821, the average spacing is ~ 2.4 a. A comparison between absolutely dated and tuned magnetostratigraphic timescales for the Pliocene to early Pleistocene (Mudelsee 2005) suggests an average age deviation of ~ 25 ka; this value can also serve to indicate the magnitude of the absolute error of the ODP 846 timescale. The record indicates variations in global ice volume and regional bottom water temperature. One task is to quantify the long-term ¿18O increase, which reflects the glaciation of the northern hemisphere in the Pliocene.

 -400-1 ) - ( -450- £J to - -500- 320-i ^ E 280- CP cp 240-

100 200 300 Age (ka)

Figure 1.3. Ice core data: deuterium and CO2 records from the Vostok station (Antarctica) over the past 420,000 years. The core was drilled into the ice (diameter: 12 cm, length: 3623 m) and recovered in segments. The deuterium record (a) was measured on ice material using a mass spectrometer. Values are given in delta notation: ¿D = [(D/H)Sampie/(D/H)sMOW - 1] • 1000%, where (D/H) is the number of D particles over the number of H particles and SMOW is "Standard Mean Ocean Water" standard. Total number of points, n, is 3311. The CO2 record (b) was measured on air bubbles enclosed in the ice. Values are given as "parts per million by volume," n is 283. In b, values (dots) are connected by lines; in a, only lines are shown. The present-day CO2 concentration (~ 389 ppmv) is not recorded in b. The construction of the timescale (named GT4) was achieved using a model of the ice accumulation and flow. Besides glaciological constraints, it further assumed that the points at 110 and 390 ka correspond to dated stages in the marine isotope record. Construction of the CO2 timescale required additional modelling because in the ice core, air bubbles are younger in age than ice at the same depth. One climatological question associated with the data is whether variations in CO2 (the values in air bubbles presenting the atmospheric value accurately) lead over or lag behind those of deuterium (which indicate temperature variations, low ¿D meaning low temperature). (Data from Petit et al. 1999.)

the geologic past increases the proxy error. Spatially extending the range for which a variable is thought to be representative is a further source of error. This is the case, for example, when variations in air temperature in the inversion height above Antarctic station Vostok are used to repre-

1.3 Persistence 6000-1

0 0