Methodology Of Cfr From Proxy Data

Our approach to CFR from proxy data will be quite similar to that applied previously to historical SST and SLP data (Kaplan et al., 1997, 1998, 2000). At all steps of the procedure we seek globality, optimality, and space reduction. The process involves six steps, which are schematically outlined in Fig. 1 and are described in detail later.

4.2.1. Characterization of the Target Climate Field

We use the modern observed data to provide estimates of the climatological mean and spatial covari-ance C of the target climate anomaly field. For the purpose of the study of large-scale phenomena, statistical description of the field can be further improved by first performing an optimal analysis of the historical climate data to provide reconstructions of the target climate field itself. We then determine the leading spatial modes of signal variability E (expressed as empirical orthogonal functions, or EOFs) and their energy distribution A (eigenvalues) through canonical decomposition of the covariance matrix C and truncation of statistically insignificant modes (Cane et al., 1996; Kaplan et al., 1997, 1998):

where boldface variables indicate vectors or matrices, <...> indicates time averaging, and superscript T denotes the matrix or vector transpose. Here, E is a matrix whose columns are a relatively small number of the eigenvectors of C, and A is a square matrix whose diagonal elements are the eigenvalues corresponding to E. We then assume a reduced space form for the solution

where a(t) is the low-dimensional vector of amplitudes with which the modes E contribute to the climate field T at time t, and T is the long-term mean zero. For the modern period, the EOFs E and the vectors a(t) are known from the analysis of the instrumental data. For the purposes of CFR, a portion of this analyzed data set T may be used for calibration of the proxy data (see Section 4.2.3). The part of the target climate field not used for calibration purposes can then be used for independent verification of the corresponding period of the proxy-based CFR (see Section 4.2.5).

We presume the analysis of the target modern climate data captures the potential modes of variability reflected in proxy observations. Specifically, we assume that the leading patterns of pre-instrumental SST vari ability may be adequately represented by some linear combination of the EOFs used in the analysis of the SST data. Although this assumption may be true for the largest scale patterns that are likely to be resolved by proxy data over the past several hundred years, it may be that the climate system has operated in modes that are not spanned by this characterization of the instrumental data. We also assume that the errors in this analysis are small relative to the errors in the proxy-based CFR we seek and that the climatic patterns resolved by the proxy data are a small subset of those resolved in modern historical observations.

4.2.2. Characterization of the Proxy Data

In cases where numerous proxy data realizations are expected to resolve a much smaller dimension of climatic information, we may perform a similar statistical decomposition and analysis of the proxy data, D(t), to determine the leading modes of variability evident in the proxy data. In doing so, we assume that the leading modes of variability observed in the proxy data set itself are climatic in origin, rather than the expression of small-scale climatic effects or the imprint of nonclimat-ic influences. If this is the case, the leading eigenvalues of the decomposition may be well separated from the rest, and the corresponding EOFs will explain a large fraction of the variance in the proxy data set. This hypothesis may be checked by examining the corresponding spatial structure in the target climate field or by examining the frequency domain characteristics of the proxy amplitudes (Wallace, 1996a,b). We may also use these results to analyze the proxy data itself, following Eqs. (3)-(12) in Section 4.2.4, to emphasize common cross-site features observed in the proxy data. Such prefiltering of the proxy data reduces the potential for unrelated patterns in the proxy data to falsely project upon the leading modes of climate field variability. Statistical analyses may be checked for consistency with physical or empirical knowledge derived from development and intercomparison of the proxy data with local climatic data.

4.2.3. Calibration

We seek calibration of the proxy data D(t) in terms of the time amplitudes of the reduced space representation of the climate field, a(t). More precisely, the low-dimensional representation of T in terms of E and a(t) permits the estimation scheme

Equation (3) sets up the Gauss-Markov observational scheme for estimation of a(t) from the observed proxy vector D (Gandin, 1965; Mardia et al., 1979; Rao, 1973). The measurement operator H is a linear function that maps from the climate field to the proxy observations. In other words, it represents a linear regression of the proxy data on the temporal amplitudes of the leading patterns of climate variability. For example, if D were SST estimates at points, then H would be a spatial interpolation of E to the observational locations. In this work H describes the linear function relating dimen-sionless tree-ring-width indices from terrestrial locales to leading modes of global SST variability. Locally, it might express the linearized relationship expected from local calibration studies or from principles of den-droclimatology.

We introduce R = <eeT> as the matrix of observational error covariances. As defined, it is the error in observation of the climate field from the proxy data. Hence, it includes the error in observations of each proxy, the error due to space reduction [Eq. (2)], and the error in estimation of the measurement operator H. This last error component is estimated via calibration over a limited time interval with good data coverage. Note that the construction of H in this manner permits nonlocal information to be recovered by the proxy data.

We use the proxy data and the instrumental data to estimate H and e over a calibration interval, using the singular value decomposition (SVD). The SVD procedure decomposes the covariance between the proxy data and principal components (PCs) of the climate field into linear combinations of singular values and left and right singular vectors:

where the columns of U and V are orthogonal vectors describing modes of covariance between the proxy data and the climate field, respectively, and <...> now indicates time averaging over the calibration period.

As a sparse observational network of proxy data may be expected to robustly resolve only a few leading modes of climate variability, we retain for multivariate regression with vector of coefficients h only the leading modes of covariance between proxy data and climate field, as defined by the SVD procedure

Here, Vr is the matrix consisting of the first few columns of V. How many columns of V to retain is a somewhat subjective choice, though one that may be subsequently tested statistically. The use of such screening techniques to form more robust estimates of the relationships between two fields has long been applied in meteorology (Bretherton et al., 1992) and den-droclimatology (Fritts et al., 1971; Cook et al., 1994).

The vector of regression coefficients h is then esti mated based on these presumably resolvable and climatic modes:

where ar = VrTa. The error in the map e is the difference between the proxy observations and their best-fit estimates:

The map from the reduced space representation of the climate field to the proxy data [cf. Eqs. (3) and (5)] is then

Equations (7) and (8) specify all parameters necessary for the estimation scheme given in Eq. (3).

In some cases the instrumental data and an understanding of the target climate field and its dynamics can be used to construct a model for climatic time transitions:

which can be used in CFR as an additional source of information for OA in an optimal smoother analysis (Kaplan et al., 1997). If no such model is available, and the signal is assumed to have zero mean, then the trivial time transition model A = 0 may be employed:

a(t+1) = 0a(t) + et, <etetT> = <aaT> = L.

This procedure assumes that the proxy measurements may be represented as a linear function of the time variability of large-scale modes of climatic variability and a residual. This functional dependence may include other factors insofar as they are linearly related to the variability exhibited by the climate field T within the calibration period. In fact, we have found in studies of coral 818O data that nonlocal climatic variability may constitute a significant part of the map to SST (Evans et al., 2000). We further assume that any bias in the linear relationship between climate field and proxy data is removed prior to analysis. Proxy age model uncertainty is included here only through influence on the estimated observational SST error € in the map H. We also assume that the map from climate field to proxy data is a robust estimate, which neither overfits nor un-derrepresents the climatic information in the proxy data, and that this map and its error do not change over the period of the reconstruction. These assumptions may be subsequently subjected to testing and verification (Section 4.2.5).

4.2.4. Analysis

In many observed and gridded meteorological and oceanographic fields, the number of patterns that may be readily discerned above observational noise, data gaps, and other uncertainties is small. For CFR based on proxy data, the number of observations is much smaller and the observational error is far larger, so the dimension of the resolvable climatic space is expected to be even smaller.

We seek the field T(x,t), which is the best fit to the proxy observations constrained to be near the spatial target field covariance. We construct a cost function

S evaluated at each time t is a unitless scalar quantity. Here, a is the vector of temporal amplitudes we seek to reconstruct from the proxy data. The first term of Eq. (9) represents the squared residuals between the proxy observations and their a-based predictions at each time t. R is the estimate of their covariance [Eq. (7)] and weights this set of residuals. Similarly, the second term weights the residuals of the trivial prediction model a = 0 with its inverted error covariance, which is the co-variance of the signal [Eq. (1)]. At each time t, the analysis is punished (S r large) for putting too much stock either in observations with large error or in patterns that explain little of the energy in the spatial covariance patterns observed in the modern SST field (Kaplan et al., 1997).

Minimization of a quadratic cost function such as Eq. (9) retrieves the generalized least-squares estimate of a (Sokal and Rohlf, 1995). If the proxy observations are unbiased, and if the a priori error covariances R and A are well estimated and uncorrelated in time, then minimization of S produces the best linear unbiased estimate (BLUE) (Mardia et al., 1979). For this specific minimization problem, in which S has been written in terms of reduced space variables, minimization at each time t produces the reduced space optimal estimate (Kaplan et al., 1997):

and the field T is recovered by Eq. (2)

P is the error covariance in the estimate T:

and it represents a weighted average of the error in the observations mapped to the analysis domain and the spatial field covariance estimate. The weights are determined by the observational error covariance matrix R, and the variance A is resolved by the EOFs E over the calibration period. The rank of H is determined as part of the calibration procedure (Section 4.3.3). The matrix H in Eqs. (10) and (12) consists of the rows of H defined by Eq. (8) for which the proxy data are avail able at analysis time t. When the observations are relatively accurate (R small) or sufficiently numerous (H has high row dimension), the analysis error is low, and the analysis puts variance into the solution T. When observations are poor (R large) or absent (H has low row dimension), the analysis error is large, and the analysis estimate T approaches climatology (e.g., the trivial solution T = 0). Similarly, if the proxy data do not skillfully describe climatic variance over the calibration period (H of low rank and /or R large), the analysis estimate T will have low amplitude. In either case, the analysis error covariance is formulated to be consistent with our a priori assumptions about the error in the observations, map, and model; our error covariance estimates; and the availability of proxy observations over time. Subsequent verification exercises may show that we have assumed the proxy data are too accurate or the relationship between proxy data and climate field is too well resolved. Iteration of the procedure, with different choices for the rank of H and more reasonable estimates of R, is required until a priori assumptions match a posteriori results and produce the optimal analysis field estimate T.

4.2.5. Verification

The initial verification of the reconstructed climate field is a comparison with the observed climate data withheld from the calibration procedure (Sections 4.2.1 and 4.3.3). The more complicated issue is testing the consistency of the analysis, as was mentioned earlier. Technically, the analysis solution is optimal, provided the proxy calibration holds for the entire period of reconstruction; the data and model are unbiased estimates; and the observational error € is uncorrelated in time. We may test the validity of these assumptions via cross-validation experiments with withheld data, comparison of the verification residuals with the theoretical error in the analysis, and comparison of the analysis results with other independent data (instrumental or proxy). In addition, to determine the extent to which climatic information is resolved by the proxy data, the proxy-based CFR may be compared to CFRs based on synthetic proxy data chosen to represent benchmark observations or data sets devoid of information.

4.2.6. Study of Climate Dynamics

Once the reconstruction field T has been satisfactorily validated, we may begin to study it as a source of climatic information within the context of its strengths and limitations. In many cases, the analysis error will be quite large and will limit interpretation to only the largest scale area-average indices. Such indices may be subjected to time series analysis, epoch analysis, frequency domain analysis, joint spatial-temporal mode analysis, and climate change hypothesis testing. Many such studies become more direct and convenient when the reduced space form of the solution [Eqs. (10) and (12)] is utilized.

4.3. APPLICATION: PACIFIC BASIN SST FIELD RECONSTRUCTION FROM PACIFIC AMERICAN TREE-RING INDICATORS

4.3.1. Target Climate Field

SST is an important diagnostic of the global climate. The surface ocean and atmosphere participate (or are suspected to participate) in positive and negative dynamical feedback mechanisms on a wide range of timescales (Namias, 1969, 1980; Cane, 1986; Deser et al., 1996; Latif and Barnett, 1994; Lau and Nath, 1994; Webster, 1994; Trenberth and Hurrell, 1994; Clement et al., 1996; Cane et al., 1997; Clement, 1998). As a statistical field, SST varies relatively smoothly on large time and space scales, and so we hypothesize that its leading temporal and spatial covariance patterns are likely to be captured by a sparse observational network composed of proxy observations.

Our source of SST data for use as a target climate field is the recently produced analysis (Kaplan et al., 1998) of the MOHSST5 (Parker et al., 1994) product of the U.K. Meteorological Office. MOHSST5 is a 5° X 5° monthly compilation of historical (1856-1991) ship-based observations of SST, and the Kaplan et al., (1998) analysis employs the newly developed technique of the reduced space optimal smoother to objectively extract large-scale variations of SST from incomplete spatial and temporal observational coverage. Variance on small spatial scales, amounting to (0.3-0.4)2 °C2 of the intermonthly variance, which is not constrained by the observations, is filtered out by the reduction of the EOF space. The analysis can reconstruct anomalies associated with ENSO and other large-scale phenomena, including low-frequency variability (Enfield and Mestas-Nunez, 2000; for several examples and tests, see Kaplan et al., 1998). Since our purpose is the reconstruction of large-scale SST field variability, we chose this product as our basis or target climate field for our CFR experiment.

We use 1856-1990 annual means of Pacific basin SST anomaly (87.5°N to 87.5°S, 110°E to 65°W) from this analysis (hereafter termed KaSSTa) as our target climate field. The annual average runs from April of the current year through March of the following year, the natural year for much of the Pacific Ocean (Ropelew-

ski and Halpert, 1987). This choice also serves to integrate dendroclimatological anomalies observed in the Northern Hemisphere growing season (April-October) and those observed during the growing season of the Southern Hemisphere (November-March), which together provide complementary information on annually averaged SST anomaly (Villalba et al., 2000). Following Kaplan et al., (1998), these data are decomposed into a set of EOF weights and temporal loadings, and the leading 30 modes of spatial variability are used to reconstruct approximately 85% of the total variance in the annually averaged gridded 5° X 5° estimates of SST anomaly on which KaSSTa is based (Bottomley et al.,

1990). The domain, variance, and error in the analysis of this target climate field are shown in Fig. 2.

4.3.2. Proxy Data

For over a century, those in the field of dendrocli-matology have sought estimates of local and regional climate variability from measurements made on trees producing marked annual growth rings (Fritts, 1976,

1991). One of the most common measurements is the width of sequential annual rings. In an individual tree, rings indicate the passage of time, and ring-width variations reflect the combined effects of biophysical and environmental factors, including climate. While the physiological factors controlling ring-width variations within individual trees are complex and difficult to quantify (Fritts et al., 1971), the mean weather and climatic conditions at a given site should have a similar effect on all trees within a homogeneous stand. In addition, using a priori knowledge of local or regional growing conditions, the sampling site may be chosen for its expected sensitivity to a particular, growth-limiting climatic variable of interest (Fritts, 1976). Dendro-climatologists employ statistical approaches to determine the common climatic signal extricable from a population of trees; such a statistical reduction of the data to a well-dated, continuous, standardized, tree-growth index is known as standardization, and its final product is called a chronology.

Many hundreds of tree-ring-width chronologies have now been developed worldwide (ITRDB, 1998), and their relationships to climatic variables have been tested on local, regional, and global scales. In a recent application, Villalba et al. (2000) found that tree-ring-based estimates of air temperature and drought developed from Pacific American tree-ring chronologies share a common, decadal mode of time variability over the last four centuries. They showed that this variability is associated with an ENSO-like pattern of SST anomalies in the Pacific basin. With the hypothesis that this SST pattern forces variability observed in the proxy

FIGURE 2 (a) The target climate field of the annually averaged Pacific basin sea surface temperature (SST) anomaly. Contours give root-mean-squared (RMS) variance in degrees Celsius (°C). (b) Error in estimation of the field (°C). Gray areas show regions where there are insufficient data for analysis. (Data from Kaplan et al., 1998.)

FIGURE 2 (a) The target climate field of the annually averaged Pacific basin sea surface temperature (SST) anomaly. Contours give root-mean-squared (RMS) variance in degrees Celsius (°C). (b) Error in estimation of the field (°C). Gray areas show regions where there are insufficient data for analysis. (Data from Kaplan et al., 1998.)

data, they inferred increased energy associated with the decadal SST pattern during the period 1600-1850, relative to the present (Villalba et al., 2000).

Following the intriguing and encouraging results of Villalba et al. (2000), we hypothesize that large-scale SST anomalies over the Pacific basin give rise to surface air temperature, precipitation, and drought anomalies across the tropical and extratropical Americas; in turn, dendroclimatological data from affected regions may be used to reconstruct the relevant SST variability. Hence, we apply the approach detailed earlier in Section 4.2 to the problem of reconstructing Pacific basin SSTs from the small subset of dendroclimatological indicators studied by Villalba et al. (2000) (Table 1) for the interval 1001-1990. For our purposes, we term these data tree-ring indicators of climate variability, noting that the data have a diversity of sensitivities (lo cal air temperature, precipitation, or both, in the case of drought indicators) and include tree-ring-width chronologies as well as climatic indices derived from tree-ring-width chronologies.

To compare and evaluate the results, we apply the same methodology to reconstruct SST from two additional sets of indicators. The first is a set composed of instrumental climate data from the locations from which the proxy data were collected (Table 1). These are selected to mimic the expected sensitivity (to surface temperature, precipitation, and drought) of the den-droclimatological data employed. The second is a set of red noise processes with lag -1 autoregressive characteristics like those of the dendroclimatological data. These two additional benchmark and noise reconstruction experiments are expected to return estimates of the best possible reconstruction results and of skill expected by chance, respectively. In all cases, the proxy indicators have been normalized, as their original variances relative to one another may or may not have climatic significance.

4.3.3. Calibration

4.3.3.1. Construction of the Relationship between SST and Tree-Ring Indicators

Finding the appropriate mapping H from the SST to the proxy data is a somewhat subjective task that we solve in the following manner: we estimate the covari-ance matrix between the proxy data and the low-dimensional vector of amplitudes known from KaSSTa over the calibration interval 1923-90 [Eqs. (3) to (8)] and we reserve the complementary part of the KaSSTa amplitudes (1856-1922) for verification exercises and as an alternate calibration period (see Section 4.3.5. Following the work of many dendroclimatological calibration studies in general (Fritts, 1976) and employing the findings of Villalba et al. (2000) in particular, we permit the tree-ring data to provide information on SST variability through simultaneous and lag -1 year co-variances.

EOF analysis of the tree-ring data suggests that this data set has at least one and perhaps up to three significant EOFs. The first proxy EOF, which explains about 32% of the variance in the time series, is clearly distinguishable above the eigenvalue spectrum retrieved from EOF analysis of red noise time series with the autoregressive characteristics of the tree-ring data (Fig. 3). Similar decomposition of the observed data suggests that, at most, two to three significant EOFs may be retrieved from the sampled regions.

Considering these results and the work of Villalba et al. (2000), we present the analysis with only the leading

M. N. Evans, A. Kaplan, M. A. Cane, and R. Villalba TABLE 1 Tree-Ring Indicators Used in This Analysis0'6

Site name

Latitude Longitude Interval

Benchmark historical time series

 Ellsworth