## Si S

If S(i) is unknown and possibly time-dependent, the following iterative estimation algorithm can be applied (Algorithm 4.1). As long as S(i) is required only for weighting, the algorithm yields the correct estimators even if only the relative changes of S(i), rather than its absolute values, are estimated. Analogously, if S(i) is required only for weighting and is known to be constant, then Eqs. (4.5) and (4.6) can be used with S(i) = 1, i = 1, ..., n, and W = n. This estimation without weighting is called ordinary least squares (OLS). For the construction of classical CIs (Section 4.1.4), however, an estimate of S(i) has to be available.

Step 1. Make an initial guess, $S^{(0)}(i)$, of the variability.

Step 2. Estimate the regression parameters, $\hat{\beta}_0^{(0)}$ and $\hat{\beta}_1^{(0)}$, with the guessed variability used instead of S(i) in Eqs. (4.5), (4.6) and (4.7).

Step 3. Calculate $e(i) = x(i) - \hat{\beta}_0 - \hat{\beta}_1 t(i)$, i = 1, ..., n. The e(i) are called the unweighted regression residuals.

Step 4. Obtain a new variability estimate, $S^{(1)}(i)$, from the residuals. This can be done either nonparametrically by smoothing (e.g., running standard deviation of e(i)) or by fitting a parametric model of S(i) to e(i).

Step 5. Go to Step 2 with the new, improved variability estimate until the regression estimates converge.

Algorithm 4.1. Linear weighted least-squares regression, unknown variability.

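The steps of Algorithm 4.1 can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`wls_fit`, `running_std`, `iterative_wls`), the window half-width, the tolerance and the synthetic data are all hypothetical choices made here, and the running standard deviation is only one of the smoothing options that Step 4 allows.

```python
import numpy as np

def wls_fit(t, x, s):
    """Weighted least-squares fit of x = b0 + b1*t with weights 1/s**2."""
    w = 1.0 / s**2
    T = np.column_stack([np.ones_like(t), t])    # n x 2 matrix with rows [1, t(i)]
    A = T.T @ (w[:, None] * T)                   # weighted normal equations
    b = T.T @ (w * x)
    return np.linalg.solve(A, b)                 # array [b0, b1]

def running_std(e, half_width):
    """Running standard deviation of e over a centred window (truncated at the ends)."""
    n = len(e)
    s = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        s[i] = np.std(e[lo:hi], ddof=1)
    return s

def iterative_wls(t, x, half_width=10, tol=1e-8, max_iter=50):
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    s = np.ones_like(x)                          # Step 1: initial guess (constant variability)
    beta = wls_fit(t, x, s)                      # Step 2: fit with guessed variability
    for _ in range(max_iter):
        e = x - beta[0] - beta[1] * t            # Step 3: unweighted residuals
        s = running_std(e, half_width)           # Step 4: new variability estimate
        beta_new = wls_fit(t, x, s)              # Step 5: back to Step 2
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, s

# Synthetic demonstration (assumed values): variability grows with time.
rng = np.random.default_rng(0)
t = np.arange(200.0)
s_true = 1.0 + 0.02 * t
x = 2.0 + 0.5 * t + s_true * rng.standard_normal(t.size)
beta_hat, s_hat = iterative_wls(t, x)
print("intercept, slope:", beta_hat)
```

Because the weights enter only through their relative sizes, rescaling `s_hat` by a constant factor leaves `beta_hat` unchanged, which mirrors the remark that relative changes of S(i) suffice for weighting.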

#### 4.1.1.1 Example: Arctic river runoff

The climate model run with natural forcing only (Fig. 4.1a) does not exhibit a slope significantly different from zero. (See Section 4.1.4 for the determination of regression standard errors.) The run with combined anthropogenic and natural forcing (Fig. 4.1b) displays significant upward trends in runoff. Wu et al. (2005) conjecture that there might be a change-point at around 1965, when the slope changed.


Figure 4.1. Linear regression models fitted to modelled Arctic river runoff (Fig. 1.9). a Natural forcing only; b combined anthropogenic and natural forcing. Following Wu et al. (2005), the fits (solid lines) were obtained by OLS regression using the data from (a) the whole interval 1900-1996 and (b) two intervals, 1936-2001 and 1965-2001. The estimated regression parameters (Eqs. 4.5 and 4.6) and their standard errors (Eqs. 4.24 and 4.25) are as follows. a $\hat{\beta}_0$ = 3068 ± 694 km³ a⁻¹, $\hat{\beta}_1$ = 0.102 ± 0.356 km³ a⁻²; b 1936-2001, $\hat{\beta}_0$ = -2210 ± 1375 km³ a⁻¹, $\hat{\beta}_1$ = 2.807 ± 0.698 km³ a⁻²; b 1965-2001, $\hat{\beta}_0$ = -13,977 ± 3226 km³ a⁻¹, $\hat{\beta}_1$ = 8.734 ± 1.627 km³ a⁻².

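As a rough reading aid for the numbers in the caption, the ratio of each slope estimate to its standard error can be computed directly. The snippet below is only back-of-the-envelope arithmetic on the values quoted in Fig. 4.1; the dictionary labels are made up here, and the formal assessment uses the Student's t based CIs of Section 4.1.4.

```python
# Slope estimates and standard errors (km^3 a^-2) as quoted in the caption of Fig. 4.1.
estimates = {
    "a, 1900-1996": (0.102, 0.356),
    "b, 1936-2001": (2.807, 0.698),
    "b, 1965-2001": (8.734, 1.627),
}

# Ratio of estimate to standard error; a small ratio means the slope
# is hard to distinguish from zero.
ratios = {label: slope / se for label, (slope, se) in estimates.items()}

for label, ratio in ratios.items():
    print(f"{label}: slope / se = {ratio:.2f}")
```

The ratio is well below 1 for the natural-forcing run and above 4 for both intervals of the combined-forcing run, in line with the statements about significance in the text.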

### 4.1.2 Generalized least-squares estimation

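In a practical climatological setting, Xnoise(i) often exhibits persistence, that is, more structure or information content than a purely random process has. This knowledge can be exploited by generalized least-squares (GLS) estimation, where the following sum of squares is minimized:

$$SSQG(\boldsymbol{\beta}) = (\mathbf{x} - \mathbf{T}\boldsymbol{\beta})' \, \mathbf{V}^{-1} \, (\mathbf{x} - \mathbf{T}\boldsymbol{\beta}).$$

Herein,

$$\boldsymbol{\beta} = [\beta_0, \beta_1]'$$

(parameter vector),

$$\mathbf{x} = [x(1), \ldots, x(n)]'$$

(data vector),

$$\mathbf{T} = \begin{bmatrix} 1 & \cdots & 1 \\ t(1) & \cdots & t(n) \end{bmatrix}'$$

(time matrix)

and V is an n × n matrix, the covariance matrix. The solution is the GLS estimator,

$$\hat{\boldsymbol{\beta}} = (\mathbf{T}' \mathbf{V}^{-1} \mathbf{T})^{-1} \, \mathbf{T}' \mathbf{V}^{-1} \mathbf{x}. \tag{4.13}$$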
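In its standard textbook form, the GLS estimator is (T'V⁻¹T)⁻¹ T'V⁻¹x, with T the n × 2 time matrix whose rows are [1, t(i)]. The following NumPy sketch evaluates that expression; the function name `gls_fit` and the small test data are assumptions made for illustration, and V is taken as given.

```python
import numpy as np

def gls_fit(t, x, V):
    """GLS estimate [b0, b1] for x = b0 + b1*t with noise covariance matrix V."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    T = np.column_stack([np.ones_like(t), t])    # time matrix, n x 2
    Vinv_T = np.linalg.solve(V, T)               # V^{-1} T, without forming V^{-1} explicitly
    Vinv_x = np.linalg.solve(V, x)               # V^{-1} x
    return np.linalg.solve(T.T @ Vinv_T, T.T @ Vinv_x)

# Noise-free toy data (assumed): any positive-definite V recovers the line exactly.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x = 1.0 + 2.0 * t
V_diag = np.diag([1.0, 4.0, 1.0, 4.0, 1.0])      # diagonal V: GLS reduces to WLS
beta_gls = gls_fit(t, x, V_diag)
print(beta_gls)
```

With V equal to the identity matrix the same function reproduces OLS, and with a diagonal V it reproduces WLS, so the three least-squares variants differ only in the V that is supplied.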

GLS has the advantage of providing smaller standard errors of the regression estimators than WLS in the presence of persistence. Analogously, in the case of time-dependent S(i), WLS estimation is preferable to OLS estimation (Sen and Srivastava 1990). The covariance matrix has the elements

$$V(i_1, i_2) = S(i_1) \cdot S(i_2) \cdot E\left[X_{\text{noise}}(i_1) \cdot X_{\text{noise}}(i_2)\right], \tag{4.14}$$

$i_1, i_2 = 1, \ldots, n$. In climatological practice, the persistence (Chapter 2) normally has to be estimated in addition to the variability to obtain the V matrix. In the case of the AR(1) persistence model for uneven spacing (Eq. 2.9), the only unknown besides S(i) required for calculating V is the persistence time, $\tau$. The estimated V matrix then has the elements

$$\hat{V}(i_1, i_2) = \hat{S}(i_1) \cdot \hat{S}(i_2) \cdot \exp\left[-|t(i_1) - t(i_2)| / \hat{\tau}'\right], \tag{4.15}$$

$i_1, i_2 = 1, \ldots, n$, where $\hat{\tau}'$ is the estimated, bias-corrected persistence time (Section 2.6). For even spacing, replace the exponential expression by $(\hat{a}')^{|i_1 - i_2|}$. (In the case of persistence models more complex than AR(1), V is calculable and, hence, GLS applicable only for evenly spaced time series.) The autocorrelation or persistence time estimation formulas (Eqs. 2.4 and 2.11) are applied to the weighted WLS regression residuals,

$$r(i) = \left[x(i) - \hat{\beta}_0 - \hat{\beta}_1 t(i)\right] / \hat{S}(i), \tag{4.16}$$

$i = 1, \ldots, n$. Detrending by a linear regression is not the same as mean subtraction, and the biases of those autocorrelation and persistence time estimators need not follow the approximations given for mean subtraction (Section 2.6); they are unknown. However, the deviations are likely negligible compared with the other uncertainties. In the case of unknown persistence, too, an iterative procedure similar to that for WLS can be applied, which is called estimated generalized least squares (EGLS) (Sen and Srivastava 1990: Section 7.3 therein). Section 4.1.4.1 gives an EGLS procedure for the case of AR(1) persistence.
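For the AR(1) model on unevenly spaced times, the covariance matrix described above has elements S(i1) · S(i2) · exp[-|t(i1) - t(i2)|/τ]. A minimal NumPy sketch of its construction follows. The function name `ar1_covariance` and the numerical values are assumptions; the variability and the persistence time are treated as already estimated, since their estimation (Chapter 2) is not reproduced here.

```python
import numpy as np

def ar1_covariance(t, s, tau):
    """Covariance matrix with elements s[i1] * s[i2] * exp(-|t[i1] - t[i2]| / tau)."""
    t = np.asarray(t, dtype=float)
    s = np.asarray(s, dtype=float)
    dt = np.abs(t[:, None] - t[None, :])         # all pairwise time differences
    return np.outer(s, s) * np.exp(-dt / tau)

# Unevenly spaced example (assumed values).
t = np.array([0.0, 1.0, 2.5, 4.0, 7.0])
s = np.array([1.0, 1.2, 0.8, 1.0, 1.5])
V = ar1_covariance(t, s, tau=2.0)

# Even spacing d = 1: the exponential reduces to a**|i1 - i2| with a = exp(-d / tau).
idx = np.arange(4)
a = np.exp(-1.0 / 2.0)
V_even = ar1_covariance(idx, np.ones(4), tau=2.0)
V_power = a ** np.abs(idx[:, None] - idx[None, :])
print(np.allclose(V_even, V_power))
```

The diagonal of `V` equals the squared variability, and the off-diagonal elements decay with time separation, faster for a smaller persistence time.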

### 4.1.3 Other estimation types

Least squares (OLS, WLS, GLS) is one type of fit criterion. Another is maximum likelihood (Section 2.6, p. 58). Further criteria result from further preferences in the regression procedure. A notable choice is robustness against the influence of outlier data, Xout(i). This can be achieved by minimizing, instead of the sum of squares (Eq. 4.4), the median of squares,

$$m\left\{\left[x(i) - \beta_0 - \beta_1 t(i)\right]^2 / S(i)^2\right\}_{i=1}^{n}. \tag{4.17}$$

Preferable (see background material) is to minimize the trimmed sum of squares,

$$SSQT(\beta_0, \beta_1) = \sum_{i=j+1}^{n-j} \left[x'(i) - \beta_0 - \beta_1 t'(i)\right]^2 / S'(i)^2, \tag{4.18}$$

where $j = \mathrm{INT}(\delta n)$, $\mathrm{INT}(\cdot)$ is the integer function, $0 < \delta < 0.5$, x'(i) is size-sorted x(i), and t'(i) and S'(i) are the "slaves," correspondingly rearranged. Trimming excludes the 2j most extreme terms from contributing to the estimation. Minimizing the sum of absolute deviations,

$$SSQA(\beta_0, \beta_1) = \sum_{i=1}^{n} \left|x(i) - \beta_0 - \beta_1 t(i)\right| / S(i), \tag{4.19}$$

likewise gives outlier values (if not already excluded by means of a prior analysis) less influence on the regression estimates than least-squares minimization does. Such criteria could also be preferable to least squares (in terms of, say, standard errors of estimates) when, instead of Xout(i), heavy-tailed or skewed Xnoise(i) distributions are considered.
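To see how the three robust criteria (Eqs. 4.17 to 4.19) react to an outlier, they can be evaluated next to the ordinary sum of squares. The sketch below only evaluates the criteria at a given parameter pair; it does not minimize them, and the function names, the choice δ = 0.1 and the toy data with a single outlier are assumptions made for illustration.

```python
import numpy as np

def scaled_residuals(b0, b1, t, x, s):
    return (x - b0 - b1 * t) / s

def sum_of_squares(b0, b1, t, x, s):
    return np.sum(scaled_residuals(b0, b1, t, x, s) ** 2)

def median_of_squares(b0, b1, t, x, s):                    # Eq. (4.17)
    return np.median(scaled_residuals(b0, b1, t, x, s) ** 2)

def trimmed_sum_of_squares(b0, b1, t, x, s, delta=0.1):    # Eq. (4.18)
    n = len(x)
    j = int(delta * n)                                     # j = INT(delta * n)
    order = np.argsort(x)                                  # size-sort x; t and s follow as "slaves"
    r2 = scaled_residuals(b0, b1, t[order], x[order], s[order]) ** 2
    return np.sum(r2[j:n - j])                             # drop the j smallest and j largest x

def sum_of_absolute_deviations(b0, b1, t, x, s):           # Eq. (4.19)
    return np.sum(np.abs(scaled_residuals(b0, b1, t, x, s)))

# Toy data (assumed): an exact line plus one large outlier.
t = np.arange(20.0)
x = 1.0 + 0.1 * t
x[5] += 50.0
s = np.ones_like(x)

# Evaluate every criterion at the true parameters (b0, b1) = (1.0, 0.1).
ssq = sum_of_squares(1.0, 0.1, t, x, s)
med = median_of_squares(1.0, 0.1, t, x, s)
ssqt = trimmed_sum_of_squares(1.0, 0.1, t, x, s)
ssqa = sum_of_absolute_deviations(1.0, 0.1, t, x, s)
print(ssq, med, ssqt, ssqa)
```

At the true line, the outlier contributes its squared size to the sum of squares but only its absolute size to the sum of absolute deviations, and it does not affect the median or the trimmed sum at all. This is why a minimizer of the least-squares criterion is pulled toward the outlier more strongly than minimizers of the other three.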

The various criteria introduced so far and the related minimization techniques represent the computational aspect of the regression estimation problem. The second and perhaps more relevant aspect is the suitability of the linear regression model. In climatology, the question is whether a linear increase or decrease is too simple a description of Xtrend(T). Model suitability can be evaluated graphically via various types of plots of the regression residuals (Eq. 4.16). These realizations of the noise process should nominally not exhibit more structure than the assumed persistence model.

### 4.1.4 Classical confidence intervals

Assume that for a data set $\{t(i), x(i)\}_{i=1}^{n}$ the following assumptions hold:

1. Xnoise(i) is of Gaussian shape;

2. the covariance matrix V (Eq. 4.14), containing persistence and variability properties, is correctly estimated;

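Then CIs for the GLS estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ of Eq. (4.13) can be constructed from their Student's t distributions (Sen and Srivastava 1990):

$$\mathrm{CI}_{\hat{\beta}_j,\,1-2\alpha} = \left[\hat{\beta}_j + t_{n-2}(\alpha) \cdot \mathrm{se}_{\hat{\beta}_j};\ \hat{\beta}_j + t_{n-2}(1-\alpha) \cdot \mathrm{se}_{\hat{\beta}_j}\right], \tag{4.20}$$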

j = 0 (intercept) and 1 (slope). The standard errors of the estimators are (Sen and Srivastava 1990)
