$$\hat{\lambda}(T) = h^{-1} \sum_{j=1}^{m} K\!\left(\frac{T - T_{\mathrm{out}}(j)}{h}\right), \qquad (6.32)$$

where $h$ is the bandwidth and $K$ is the kernel function.

Consider for heuristic reasons the following primitive occurrence rate estimator. Divide the observation interval [T(1); T(n)] into two halves of equal length, H = [T(n) − T(1)]/2. Let the number of events in the first and second half be m1 and m2, respectively. Estimate λ(T) in the first half as m1/H and in the second half as m2/H. This estimator corresponds to a uniform kernel (K(y) = 1 for |y| < 1/2 and K(y) = 0 otherwise) with bandwidth h = H and merely two estimation time points.
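As a toy illustration, the two-half estimator can be sketched as follows (the function name and argument layout are my own, not the book's):

```python
def primitive_rate(events, t_start, t_end):
    """Primitive two-bin occurrence rate estimate: split the observation
    interval into two halves of length H and count events per unit time."""
    H = (t_end - t_start) / 2.0
    m1 = sum(1 for t in events if t_start <= t < t_start + H)   # first half
    m2 = sum(1 for t in events if t_start + H <= t <= t_end)    # second half
    return m1 / H, m2 / H
```

For example, five events at times 1, 2, 3, 8 and 9 within [0; 10] give rates 3/5 and 2/5 for the two halves.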

Practically relevant estimations therefore employ quasi-continuously distributed (many) estimation time points, T, a smooth kernel function, K, and a suitably selected bandwidth, h—in exact analogy to kernel density estimation and kernel smoothing (Sections 1.6 and 4.3.1; see also Diggle (1985) and Diggle and Marron (1988)). Bandwidth selection is treated in Section 6.3.2.4, and usage of a Gaussian kernel function, $K(y) = (2\pi)^{-1/2}\exp(-y^2/2)$, is motivated in the technical issues.
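A minimal sketch of the smooth estimator of Eq. (6.32) with a Gaussian kernel follows; the function names are illustrative, not from the source:

```python
import math

def gaussian_kernel(y):
    """K(y) = (2*pi)**(-1/2) * exp(-y**2 / 2)."""
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def kernel_rate(t, events, h):
    """Eq. (6.32): lambda_hat(T) = h**-1 * sum_j K((T - T_out(j)) / h),
    evaluated at a single estimation time point t (no boundary correction)."""
    return sum(gaussian_kernel((t - tj) / h) for tj in events) / h
```

Evaluating `kernel_rate` on a fine grid of time points yields the quasi-continuous rate curve; the estimate integrates (approximately) to the total event count m when boundary effects are negligible.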

### 6.3.2.3 Boundary bias reduction

Usage of Eq. (6.32) may lead to bias in the form of underestimation of λ(T) near the boundaries, T = T(1) and T = T(n), because of "missing data" outside of the observation interval. One option to reduce this bias is to let h decrease towards the boundaries, that is, to use a "boundary kernel" (Gasser and Müller 1979). The other, adopted here, is to generate pseudodata (Cowling and Hall 1996) outside of [T(1); T(n)] and estimate λ(T) using a constant bandwidth and the original data augmented by the pseudodata:

$$\hat{\lambda}(T) = h^{-1} \sum_{j=1}^{\tilde{m}} K\!\left(\frac{T - \tilde{T}_{\mathrm{out}}(j)}{h}\right). \qquad (6.33)$$

This is the equation on which the occurrence rate estimates in this chapter are based.

The original event data are $\{T_{\mathrm{out}}(j)\}_{j=1}^{m}$. Let the (left) pseudodata for $T < T(1)$ be denoted as $\{T'_{\mathrm{out}}(j)\}_{j=1}^{m'}$ and the (right) pseudodata for $T > T(n)$ as $\{T''_{\mathrm{out}}(j)\}_{j=1}^{m''}$. Then the augmented set of event data,

$$\{\tilde{T}_{\mathrm{out}}(j)\}_{j=1}^{\tilde{m}=m+m'+m''} = \{T_{\mathrm{out}}(j)\}_{j=1}^{m} \cup \{T'_{\mathrm{out}}(j)\}_{j=1}^{m'} \cup \{T''_{\mathrm{out}}(j)\}_{j=1}^{m''}, \qquad (6.34)$$

is the set union of original data, left and right pseudodata.

How can the pseudodata be generated? Cowling and Hall (1996) show the equivalence of pseudodata generation and extrapolation of the empirical distribution function of $\{T_{\mathrm{out}}(j)\}_{j=1}^{m}$ and give rules for generating the pseudodata. Consider the left boundary, the start of the observation interval, T(1). The simplest rule is "reflection,"

$$T'_{\mathrm{out}}(j) = T(1) - [T_{\mathrm{out}}(j) - T(1)]. \qquad (6.35)$$

Setting j = 1 gives the rightmost of the left pseudodata points. How many pseudodata points should be generated? Since the objective in this chapter is to estimate λ(T) within [T(1); T(n)], pseudodata coverage of a time interval extending to, say, 3h below T(1) is sufficient. The "reflection" rule is analogously applied to produce right pseudodata, for T > T(n). If the objective of the analysis is forecasting, an interval extending to beyond T(n) + 3h should be covered, that is, more pseudodata should be generated. The "reflection" rule corresponds to an extrapolation of the empirical distribution function with a constant rate.

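The reflection rule can be sketched as below; the function name, the `extent` parameter (e.g. 3h), and the handling of boundary events are my own illustrative choices:

```python
def reflect_pseudodata(events, t1, tn, extent):
    """Generate left and right pseudodata by reflecting event times about
    the interval boundaries t1 and tn, keeping only pseudodata within a
    distance `extent` (e.g. 3*h) of the respective boundary."""
    events = sorted(events)
    left = [2.0 * t1 - t for t in events if t - t1 <= extent]    # T < t1 side
    right = [2.0 * tn - t for t in events if tn - t <= extent]   # T > tn side
    return left, right
```

The left pseudodata mirror the earliest events, so their local rate near T(1) matches that of the observed data, which is exactly the constant-rate extrapolation described above.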
Cowling and Hall (1996) give other rules, which may be applicable when the rate, λ(T), is expected to change at the boundaries. Of particular relevance for climatological applications is the case where T(n) is the present and a future upward trend in climate risk may exist. We note the "two-point" rule,

$$T'_{\mathrm{out}}(j) = T(1) - 9\,[T_{\mathrm{out}}(j/3) - T(1)] + 2\,[T_{\mathrm{out}}(j) - T(1)], \qquad (6.36)$$

where the fractional data $T_{\mathrm{out}}(j/3)$ are determined by linear interpolation and the setting $T_{\mathrm{out}}(0) = T(1)$; analogously for the right pseudodata.
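A sketch of Eq. (6.36) for the left pseudodata follows; the helper names and the 1-based fractional indexing convention (with $T_{\mathrm{out}}(0) = T(1)$) are my own reading of the rule:

```python
import math

def t_out_frac(events, t1, x):
    """T_out at fractional (1-based) index x, linearly interpolated,
    using the convention T_out(0) = T(1)."""
    pts = [t1] + sorted(events)          # pts[k] plays the role of T_out(k)
    k = int(math.floor(x))
    if k >= len(pts) - 1:
        return pts[-1]
    return pts[k] + (x - k) * (pts[k + 1] - pts[k])

def two_point_pseudodata(events, t1, n_pseudo):
    """Left pseudodata, Eq. (6.36):
    T'_out(j) = T(1) - 9*[T_out(j/3) - T(1)] + 2*[T_out(j) - T(1)]."""
    ev = sorted(events)
    return [t1 - 9.0 * (t_out_frac(ev, t1, j / 3.0) - t1)
            + 2.0 * (ev[j - 1] - t1)
            for j in range(1, n_pseudo + 1)]
```

As a sanity check on the coefficients: for events at a constant rate, $T_{\mathrm{out}}(j) - T(1) \approx j\delta$, so the rule gives $T(1) - 3j\delta + 2j\delta = T(1) - j\delta$, reproducing the constant-rate (reflection-like) extrapolation, while accelerating events near T(n) produce a correspondingly steeper extrapolated trend.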

It is evident that pseudodata generation is a crucial step towards an improved occurrence rate estimate. As with any extrapolation method, care is required in interpreting the results. On the other hand, making assumptions is inevitable when analysing a problem. This applies not only to the statistical "extrapolability," but also to the actualism that is assumed when using physical climate models for future projections.

### 6.3.2.4 Bandwidth selection

Bandwidth (h) selection determines the bias and variance properties of the occurrence rate estimator (Eq. 6.33) and is therefore the second crucial step. Brooks and Marron (1991) developed a cross-validation bandwidth selector for kernel occurrence rate estimation. This is the minimizer of

$$\mathrm{CV}(h) = \int_{T(1)}^{T(n)} \hat{\lambda}(T)^2\, \mathrm{d}T - 2 \sum_{j=1}^{m} \hat{\lambda}_{-j}(T_{\mathrm{out}}(j)),$$

where $\hat{\lambda}_{-j}(T)$ denotes the occurrence rate estimate obtained with the event $T_{\mathrm{out}}(j)$ removed from the data.
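A least-squares cross-validation selector of this general kind can be sketched as follows. The criterion form (squared-rate integral minus twice the sum of leave-one-out fits), the midpoint-rule integration, and the grid search are standard devices assumed here, not details taken from the source:

```python
import math

def lam(t, events, h):
    """Gaussian-kernel occurrence rate estimate at time t (no boundary correction)."""
    c = 1.0 / (h * math.sqrt(2.0 * math.pi))
    return c * sum(math.exp(-0.5 * ((t - tj) / h) ** 2) for tj in events)

def cv_score(h, events, t1, tn, n_grid=200):
    """Least-squares CV score: integral of lam^2 over [t1, tn] (midpoint rule)
    minus 2 * sum of leave-one-out estimates at the event times.
    Assumes distinct event times (leave-one-out drops by value)."""
    dt = (tn - t1) / n_grid
    integral = dt * sum(lam(t1 + (i + 0.5) * dt, events, h) ** 2
                        for i in range(n_grid))
    loo = sum(lam(tj, [t for t in events if t != tj], h) for tj in events)
    return integral - 2.0 * loo

def select_bandwidth(events, t1, tn, candidates):
    """Grid search: return the candidate bandwidth minimizing the CV score."""
    return min(candidates, key=lambda h: cv_score(h, events, t1, tn))
```

In practice the candidate grid should span a wide range of bandwidths, and the selected h can then be fed into the pseudodata-augmented estimator of Eq. (6.33).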
