## Stochastic Simulation Of Precipitation And Streamflow Processes

JOSÉ D. SALAS, JORGE A. RAMIREZ, PAOLO BURLANDO, AND ROGER A. PIELKE, Sr.

Stochastic simulations of hydroclimatic processes such as precipitation and stream-flow have become standard tools for analyzing many water-related problems. Simulation signifies "mimicking" the behavior of the underlying process so that realistic representations of it can be made. For this purpose a number of empirical, mathematically/physically based, mathematically/stochastically based, analog/physically based, and physical/laboratory-scale based models and approaches have been proposed and developed in the literature. This chapter emphasizes simulation based on stochastic and probabilistic techniques. Also, the emphasis will be on precipitation and streamflow processes, although many of the methods and models included herein are equally applicable for other hydroclimatic processes as well such as évapotranspiration, soil moisture, surface and groundwater levels, and sea surface temperature.

Stochastic simulation enables one to obtain equally likely sequences of hydroclimatic processes that may occur in the future. They are useful for many water resources problems such as (a) estimating the design capacity of a reservoir system under uncertain streamflows, (b) evaluating the performance of a water resources system in meeting projected water demands under uncertain system's inputs, (c) estimating drought properties, such as drought length and magnitude based on simulated streamflows at key points in the water supply system under consideration, (d) deriving the distribution of the underlying output variable of a groundwater flow equation (e.g., the hydraulic head), given the distribution of the parameters (e.g., the hydraulic conductivity) and boundary conditions, (e) establish-

Handbook of Weather, Climate, and Water: Atmospheric Chemistry, Hydrology, and Societal Impacts, Edited by Thomas D. Potter and Bradley R. Colman. ISBN 0-471-21489-2 © 2003 John Wiley & Sons, Inc.

ing the uncertainty in travel time and spread of pollutants in porous media as a function of the uncertainty in the parameters of the groundwater contamination transport model, and (f) analyzing the impacts of large-scale climate variability and global climate change on water supply availability and the ensuing planning and operation of water resources projects.

1 STOCHASTIC SIMULATION OF PRECIPITATION Continuous-Time Precipitation

The theory of point processes has been applied for modeling continuous-time precipitation since Le Cam60 suggested that a Poisson process could model the occurrence of rainfall showers. Let us assume that the number of storms N(t) in a time interval (0, t) arriving to a given point is Poisson distributed with parameter It (I = storm arrival rate.) Referring to Figure 1(a), n storms arrived in the interval (0, t) at times tx,... ,t„. The number of storms in any time interval T is also Poisson distributed with parameter kT. Assume further that the rainfall amount R associated with a storm arrival is white noise (e.g., R may be gamma distributed) and that N(t) and R are independent. Thus, rainfall amounts rx,...,rn correspond to storms occurring at times tx,...,tn. Such a rainfall generating process has been called Poisson white noise (PWN).

The cumulative rainfall in the interval (0,t), Z(t) = J^'i' Rj is a compound Poisson process. Also the cumulative rainfall over successive nonoverlapping time intervals T, i.e., the discrete-time rainfall process (refer to Fig. 1), is given by Yl — Z(iT) — Z(iT — T). i = 1,2, The basic statistical properties of Yt assuming that Z(t) is generated by a PWN model has been widely studied.11'20 Its autocorrelation function pk(Y) is equal to zero for all lags greater than zero, which contradicts actual observations [e.g., px(Y) = 0.446 for hourly precipitation at Denver Airport station for the month of June based on the 1948-1983 records.] Despite this shortcoming, the PWN model can be useful for predicting annual precipitation20 and extreme precipitation events.10 Instead of assuming that rainfall occurs instantaneously with zero duration one may consider rainfall with random duration D and intensity I, as shown in Figure 1 b. This is called the Poisson rectangular pulse (PRP) model.86 A common assumption is that D and / are independent and exponentially distributed. Figure 1 b shows a PRP process with n storms in the interval (0, t) occurring at times t{,... ,tn with associated intensities and durations

(/',, dx) (/„, dn). Then, storms may overlap and the aggregated process Yt becomes autocorrelated. Although the PRP model is better conceptualized than the PWN, it is still limited when applied to rainfall data.86 Thus, alternative models based on the concept of clusters have been suggested.

Neyman and Scott69 in modeling the spatial distribution of galaxies originally suggested the concept of clusters. Le Cam60, Kavvas and Delleur,51 and others30'82'86"88 applied this concept of space clustering to model continuous-time rainfall. The Neyman—Scott cluster process can be described as a two-level mechan-

with the storm that arrived at time tj. In addition, the time of occurrence of bursts, t, relative to the storm origin i- may be assumed to be exponentially distributed with parameter ft (e.g., in Fig.lc the three bursts arising from the first storm are located at times t, ,, t, 2, and tl3 relative to ?,). Then, if the precipitation burst is described by an instantaneous random precipitation depth R, the resulting precipitation process is known as Neyman Scott white noise (NSWN) while if the precipitation burst is a rectangular pulse the precipitation process is known as Neyman-Scott rectangular pulse (NSRP).

Estimation of parameters for Neyman-Scott (NS) models has been a major subject of research in the past two decades.9'22'30'51'71 The usual estimation has been based on the method of moments, although other approaches have been suggested.29'51'82 An apparent major estimation problem is that parameters estimated based on data for one level of aggregation, say hourly, may be significantly different from those estimated from data for another level of aggregation, say daily.11'30'71'86 The problem seems to be that as data are aggregated, information is lost and corresponding second-order statistics do not have enough information to give reliable estimates of the parameters of the generating process (model), and, as a consequence, they become significantly biased with large variance. For example, extensive simulation studies were carried out by Cadavid et al.11 based on the NSWN model with (known) population parameters: 1 = 0.102 x 10_3/min, jS = 0.00221/min, fx = 24.36/in, and l/v = 0.072 (parameter of the geometric distribution for the cluster size). Hourly and daily series were used to estimate moments (mean, standard deviation, and lag-1 and lag-2 correlation coefficients) from which the parameters were estimated. The results are shown in Table 1. Clearly, despite that the generating mechanism is known (the NSWN), less reliable estimates of parameters are obtained when daily values are used. Estimation based on weighted moments of various time scales in a least-squares fashion is an alternative.10'22 Also, physical considerations may be useful in setting up constraints in some of the parameters, initializing the estimates to be determined based on statistical considerations, and for comparing the fitted model parameters with some known physical properties.17 Koepsell and Valdes54 applied these concepts using the space-time cluster model suggested by Waymire et al.116 for modeling rainfall in Texas and pointed out the difficulty in estimating the parameters even when using physical considerations.

Besides the class of Poisson processes and Neyman-Scott cluster processes, other types of temporal precipitation models have been suggested such as those based on Cox processes,103 renewal processes,7'31 and Barlett-Lewis processes.40'87 Likewise, alternative space-time multidimensional precipitation models have been developed (e.g., Smith and Krajewski104). In addition, all precipitation models based on point and cluster processes proposed up to date are limited in some respects; e.g., they do not include the daily periodicity observed in actual convective rainfall processes.49'71 Furthermore, Rodriguez-Iturbe et al.89 and others raised the issue that nonlinear dynamics and chaos may be useful approaches for certain hydro-meteorological processes such as rainfall. Finally, excellent reviews of the state of the art in the field have been made32 and a number of studies pertaining to rainfall analysis, modeling, and predictability have been compiled in special issues of some

Parameter (units) |
Population Value |
Estimated from Hourly Data" |
Estimated from Daily Data" |

À x 103 (1/min) |
0.102 |
0.103 |
0.091 |

ß x 103 (1/min) |
2.210 |
2.300 |
1.630 |

Ai(l/in) |
24.360 |
23.990 |
7.010 |

1/v |
0.072 |
0.072 |
0.247 |

"Estimates based on 12 series of size 36,456 for hourly and 12 series of size 1519 for daily. From Cadavid et al."

"Estimates based on 12 series of size 36,456 for hourly and 12 series of size 1519 for daily. From Cadavid et al."

journals (e.g., J. Appl. Meteor., vol. 32, 1993; J. Geoph. Res., vol. 104, no. D24, 1999).

Hourly, Daily, and Weekly Precipitation

We have seen in the previous section that the models and properties for cumulative precipitation over successive nonoverlapping time periods, i.e., discrete-time precipitation, can be derived from continuous-time precipitation models. However, one may formulate precipitation models directly at hourly, daily, and weekly time scales. In these cases, the theory of Markov chains has been widely used in the literature for simulating not only precipitation (in discrete time) but many other hydrologic processes such as streamflow, soil moisture, temperature, solar radiation, and

7 n 49 s-i an water storage in reservoirs. • • • •

Consider that X(t) is a discrete valued process that started at time 0 and developed through time, i.e., i = 0, 1,2, Then P[X(t) = xt\X(0) = x0, X(\) =x1,..., X(t — 1) = x,_ j ] is the probability that the process X(t) — xt given its entire history. If this probability simplifies to P[X(t) = x, | X(t — 1) = xt_, ], the process is a firstorder Markov chain or a simple Markov chain. Because X(t) is a discrete valued process, we will use the notation X(t) = /', / = 1, ... ,r instead oiX(t) = x„ where j represents a state and r is the number of states; e.g., in modeling daily rainfall one may consider r = 2 with j =1 for a dry day (no rain) and j = 2 for a wet day. A simple Markov chain is defined by its transition probability matrix P(t), a square matrix with elements p^t) = P[X(t) — j\X(t — 1) = /] for all ij pairs. Furthermore, qj{t) = P[X(t) = /], / = 1,..., r, is the marginal probability distribution of the chain being at any state j at time t and #-(0) is the distribution of the initial states. Moreover, if P(t) does not depend on time, the Markov chain is a homogeneous or stationary chain and, in this case, the notations P and py are used. The estimation of some probabilities that are useful for simulation and forecasting of precipitation events are the «-step transition probability p\j\ the marginal distribution qj(t) given the distribution qj(0), and the steady-state probability vector q*. These probabilities can be determined from well-known relations available in the literature.39'118

Estimation for a simple Markov chain amounts to estimating the elements p¡j of the transition probability matrix. Common estimation methods include the method of moments and maximum likelihood.39 To test whether a simple Markov chain is an adequate model for the process under consideration, one can check some of the assumptions of the model and see whether some relevant properties of the precipitation process are reproduced (e.g., compare the probability pff with that obtained from the observed data, Furthermore, the Akaike information criterion has been helpful in selecting the order of Markov chain models.15'48

Although in some cases simple Markov chains may be adequate for representing the variability of precipitation, often more complex models may be necessary. For instance, in modeling daily rainfall processes throughout the year, the parameters of the Markov chain may vary with time (e.g., for a two-state Markov chain, the transition probabilities p¡j may vary along the year and the estimates can be fitted with trigonometric series to smooth out sample variations90). Higher order Markov chains may be necessary in other cases. Chin15 analyzed daily precipitation records of more than 100 stations across the continental United States and concluded that generally second- and third-order models were preferred for the winter months while the first-order model was better for the summer months. In addition, maximum likelihood for estimating Fourier series coefficients for alternating renewal processes and Markov chains for daily rainfall90 and mixed models with periodic Markov chains for hourly rainfall (to account for the effect of daily periodicity) have been suggested.49

Monthly, Seasonal, and Annual Precipitation

Modeling of precipitation for long time scales such as monthly is generally simpler than for short time scales such as daily, especially because for long time scales the autocorrelation becomes smaller or negligible (except in cases of low frequency21). In such cases modeling precipitation at a given site amounts to finding the probability distribution for each month. Generally different distributions will be needed for each month. On the other hand, seasonal precipitation data in semiarid and arid regions may include zero values for some seasons, hence the precipitation is a mixed random variable. Let Xvx = precipitation for year v and season z, and define PT(0) = P(XV r = 0), z=l,...,co (co = number of seasons per year). Then, FXx(x) = Pt(0) + [1 — /3r((J)]FVTiVT>0(a) is the cumulative distribution function for season z, in which FXx{x) — PAX < a) and Fyt|\v>o<a") = P(X < x\X > 0). Thus, prediction of seasonal precipitation requires estimating Px(0) and FVi|vt>0(a). Several distributions such as the log-normal and log-Pearson have been used for fitting the empirical distribution of seasonal precipitation. For modeling precipitation at several sites, one must consider the intersite cross correlations and the marginal distribution (at each site). For continuous random precipitation, a common modeling approach has been to transform them into normal, then use a lag-0 multivariate model for modeling the transformed precipitation (an approach similar to modeling streamfiow as in Section 2). Modeling of annual precipitation is similar to modeling seasonal precipitation, i.e., determining either the marginal distribution Fx(x) or the conditional distribution Fx^x>0(x), depending on the particular case at hand. Likewise, modeling of annual precipitation at several sites is generally based on transforming the data into normal and using a multivariate normal model.

### 2 STOCHASTIC SIMULATION OF STREAMFLOW

If one can develop a stochastic model for streamflow in continuous time, then, in principle, the properties and the models for daily, monthly, and annual streamflow can be obtained. Some attempts have been made for developing models of stream-flow processes in continuous time based on physical principles.20 However, the models of aggregated flows that can be derived from such continuous-time models, become mathematically cumbersome and of limited applicability for operational hydrology.53 Understanding the rules for upscaling the models and parameters has been a challenging subject for research. Generally most of the models that are available for streamflow simulation in continuous time and short time scales, such as hourly, are based on the transformation of precipitation into runoff by means of physical or conceptual principles. Thus the stochastic characteristics of the precipitation input and of the other relevant processes of the hydrologic cycle of the watershed are transferred into a stochastic streamflow output. Examples of models in this category are represented by SHETRAN26 and PRMS.59 SHETRAN simulates "continuous" streamflows along the river network by solving partial differential equations of the physical processes involved while PRMS is a semidistributed conceptual model that simulates hourly and daily streamflows. However, in this section we are mainly concerned with stochastic streamflow models that can be derived explicitly from the physically or conceptually based relations of the underlying hydrologic process of the watershed or directly from the streamflow data.

Continuous Time to Hourly and Daily Streamflow Simulation

The simulation of streamflow on a continuous time scale requires the formulation of a model structure that is capable of reproducing the streamflow fluctuations on a wide dynamical range. As already mentioned, the application of stochastic approaches to continuous time and short time scale streamflow modeling has been limited because of the complex nonlinear relations that characterize the precipita-tion-streamflow processes at those temporal scales. The early attempts to model hourly and daily streamflows were based on using autoregressive (AR) models after standardization and transformation. However, stochastic models essentially based on process persistence do not properly account the rising limb and recession characteristics that are typical of hourly and daily flow hydrographs. Also shot noise or Markov processes and transfer function models have been proposed for daily flow simulation with some limited success in reproducing the rising limb and recessions.110

Nevertheless, interesting work has been done with some success by using conceptual-stochastic models. For instance, Kelman52 applied a conceptual representation of a watershed considering the effects of direct runoff and surface and groundwater storages. Direct runoff is modeled by a PAR( l ) model with an indicator function to produce intermittence and the other components are modeled using linear reservoirs. Kelman's model produced reasonable results for generating daily flows for the Powell River, Tennessee. Also following the approach suggested by Salas and Obeysekera96 and Claps et al.,16 Murrone et al.68 proposed a conceptual-stochastic model for short time runoff. A three-level conceptual runoff component and a stochastic surface runoff model the daily response of the watershed. The base flow is modeled by three linear reservoirs that represent the contribution of deep aquifers with over-year response, aquifers with annual renewal, and subsurface runoff. The surface runoff is regarded as an uncorrelated point process. Modeling rainfall as an independent Poisson process, the above scheme leads to a multiple shot noise streamflow process. The model is effective in reproducing streamflow variability. In addition, intermittent daily streamflow processes have been modeled25 by combining Kelman's conceptual approach with product models94 and gamma AR models28 and by using a three-state Markov chain describing the onset of streamflow and an exponential decay of streamflow recession.1

Weekly, Monthly, and Seasonal Streamflow

Single-Site Periodic Models. Stationary stochastic models can be applied for modeling weekly, monthly, and seasonal streamflows after seasonal standardization. This approach may be useful when the season-to-season correlations do not vary throughout the year. In general though, models with periodic correlation structure, such as periodic autoregressive (PAR) and periodic autoregressive and moving average (PARMA) are more applicable.29'92 An example is the PARMA(1,1) model97

yv,r =Hx + «HrOv.t-l - Hz-l) + £v,r ~ 01,t£v,t-1 0)

where ¿uT, (¡>]z, T, and a As:) are the model parameters. When the O's are zeros, model (1) becomes the PARMA(1,0) or PAR(l) model. Low-order PARMA models such as PARMA(1,0) and PARMA(1,1) have been widely used for simulating monthly and weekly flows.3'I8'43'84'92'120

PARMA models can be derived from physical/conceptual principles. Considering all hydrologie processes and parameters in the watershed varying along the year, it has been shown that seasonal streamflow falls within the family of PARMA models.96 Alternatively, a constant parameter model with periodic independent residuals was suggested.16 One of the desirable properties of stochastic models of seasonal streamflows is the preservation of seasonal and annual statistics. However, such dual preservation of statistics has been difficult to get with simple models such as the PAR(l) or PAR(2). For this reason in the 1970s hydrologists turned to the so-called disaggregation models (refer to Section 3). The major drawback of such simple PAR models to reproduce seasonal and annual statistics has been the lack of sufficient correlation structure. PARMA models having more flexible correlation structure than PAR models offer the possibility of preserving seasonal and annual statistics. Some hydrologists have argued that PARMA models have too many parameters. Yet, one cannot hope for models such as the PAR(1) to do more than it can, i.e., to reproduce simply the lag-1 month-to-month correlations while failing to reproduce correlations for longer time lags and statistics at higher orders of aggregation. An alternative for reproducing both seasonal and annual statistics is the family of multiplicative models.

Box and Jenkins5 first suggested multiplicative models. These models have the characteristic of linking the variable yV T with yvz_i and yv_l McKerchar and Delleur66 used multiplicative models after differencing the logarithms of the original series for simulating and forecasting monthly streamflow series. Because such multiplicative models do not take into account periodic correlations, differencing was used in an attempt to decrease or eliminate such periodicity. However, they were not able to reproduce the seasonality in the covariance structure and could not establish confidence limits of forecasts with consideration of seasonality. This problem arises because the referred multiplicative model does not include periodic parameters. A model (with periodic parameters) that can overcome the limitations mentioned above is the multiplicative PARMA model.95 For instance, the multiplicative PARMA( 1,1) x ( 1.1 )„, model is written as zv.t = ^l.^v-l.T + 4>\,xZv,z-\ - + fiv.T ^

in which zV T — yVtX — ¿uT and Oj T, @1t, <f>l z, 9l z, and aT(e) are the model parameters. This model has been applied successfully for simulating the Nile River flows.

A limitation of the foregoing PARMA and multiplicative PARMA models for modeling hydrological time series is the requirement that the underlying series be transformed into normal. An alternative that does not have this requirement is the PGAR(l) model for modeling seasonal flows with periodic correlation structure and periodic gamma marginal distribution.27 Consider that y is a periodic correlated variable with a three-parameter gamma marginal distribution with location Az, scale aT, and shape [iT parameters varying with t, and t = 1 <» (T = number of seasons). Then, the new variable zv T = yV T — A is a two-parameter gamma that can be represented by zV T = ^>TzV T_1 +(zV T_1) Twv r where (j), = periodic autoregressive coefficient, ôz = periodic autoregressive exponent, and wv T = noise process. This model has a periodic correlation structure equivalent to that of the PAR(l) process. It has been applied to weekly streamflow series for several rivers in the United States.27 Results obtained indicated that such PGAR model compares favorably with respect to the normal based models (such as the PAR model after logarithmic transformation) in reproducing the basic statistics usually considered for streamflow simulation. Furthermore, a nonparametric approach for streamflow simulation that is capable of reproducing closely historical distributions has been proposed.100

PARMA and PGAR models are less useful for modeling flows in ephemeral streams. In these streams the flows are intermittent, a characteristic that is not represented by the above mentioned models. Instead periodic product models such are more realistic,94 where Bv T is a periodic correlated Bernoulli asy

(l ,0) process, zv T may be either an uncorrected or correlated periodic process with a given marginal distribution, and B and z are mutually uncorrected. Properties and applications of these models for simulating intermittent monthly flows of some ephemeral streams have been reported in the literature.14'94

Multisite Periodic Models. In modeling seasonal streamflows at several sites, multivariate PAR and PARMA models are generally used.6'42'92,97 For example, the multivariate PARMA(1,1) model is

in which Zv T = Yv z — |x ; |x is a column parameter vector with elements yUt'\ ..., fi"'■ S>T and 0T are n x n periodic parameter matrices, the noise term ev T is a column vector normally distributed with £(eV I) = 0, £(ev Te^T) = rT, and E(£v x_k) — 0 for k 0, and n = number of sites. In addition, it is assumed that ev T is uncorrected with Zv t_1. Parameter estimation of this model can be made by the method of moments, although the solution is not straightforward. Dropping the moving average term in (3), i.e., 0t = 0 for all t's, yields a simpler multivariate PARMA(1,0) or PAR(l) model. This simpler model has been widely used for generating seasonal hydrologic processes. Further simplifications of the foregoing models can be made to facilitate parameter estimation. Assuming that <J>T and ©T of Eq. (3) are diagonal matrices, the multivariate PARMA(1,1) model can be decoupled into univariate models for each site. To maintain the cross correlation among sites ev T is modeled as ev T = Bx£ where E(£ f )= I and E(£ ,) = 0 for k ^ 0. This modeling scheme is a contemporaneous PARMA(1,1), or CPARMA(1,1), model. Useful references on this type of models are available in the literature.42,84,92,97

### Annual Streamflows

Autoregressive (AR) and autoregressive and moving average (ARMA) models have been the most popular models for single site and multisite annual streamflow simulation. Specifically, low-order models have been widely applied for generating annual flow series.29,42,61,62,72,92

Single-Site stationary Models. The AR(1) model is defined as y, = n + 4>(y,-\ — n) + '<■,■ Its autocorrelation function pk = (f>pk , = 4>k decays exponentially as the time lag k increases. This model has been a prototype of short memory models because pk goes to zero relatively fast and as a result h —*■ \ rather quickly in E(R**) ~ nh (R*„* = rescaled range of cumulative departures from the sample mean). A more versatile model than the AR(1) is the ARMA(1,1) given by6,42,97

Its autocorrelation function pk = (l - - 0)(l - 200 + d2)'1^'1 is more flexible than that of the AR(1) model because it depends on the two parameters (j) and 9. The ARMA process can represent long memory dependence,72'9' a property that is important for many rivers. AR and ARMA models assume that the underlying series is normally distributed, an assumption that is not always applicable for annual streamflow series. While one can circumvent this assumption by transforming the skewed series into an approximately normal series, a direct approach that does not require a transformation is a viable alternative. The gamma autoregressive (GAR) process y, = — (j)) + <$>yt_\ + t], offers such an alternative where y, is gamma distributed with parameters X, a, and ft (the location, scale, and shape parameters, respectively), (j) = autoregressive coefficient, and t], = noise term. The GAR(l) model has the same autocorrelation function as that of an AR(1) model. Estimation procedures and applications of the GAR model for simulating annual streamflow series can be found in the literature.28

AR, ARMA, and GAR models are useful for modeling streamflow processes in perennial rivers, yet they are inadequate for intermittent processes such as stream-flows in some ephemeral streams. Intermittent processes can be modeled as y, = B,z, where yt = non-negative intermittent variable, Bt = dependent (1,0) Bernoulli process, z, = positive valued continuous autocorrelated variable, for instance, an AR(1) process, and Bt and z, are assumed to be mutually uncorrelated. Thus, the resulting product process yt is intermittent and autoregressive. These models have been applied for modeling short-term rainfall and intermittent flow processes.7'13'94 Finally, other type of models, such as fractional Gaussian noise.64 broken line,6 shifting level,93 and FARMA42 have been proposed for representing certain special properties of annual streamflow time series. For example, the shifting level model has the capability of simulating time series with sudden changes, a property that has been observed in many hydroclimatic processes.

Multisite Stationary Models. Modeling of multiple time series is widely needed in hydrology. Consider the column vector Yt with elements y\V],..., vj"' in which n = the number of series (number of variables) under consideration. The multivariate AR(1) model is defined as65

in which Zt = Yt — (x, |x is a column vector of means ¿u(1),..., e, is a column vector of normal noises '<{1', ..., , each with zero mean such that E(eteJ) — F and E(e,eJ_ k) — 0 for k 0, and $ and F are n x n parameter matrices. In addition, it is assumed that et is uncorrelated with 2t_x. Model (5) is a prototype of short-memory models for multiple series and has been widely used in operational hydrology.29'42'62'92 Likewise, the multivariate ARMA(1,1) model can be written as in Eq. (3) except that the parameters $ and & do not depend on time.

Except for low-order multivariate AR models, using the full multivariate ARMA models often leads to complex parameter estimation.73'92 Thus, model simplifications have been suggested. For instance, a contemporaneous ARMA (CARMA)

model results if <1> and 0 are diagonal matrices. This concept, which has been advocated by Salas et al.,92 Stedinger et al.,106 and Hipel and McLeod42 can be extended to the general case. A contemporaneous relationship implies that only the dependence of concurrent values of the y's are considered important. Furthermore, the diagonalization of the parameter matrices allows "model decoupling" into component univariate models so that the model parameters do not have to be estimated jointly, and univariate modeling procedures can be employed. Thus, univariate ARMA(p, q) models are fitted at each site where each r},'\ / = 1is uncorrected, but are contemporaneously correlated with a variance-covariance matrix T. Thus, the parameters, </>'s and 0's in each model, can be estimated by using univariate estimation procedures and the e's can be modeled by e, = Bq, in which £ is normal with E(| = / and ) = 0 for k ^ 0. Note that one does not have to con sider the same univariate ARMA(/;, q) model for each site.

### 3 TEMPORAL AND SPATIAL DISAGGREGATION MODELS

Disaggregation models, i.e., downscaling models in time and/or space, have been an important part of stochastic hydrology, not only because of our scientific interest in understanding and describing the characteristics of the spatial and temporal variability of hydrological processes, but also because of practical engineering applications. For example, many hydrologic design and operational problems require hourly precipitation data. Because hourly precipitation data are not as commonly available as daily data, a typical problem has been to downscale or disaggregate daily data into hourly data. Similarly, for simplifying the analysis and modeling of large-scale systems involving a large number of precipitation and streamflow stations, temporal and spatial disaggregation procedures are needed. This section briefly reviews some empirical and mathematical models and procedures for temporal and spatial disaggregation of precipitation and streamflow.

### Disaggregation of Precipitation

Generally the disaggregation of station precipitation data defined at a given time interval into precipitation for smaller time intervals has been done empirically.74 For instance, by using either tables or graphs, one can do disaggregation of 24-h (daily) precipitation into 6-h precipitation. More complete disaggregation schemes has been developed.41'119. Hershenhorn and Woolhiser41 considered daily rainfall amounts and a model to obtain within-the-day magnitudes for the number of storms, amount, duration, and arrival time for each storm. They indicated that simulated rainfall sequences compared well with observed values. Although the foregoing models are innovative, they are not satisfactory, i.e., they are complex and require many transformations of the original data to obtain reasonable results. Another shortcoming is the lack of flexibility in the number of intervals considered.

Another formal disaggregating scheme for short-term rainfall was developed by Cadavid et al.11 Disaggregation models were developed assuming PWN and NSWN

(refer to Section 1) as the underlying rainfall-generating mechanisms. Formulation of the disaggregation algorithm for the PWN model is based on the distribution of the number of arrivals N conditioned on the total precipitation Y in the time interval, the distribution of the white noise term given N and Y, and the distribution of the arrival times conditional on N. The algorithm performs well when using simulated PWN samples. The disaggregation scheme based on the NSWN model is more complex. It performs well on simulated and recorded samples provided that the model parameters used are similar to those controlling the process at the disaggregation scale. The main shortcoming is the incompatibility of parameter estimates at different aggregation levels as pointed out in Section 1. Recently, a rainfall disaggregation based on artificial neural networks has been suggested.8

Epstein and Ramirez24 developed a multiscale, linear regression, statistical climate inversion scheme based on the disaggregation model of Valencia and Schaake111 given by

where Y is a matrix of downscaled hydroclimatic values (e.g., precipitation), X is a matrix of upscaled hydroclimatic values, A and B are parameter matrices, and s is a matrix of independent standard normal deviates. All terms in the above equation are functions of time, and the downscaling model is conditioned on time through the temporal evolution of the large-scale field, X. Parameter estimation, based on the method of moments, leads to the preservation of the first- and second-order moments at all levels of aggregation.

### Disaggregation of Streamflow Data

The shortcoming of low-order PAR models when applied for simulating seasonal flows in reproducing the annual flow statistics led to the development of disaggregation models such as the Valencia-Schaake model (6). In this model, the modeling and simulation of seasonal flows is accomplished in two or more steps. First the annual flows are modeled so as to reproduce the desired annual statistics [e.g., based on the ARMA(1,1) model]; then synthetic annual flows are generated, which in turn are disaggregated into the seasonal flows by means of Eq. (6). While the variance-covariance properties of the seasonal flow data are preserved and the generated seasonal flows add up to the annual flows, model (6) does not preserve the covar-iances of the first season of a year and any preceding season. To circumvent this shortcoming, Eq. (6) has been modified as i = AX + Be + CZ, where C is an additional parameter matrix and Z is a vector of seasonal values from the previous year (usually only the last season of the previous year) for each site.6 Further refinements and corrections assuming an annual model that reproduces S%x and Sxz has been suggested57 as well as a scheme that does not depend on the annual model's structure yet reproduces the moments Syy, SYX, and Sxx.'05

The foregoing disaggregation models have too many parameters, a problem that may be significant especially when the number of sites is large and the available historical sample size is small. Lane56 sets to zero some of the parameters in the above disaggregation model so that

is a model with fewer parameters. Parameter estimation and appropriate adjustments so that the seasonal values add exactly to the annual values at each site can be found in the literature.56'97

The estimation problem can be simplified if the disaggregation is done in steps (stages or cascades) so that the size of the matrices involved and consequently the number of parameters decrease.6 For instance, annual flows can be disaggregated into monthly flows directly in one step (this is the usual approach), or they can be disaggregated in two or more steps, e.g., into quarterly flows in a first step; then each quarterly flow is further disaggregated into monthly flows in a second step. However, even in the latter approach, considerable size of the matrices will result when the number of seasons and the number of sites are large. Santos and Salas98 proposed a stepwise disaggregation scheme in such a way that at each step the disaggregation is always made into two parts or two seasons. This scheme leads to a maximum parameter matrix size of 2 x 2 for single-site disaggregation and 2n x 2n for multisite. Disaggregation models that reproduce seasonal statistics and the covariance of seasonal flows with annual flows assuming log-normal seasonal and annual flows have been also suggested.36'107 Also temporal disaggregation based on nonpara-metric procedures has been proposed.

Although disaggregation has been a major development and a practical tool in stochastic hydrology, still the question of why certain periodic models fail to reproduce annual statistics remained. Thus, more complex models such as PARMA models were suggested and developed in the 1970s and the early 1980s. Their capabilities for reproducing statistical properties beyond the seasons have been explored by Obeysekera and Salas70 and Bartolini and Salas.3

### 4 TEMPORAL AND SPATIAL AGGREGATION MODELS

As in disaggregation, the aggregation (upscaling) modeling approach deals with streamflow processes at two or more levels of aggregation or time scales. However, the two concepts are quite different. In disaggregation, the modeling and generation procedure is backward in the sense that one models and generates annual flows first, then monthly, weekly, and daily flows are obtained in successive disaggregation steps. On the other hand, in temporal aggregation, the procedure is forward, i.e., one models and generates daily flows first and then successively weekly, monthly, and annual flows are modeled and generated. The basic premise of the aggregation approach is that the stochastic characteristics at the continuous time scale dictate those at any level of aggregation or time scale. The relationship between statistics at various time scales has been explored in the literature.50

The aggregation modeling approach for streamflow processes was developed by Vecchia et al.112 Assuming that monthly flows follow a PAR(l) or PARMA(1,1) process, it was shown that the resulting model for the annual flows is the stationary ARMA(1,1). The foregoing concepts and results brought into light the structural linkage and compatibility between streamflow models (and their parameters) of various time scales. Streamflow data of the Niger River at Kaulikoro, Africa, were used to illustrate some of the aggregation concepts, especially in relation to reproducing the annual correlation structure when the seasonal flows were modeled by the PAR(l) and PARMA(1,1) models.70 It was shown that in comparing the parameters and the correlograms of the ARMA(1,1) models of annual flows derived from the models of seasonal flows, the results obtained from the PARMA(1,1) model were significantly better than those obtained from the PAR(l). In addition, the results obtained vary depending on the number of seasons considered in the year (e.g., monthly, quarterly), and better results were obtained, as the number of seasons in the year became smaller.

Bartolini and Salas3 extended the concept of aggregation to include aggregation not only from seasons to a year but from weeks to months, months to seasons, and seasons to years. For instance, the aggregation of a PARMA(2,1) monthly flows leads to a PARMA(2,2) bimonthly flows; in turn the aggregation of such PARMA(2,2) bimonthly flows gives also a PARMA(2,2) for the quarterly flows. Furthermore, if such quarterly flows are aggregated into annual flows, then the model is the stationary ARMA(2,2). The partial aggregation concepts have been applied to the Niger River seasonal and annual flows, and the results showed the superiority of the PARMA(2,1) and PARMA(2,2) models relative to the other models tested in reproducing the variance-covariance properties of the annual flows. The application of the aggregation concepts for modeling the seasonal and annual flows of the Niger River at various time scales suggest the need for using PARMA models for streamflow modeling and simulation if one would like to reproduce seasonal and annual first- and second-order statistics. The traditional models such as the PAR(l) simply are deficient for modeling flows such as those of the Niger. The usual approach to model seasonal and annual flows has been to use different models for different time scales, disregarding the compatibility among models at various time scales. The aggregation concepts and results discussed in this section point out that such traditional approaches and models for streamflow simulation must be avoided.

Similar reasoning as in temporal aggregation applies for spatial aggregation of streamflow processes. For instance, one can assume a stream network composed of a number of first-, second-, and third-order streams. Consider first modeling the flows at the junction of two first-order streams. Naturally, the model at a site immediately downstream from the junction must be derived from the bivariate model defined at two upstream sites, one in each of the tributaries. In turn, the model of the flows of the second-order stream as it joins the flows of another, say, second-order stream must define the model of the flows immediately downstream from the junction, and so on as the streamflow travels down the stream network. Thus, streamflow models must be compatible in both temporal and spatial scales.

### 5 SCALING ISSUES AND DOWNSCALING

Understanding, describing, and modeling local, regional, and global climate and its nonlinear interactions with hydrological, biophysical, and biogeochemical processes are currently some of the most challenging problems in the geosciences. It is not only the high spatial and temporal variability of the governing processes and boundary conditions but the wide range of scales over which this variability occurs. Distributed hydrologic models require high-resolution input data. Among all hydro-logic variables, precipitation is of paramount importance in the water and energy budgets at the land surface-atmosphere interface, and its accurate representation in hydrologic and atmospheric models is critical. Precipitation is the result of intricately interrelated atmospheric and land surface processes. It has extreme variability over time scales from seconds to years and spatial scales from less than meters to hundreds of kilometers. The sensitivity of hydrologic behavior to the space-time variability of rainfall is the result of nonlinear interactions between precipitation and land surface characteristics controlling the transformation of rainfall into soil water and runoff. Modeling of precipitation requires understanding of the statistical structure of space-time precipitation and understanding of the physical processes governing the evolution of precipitation at a range of space-time scales. Because of scale differences, rainfall downscaling is required for coupling global (or regional) atmospheric models and hydrologic models. In general, downscaling schemes can be grouped in two broad categories, dynamical and statistical downscaling schemes.

In dynamic schemes, climate and land-use change scenarios at regional and local scales are developed using regional and local atmospheric models, for example, the Regional Atmospheric Modeling System (RAMS) of Colorado State University. These models are driven by boundary conditions derived from observations and from the output of global atmospheric models. In this way, the atmospheric model acts as a physically based dynamic interpolator (i.e., physically based downscaling). RAMS has been coupled to a land surface scheme (LEAF-2), to a hydrologic model (TOPMODEL), and to a regional ecosystem model (CENTURY). Thus, dynamical schemes encode multiple, nonlinear and complex local and regional interactions and feedbacks, explicitly.79'115 However, trying to resolve processes at ever decreasing scales using physically based models rapidly leads to computational inefficiencies and is limited by poor understanding of physical process behavior at small scales. Other atmospheric models such as the NCAR models33 and the Penn State/NCAR mesoscale model, Version 5,19 have been used for dynamic downscaling.

In statistical downscaling, subgrid temporal and spatial scale details of climatic variability, in particular precipitation, are obtained so that the statistical characteristics of the spatial and temporal variability of hydroclimatic fields is preserved as a function of scale. Statistical techniques are commonly based on linear or nonlinear regression, methods from nonlinear dynamics, artificial neural networks, Markov processes, multiplicative random cascade models, etc. One of the limitations of regression approaches is that they are applicable only if a strong relationship between a large-scale parameter and regional and local climate has been identified (often this will not be the case), and they are valid only within the spatial and temporal range of the observations. Although statistical downscaling is computationally efficient, it cannot include the subgrid scale physical feedbacks referred to above explicitly, and it is difficult to couple atmospheric processes with regional ecological and hydrological processes. On the other hand, statistical downscaling based on multiplicative random cascade models can reproduce the scaling features (i.e., scale invariance), the clustering, and intermittency that are characteristic of precipitation fields in space and time with relatively modest computational burden.

Until recently, most of the downscaling methods proposed in the literature only dealt with the spatial variability of the precipitation field while the temporal evolution of the fields is usually described independently of the spatial downscaling. The only temporal correlation structure accounted for is that resulting from the dynamics of the atmospheric model producing the precipitation field at the larger spatial scale or that encoded in the temporal evolution of the observations. Thus, in general, these schemes do not fully and properly account for the temporal correlation structure (i.e., persistence) of the precipitation fields at subgrid scales.

### Dynamical Downscaling

Dynamical downscaling can be considered with respect to four basic types of models: one type is strongly dependent on larger-scale numerical weather prediction lateral boundary conditions, bottom boundary conditions, and on initial conditions. A second type has forgotten the initial conditions but is dependent on the observed lateral and bottom boundary conditions. A third type is where a large-scale model is run that is only forced with surface boundary conditions, and the output used to downscale with a regional model. A fourth type is when a true global climate model (with coupled ocean, atmosphere, continental sea ice, landscape processes, etc.) is used to provide lateral boundary conditions to a regional model. This is the Intergovernmental Panel on Climate Change (IPCC) type of downscaling except only a limited set of Earth system forcings (e.g., the radiative effect of C02, solar insolation) is included in the IPCC approach. To summarize with examples (IC = initial conditions; LBC = lateral boundary conditions, and BBC = bottom boundary conditions; with the recognition that BBC includes bottom interfacial fluxes): type 1 ETA4 uses observed IC, LBC, and BBC; type 2 PIRCS34"35 uses observed LBC and BBC; type 3 ClimRAMS forced by CCM3 integrated with observed SSTS102 uses observed BBC; and type 4 Earth system global model downscaled using a regional model.45 Observational constraints on the solution become less as we move from type 1 to 4. Thus forecast skill will diminish from type 1 to 4.

With respect to current generation models, such as atmospheric-ocean global circulation models (AOGCMs), neither AOGCMs nor the regional ones (type 4 models) include all of the significant human effects on the climate system. The combined effects of land-use change, the biogeochemical effect on the atmosphere, e.g., due to increased C02, and, e.g., the microphysical effect of pollution aerosols have not yet been included in these models. Thus the existing model runs should only be interpreted as sensitivity experiments, not forecasts, projections, or even scenarios.80

In addition, with respect to dynamic downscaling, as currently applied, there is not a feedback upscale to the AOGCM from the regional model, even if all of the significant large-scale (GCM scale) human-caused disturbances were included. The AOGCM also has a spatial resolution that is inadequate to properly define the lateral boundary conditions of the regional model. Anthes and Warner2 show that the lateral boundary conditions are the dominant forcing of regional atmospheric models as associated with propagating features in the polar westerlies. With numerical weather prediction (type 1 and 2 models), the observations used in the analysis to initialize a model retain a component of realism even when degraded to the coarser model resolution of a global model. This realism persists for a period of up to a week or so, when used as lateral boundary conditions for a regional numerical weather prediction model. This is not true with the AOGCMs where data do not exist to influence the predictions. A regional model cannot reinsert model skill, when it is so dependent on lateral boundary conditions, no matter how good the regional model.

### Statistical Downscaling

The output of mesoscale atmospheric models such as RAMS or the observations data such as those from the NEXRAD (next generation radar) network are usually at grid sizes that are larger [e.g., 0(103 to 104) m] than those associated with distributed hydrologie models (e.g., in the order of 10° m). The land surface system responds to excitations from the atmosphere, e.g., precipitation, and feeds back moisture into the atmosphere, e.g., through évapotranspiration and latent heat flux. The excitations and responses are spatially heterogeneous over a broad range of scales, including subgrid scales [e.g., < 0(104)m], This subgrid spatial variability has a significant impact on the magnitude and distribution of upscale and downscale land surface fluxes whose interaction is nonlinear. Accounting for this space-time heterogeneity is important for hydrologie modeling and for describing land surface-atmosphere interactions.23'78'83 In addition, the statistical downscaling, besides requiring that the AOGCMs are accurate predictions of the future, also require that the statistical equations used for downscaling remain invariant under changed regional atmospheric and land surface conditions. There is no way to test this hypothesis. In fact, it is unlikely to be valid since the regional climate is not passive to larger-scale climate conditions but is expected to change over time and feedback to the larger scales. More details of this concern regarding downscaling have been reported.8'

Regression Schemes. The relationships between large-scale and local-scale climatic fields can be established by regression-based schemes. The most direct way of downscaling is by direct interpolation. This method is easy to apply and effective for smoothly varying fields such as sea level pressure or temperature but not appropriate for nonsmooth intermittent fields such as precipitation. Some examples on regression schemes are (1) a method (based on principal component analysis, canonical correlation, and regression analysis) called climatological projection by

behavior) of the scaling exponents has been found in the spatial distribution of rainfall

37,75,109

and in the temporal distribution of rainfall. Also both the scale invariance and intermittency of precipitation may be exploited to develop parsimonious stochastic models of rain.63

Multiplicative random cascades have been used to generate fractal fields that emulate the spectrum of scaling exponents of observed rainfall. Cascade generators are chosen according to the scaling spectra they produce. Notably, models such as the universal multifractals,109 the /j-modcllX or the log-Poisson model101 were proposed. For illustration, we will discuss below random cascade models.38,46'75

Discrete random cascades distribute mass on successive regular subdivisions of a ^-dimensional cube. A schematic of this process is shown in Figure 4. The initial cube, with length scale L0, is subdivided at each level into b equal parts, where b > 2d is the branching number. The z'th subcube after n levels of subdivision is denoted A'n (there are i = I b" subcubes at level n). The length scale of the subcube A'n is denoted Lm and the dimensionless spatial scale is defined as Xn = LJL0 = b~nld. The distribution of mass through different levels on the cubes occurs as follows. First the initial cube (at level n = 0) is assigned a nonran-dom density R0, i.e., an initial mass RqLq. The subcubes A\, i ~ ..., b after the first subdivision (at level n = 1) are assigned the density R(j Wx (;), i.e., mass R0L!\Wx(i), where W are independent and identically distributed (iid) random variables—the cascade generator. This multiplicative process continues through all n levels of the cascade, so that the mass in subcube A'n is

Ld0bn n wÀ o for i=\,2,...,bn where E[W] = 1; thus mass is on the average conserved at all levels in the random cascade.

5 SCALING ISSUES AND OOWNSCALING 627 branching number-*! parameter : P. Y. O

3rd cascade

3rd cascade

L-W21

L-LV23

L-V3J

Ria-S^W.WjWj

(a) discrete random cascade modd

Figure 4 Schematic diagram for a two-dimensional discrete random cascade models: (a) Discrete random cascade model (from Kang and Ramirez46).

STOCHASTIC SIMULATION OF PRECIPITATION AND STflEAMFLOW PROCESSES branrlting Dimb9^4 paiamAfr fr m.

WV31

(b) reratanmiftnc hjaarciBcaJ randtm cascade modd

Figure 4 Schematic diagram for a two-dimensional discrete random cascade models: (b) nonparametric random cascade model (from Kang and Ramirez4*).

The cascade limit mass is obtained as n —> oc. and it is considered degenerate if the total mass is zero with probability 1. Nondegeneracy depends on the distribution of W, and it requires that the condition E[W] — 1 be satisfied. The limit mass in a subcube /(^(Aj,) satisfies a recursion relation: ¿^(Aj,) = fin(A'n)Z^(i), for i= 1 b" where are iid random variables, distributed as Zx = /j^(A0)/

/in(An) = ¿uoo(A0)/7?0Z0 for all i, n. The cascade limit mass ¿/^(AJ,) is given by the product of a large-scale low-frequency component ¿u„(A'n), and a subgrid subgrid (i.e., subcube) scale high-frequency component Z^(z'). The latter term represents subgrid-scale variability at each level of cascade development.

Random cascades exhibit moment scaling behavior from which properties of the cascade generator W can be estimated. Sample spatial moments are defined as Mn(q) = XlLi tioo(A'n), where q = moment order (for q — 0, only nonzero limit masses are included in the sum). For large n, the sample moments should converge to the ensemble moments, but since they diverge to infinity or converge to zero as n —► oo, the rate of convergence/divergence of the moments with scale is considered instead. In a random cascade, the ensemble moments are shown to be a log-log linear function of the scale Xn. The slope of this scaling relationship is known as the Mandelbrot-Kahane Peyriere (MKP) function: /j,(q) = \ — q + log/; E\Wq]. The MKP function contains important information about the distribution of the cascade generator W and thus characterizes the scaling properties of rainfall. Similarly, the slope of the sample moment scaling relationship can be defined as z(q) = lim, _^n[log Mn(q)/ — log kn]. For large n (as Xn 0) and for a specific range of q, slopes of the moment scaling relationships for sample and ensemble moments converge, i.e., r(q) = dyh(q). In data analysis, the scaling of the sample moments is used to estimate the z(q) function and the distribution of the cascade generator, from which parameters of the cascade model can be inferred.

For intermittent temporal and spatial rainfall data, it is desirable that P(W = 0) be positive. For this purpose an intermittency model for the cascade generator W is written as W = BY, where B is an intermittency generator of the so-called /i-model and Y is a strictly positive random variable. The /? model divides the domain into rainy and nonrainy fractions based on the following probabilities: P(B = 0) = 1 - b-1'' and P(B = b1'') = where (i is a parameter and E[B] = 1. The ji model does not allow for variability in the positive part of the large-scale component of the limit mass ¿u„( A'„) (at every level n it assumes the nonrandom value R0LndbP"). Variability in the positiv

## Post a comment