of the system have to be defined. The main challenges in applying the filter to the forecasting of precipitation and streamflow are to define the appropriate matrices and other components of the filter in terms of the hydrologic variables and to estimate them from available data and knowledge of the physical process. The subsequent sections present the application of various statistical techniques, including the Kalman filter, to forecasting hydroclimatic processes, particularly precipitation and streamflow.

Precipitation forecasting is of great significance for water resources management and flood protection, although it is not an easy task. Rainfall forecasts based on the analysis of the temporal and spatial evolution of the meteorological phenomena would always be desirable. Considerable progress has been made in this respect by using numerical weather prediction approaches and general circulation models. However, information from such sources is not always available in operational form. In this situation rainfall forecasts can be based on the persistence characteristics of current and past rainfall measurements, even though the accuracy of such forecasts may suffer because they do not represent the physical aspects of the precipitation phenomena. A number of examples of precipitation forecasting by statistical, stochastic, or probabilistic techniques can be found in the literature. They include regression techniques, Markov chains, ARMA-type methods, probability-function-based approaches, and artificial neural networks (ANNs). All of these approaches have been used for short-term and for mid- and long-term forecasting, as illustrated below.

Quantitative precipitation forecasting, often denoted as QPF, is one of the major tasks in flood forecasting. QPF has been shown to extend the lead time of flood forecasts and to improve the accuracy of flood estimates for a given forecast lead time (Brath et al., 1988). Although research in the field of numerical weather prediction has achieved significant progress in recent years (see, e.g., Bougeault et al., 2000), forecasting techniques based on stochastic and statistical modeling are useful especially for operational purposes and in the context of mesoscale basins, which are characterized by rapid response times. However, because rainfall is a complex phenomenon that exhibits significant spatial and temporal variability, nonstationarity, and nonlinearity, especially at small scales, forecasting it by stochastic approaches is challenging and requires experience.

Early attempts to forecast rainfall were formulated as statistical black-box models used for storm tracking. For instance, Phanartzis (1979) developed a simple model for forecasting the direction of storm movement based on the cross-correlation of rainfall measured at a network of rain gages. A similar approach was developed by Nguyen et al. (1978) to be used with radar storm tracking signals. A more sophisticated statistical storm tracking procedure based on the Kalman filter was proposed by Johnson and Bras (1980). Also, French and Krajewski (1994) and French et al. (1994) used the Kalman filter for state updating and incorporation of uncertainty in a two-dimensional physically based model and surface meteorological observations. Furthermore, Sugimoto et al. (2001) used the extended Kalman filter as a state estimator to update the model parameter of the conceptual model with new radar data and with forecasts from a numerical weather prediction model.

Other authors, such as Lardet and Obled (1994), generated scenarios of rainfall duration and volume by probability functions conditioned on past rainfall. Statistical methods based on classification trees were also used for QPF (Carter and Eisner, 1997; Carter et al., 2000). In other applications physically based model structures are combined with stochastic components to account for the uncertainties associated with model hypotheses and structure (Jinno et al., 1993). Kawamura et al. (1996, 1997) added a Gaussian white noise in time and space to an advection-diffusion model of space-time rainfall, to consider a certain degree of error and uncertainty inherent in rainfall modeling.

Other approaches try to overcome the intrinsic limitation of persistence-based methods for predicting rainfall, due to the short decorrelation time of the precipitation process, which has been shown to be of the order of approximately 20 min (Zawadzki, 1987). Four stochastically based approaches for forecasting short-term precipitation are presented below.

Point Process Models. The models based on point processes perform satisfactorily with respect to reproducing the cluster dependence properties of observed rainfall (Entekhabi et al., 1989) and related extreme properties (Burlando and Rosso, 1993). However, the formulation required for real-time forecasting is very complex. Ramirez and Bras (1985) developed an algorithm for forecasting storm arrivals assuming the Neyman-Scott white-noise model as the underlying rainfall-generating mechanism. They derived the general expressions for the distribution functions of the time to the next storm event, conditioned on part of the immediate rainfall history, and applied the algorithm for irrigation scheduling. French et al. (1992a) developed a real-time forecasting scheme based on the space-time model of Rodriguez-Iturbe and Eagleson (1987). The forecasting model consists of a single distributed state-space equation, which is used to derive the conditional mean and the conditional covariance of rainfall intensity. Updating of the rainfall field in real time is carried out by representing the model structure as a distributed parameter Kalman filter. While some work has been done in using point and cluster processes for real-time forecasting of precipitation, their development has been limited to research studies.

Regression-Based Methods. A good example of how rainfall forecasting based on statistical methods is useful for operational purposes is the U.S. National Weather Service's centralized statistical quantitative precipitation forecasts (Antolik, 2000). The statistical forecast is based on multiple linear regression (Glahn and Lowry, 1972; Lowry and Glahn, 1976), where the rainfall amount over a given time interval is predicted as a function of meteorological variables, both observed and computed by numerical weather models. Despite the relative simplicity of the model, it often outperforms physically based methods and more complex techniques, provided the predictors are properly identified. Regression methods, though, are more commonly used in long-term forecasting.

Markov Chains Approach. The theory of Markov chains has been suggested for short- and long-term forecasting of rainfall. For example, Bertoni et al. (1992) used a first-order Markov chain for real-time forecasting of rainfall for a lead time of a few hours, which in turn was used for flood forecasting. Historical rainfall data were classified into states that divide the range of rainfall variation into a sequence of nonoverlapping intervals. The transition probabilities were estimated as $p_{ij} = n_{ij} / \sum_{j=1}^{r} n_{ij}$ $(i, j = 1, \ldots, r)$, where $r$ is the number of states and $n_{ij}$ is the number of transitions from state $i$ to state $j$, which is computed from historical observations on a seasonal basis. The $p_{ij}$ values are the elements of the transition probability matrix, which is then used to estimate (forecast) the m-step (ahead) transition probability on the basis of the incoming observations (i.e., the present state) and the given conditional nonexceedance probability. The selection of an appropriate nonexceedance probability is key in achieving acceptable rainfall forecasts. Yu and Yang (1997) adopted a similar approach and further analyzed the role played by the choice of the nonexceedance probability with respect to forecast accuracy. In addition to seasonal dependence, the nonexceedance probability strongly depends on the storm profile, being considerably different in the rising limb than in the recession limb of the hyetograph.
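The procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of Bertoni et al. (1992): the class bounds, the function names, the uniform row used for states never visited, and the rule of picking the lowest state whose cumulative probability reaches the nonexceedance level are all assumptions made for the example.

```python
from bisect import bisect_left

def classify(rain, bounds):
    """Map a rainfall depth to a state index; bounds are inclusive upper limits."""
    return bisect_left(bounds, rain)

def transition_matrix(states, r):
    """Estimate p_ij = n_ij / sum_j n_ij from a sequence of state indices."""
    counts = [[0] * r for _ in range(r)]
    for i, j in zip(states[:-1], states[1:]):
        counts[i][j] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # States never visited get a uniform row (an assumption of this sketch).
        matrix.append([c / total for c in row] if total else [1.0 / r] * r)
    return matrix

def m_step_distribution(matrix, state, m):
    """Probability of each state m steps ahead, given the present state."""
    r = len(matrix)
    dist = [1.0 if k == state else 0.0 for k in range(r)]
    for _ in range(m):
        dist = [sum(dist[i] * matrix[i][j] for i in range(r)) for j in range(r)]
    return dist

def forecast_state(dist, nonexceedance):
    """Lowest state whose cumulative probability reaches the chosen level."""
    cumulative = 0.0
    for k, p in enumerate(dist):
        cumulative += p
        if cumulative >= nonexceedance:
            return k
    return len(dist) - 1

# Hypothetical hourly depths (mm) and class bounds, for illustration only.
bounds = [0.0, 2.0, 10.0]
rain = [0, 0, 1, 5, 12, 5, 1, 0, 0, 1, 5, 1, 0]
states = [classify(x, bounds) for x in rain]
P = transition_matrix(states, len(bounds) + 1)
two_hours_ahead = m_step_distribution(P, states[-1], 2)
print(forecast_state(two_hours_ahead, 0.8))
```

Raising the nonexceedance level pushes the forecast toward wetter states, which is why its choice weighs so heavily on forecast accuracy.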

Dahale and Puranik (2000) applied a six-state simple Markov chain to forecast 5-day spatial rainfall persistence of summer monsoons over the Indian region. Fraedrich and Müller (1983) used a five-state simple Markov chain, and Miller and Leslie (1984) adopted a four-state second-order model to predict rainfall probabilities from past weather states. One must note that high forecast skill is generally obtained for short lead times and decreases significantly with increasing lead time. Johnson and Bras (1980) combined forecasts of the mean rainfall rate throughout the event at each gage with the modeling of a random residual component based on a Markovian model. The choice of the optimal order of a Markov chain also plays a role in forecast accuracy. The Akaike information criterion and the Bayes information criterion can be used for this purpose (e.g., Tong, 1975; Katz, 1981; Gregory et al., 1992).

ARMA Models. Trotta et al. (1977) and Labadie et al. (1981) showed that ARMA and transfer function models can be used for modeling rainfall persistence. They used an autoregressive transfer function model for short-term rainfall forecasting for the purpose of improving the control of a sewer system. The model uses parameters estimated from historical data at the beginning of the storm event, when information on the ongoing event is still poor. As the storm progresses, the parameters are progressively tuned to reflect the increasing real-time information. This is done in the estimation step by including weighting factors in a least-squares algorithm so that the historical information and the information from the current rainfall event are weighted differently. Obeysekera et al. (1987) showed that certain point process models widely applied for modeling short-term rainfall, such as the Poisson rectangular pulse (PRP) and the Neyman-Scott rectangular pulse (NSRP), possess correlation structures like those of ARMA(1,1) and ARMA(2,2) models, respectively. Thus, in principle, ARMA models could be used for simulation and forecasting of short-term rainfall processes. Because ARMA models are stationary, and the underlying variable is normally distributed, their application to real-time forecasting of short-term precipitation, such as hourly and daily rainfall, requires certain procedures to be followed to take such requirements into account. Burlando et al. (1993) used the ARMA(2,2) model given as

$$Z_t = \sum_{j=1}^{2} \phi_j Z_{t-j} + \varepsilon_t - \sum_{j=1}^{2} \theta_j \varepsilon_{t-j} \tag{5}$$

where $Z_t = X_t - \mu$, $X_t$ represents hourly rainfall, $\mu$ is the mean of $X_t$, $\phi_j$ and $\theta_j$ are the autoregressive and moving average coefficients, respectively, and $\varepsilon_t$ is a normally distributed noise with mean zero and variance $\sigma_\varepsilon^2$.
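To illustrate how an ARMA(2,2) model of this kind produces 1- and 2-h forecasts, the following sketch filters the observed series to recover the one-step errors and then iterates the model with future noise set to its mean of zero. It assumes the Box-Jenkins sign convention, in which the moving average terms enter with a minus sign, and it sets pre-sample values to zero. The function name and the parameter values are hypothetical and are not those estimated by Burlando et al. (1993).

```python
def arma22_forecast(x, mean, phi, theta, leads=2):
    """Forecast an ARMA(2,2) series for lead times 1..leads.

    x     -- observed values up to the forecast origin (at least two)
    mean  -- process mean, subtracted before filtering
    phi   -- (phi1, phi2) autoregressive coefficients
    theta -- (theta1, theta2) moving average coefficients
    """
    z = [v - mean for v in x]
    # Recover the one-step-ahead errors; pre-sample values are taken as zero.
    eps = []
    for t in range(len(z)):
        z1 = z[t - 1] if t >= 1 else 0.0
        z2 = z[t - 2] if t >= 2 else 0.0
        e1 = eps[t - 1] if t >= 1 else 0.0
        e2 = eps[t - 2] if t >= 2 else 0.0
        predicted = phi[0] * z1 + phi[1] * z2 - theta[0] * e1 - theta[1] * e2
        eps.append(z[t] - predicted)
    # Iterate the model forward; future noise terms are replaced by zero.
    z_ext, e_ext = list(z), list(eps)
    for _ in range(leads):
        n = len(z_ext)
        z_hat = (phi[0] * z_ext[n - 1] + phi[1] * z_ext[n - 2]
                 - theta[0] * e_ext[n - 1] - theta[1] * e_ext[n - 2])
        z_ext.append(z_hat)
        e_ext.append(0.0)
    return [v + mean for v in z_ext[len(z):]]

# Hypothetical hourly rainfall (mm) and parameters, for illustration only.
hourly = [0.2, 1.5, 4.0, 6.5, 5.0]
one_hour, two_hour = arma22_forecast(hourly, mean=2.0, phi=(0.9, -0.2), theta=(0.3, 0.1))
```

Because the forecast is a linear function of past values and errors, it can fall below zero for rainfall; an operational scheme would need to handle that case, for example by truncation at zero.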

Nonstationarity of the rainfall throughout the year was accounted for either by seasonal estimation of parameters based on the analysis of the continuous data set or by event-based parameter estimation carried out only on extracted nonzero rainfall events. In the latter approach a different parameter set was determined for each storm event considered. To account for the nonlinearity that characterizes storm precipitation events, some modifications were necessary for the estimation of the ARMA model (5) as shown schematically in Figure 1. Specifically, data are first transformed to account for non-normality by means of the Box-Cox transformation (Box and Cox, 1964), and the estimation of the model parameters is performed by an iterative adaptive least-squares technique. Thus, the data used for estimation are only those available as the storm event evolves through time, implicitly assuming a local stationarity. While the results based on the continuous data set were not satisfactory, the event-based application provided satisfactory results. Figure 2 shows an example of the forecast accuracy. A noticeable problem, however, is the one-hour phase shift that characterizes most of the forecasted events. Toth et al. (2000) obtained similar results by slightly modifying the procedure introduced by Burlando et al. (1993), in that the event-based parameter estimation was carried out on the basis of a moving window of fixed length rather than on the complete event data sequence. Transformation of data was also relaxed because forecast applications based on ARMA models do not require the data to be Gaussian, i.e., ARMA models provide the best linear prediction even for non-Gaussian data (Brockwell and Davis, 1991).
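The Box-Cox step can be sketched as a pair of functions: data are transformed before the model is fitted, and forecasts made in the transformed space are mapped back to rainfall depths. This is a generic sketch of the transformation of Box and Cox (1964), not the code used in the studies cited. The exponent value in the example is hypothetical, and returning zero when the back-transform is undefined is an assumption of this sketch.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a strictly positive value."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def inverse_box_cox(y, lam):
    """Map a value in the transformed space back to the original scale."""
    if lam == 0:
        return math.exp(y)
    base = lam * y + 1.0
    # Outside the range of the transform, return zero rain (sketch assumption).
    return base ** (1.0 / lam) if base > 0 else 0.0

# Hypothetical nonzero hourly depths (mm) and exponent, for illustration only.
depths = [0.4, 1.2, 4.0, 9.0]
transformed = [box_cox(v, 0.5) for v in depths]
recovered = [inverse_box_cox(v, 0.5) for v in transformed]
```

The transform is defined only for positive values, which is one reason the event-based approach works on extracted nonzero rainfall events.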

Figure 2 Example of 1- and 2-h rainfall forecasts for the event of October 14, 1960, Denver, Colorado, obtained by means of an ARMA(2,2) process (from Burlando et al., 1993). Horizontal axis of each panel: time from the beginning of the event [h].

The temporal phase shift exhibited by forecasts obtained by the univariate ARMA model can be partially explained by the error induced by storm movement. One can ameliorate this effect, and reduce the error associated with the phase shift, by using additional data measured at neighboring stations (e.g., by using the cross-correlation between the rainfall at the station of interest, i.e., the station where the forecast is issued, and the rainfall at the other stations). This can be done with a multivariate integrated ARMA (MARIMA) forecasting scheme (Montanari et al., 1994; Burlando et al., 1996). Montanari et al. (1994) suggested that a multivariate scheme could remarkably improve the forecasts when the rain gages to be used in forecasting are selected adequately. Burlando et al. (1996) showed that the estimation of a Lagrangian space-time correlation of the moving storm could be made using storm maps recorded by weather radar, which provide the direction and the speed of the storm movement. Storm tracking can thus be applied to actual events to select those stations that are characterized by the highest Lagrangian cross-correlation of observed precipitation and are therefore best suited for use with the multivariate model. The parameters of the multivariate model are thus estimated using only observed rainfall at the selected stations throughout the current event.
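A simplified, gage-only version of the station selection idea can be sketched as follows. It is not the radar-based Lagrangian procedure of Burlando et al. (1996): it ranks candidate stations by the ordinary lagged cross-correlation between their rainfall and the later rainfall at the forecasting station. The station names, the data, and the function names are hypothetical.

```python
import math

def lagged_cross_correlation(lead, target, lag):
    """Correlation between lead[t] and target[t + lag], for lag >= 0."""
    a = lead[:len(lead) - lag]
    b = target[lag:]
    n = min(len(a), len(b))
    if n < 2:
        return 0.0
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def best_lead_station(candidates, target, max_lag):
    """Return (name, lag, correlation) for the strongest lagged correlation."""
    best = None
    for name, series in candidates.items():
        for lag in range(1, max_lag + 1):
            r = lagged_cross_correlation(series, target, lag)
            if best is None or r > best[2]:
                best = (name, lag, r)
    return best

# Hypothetical hourly rainfall (mm): station "A" sees the storm 1 h earlier.
target = [0.0, 0.0, 1.0, 4.0, 2.0, 0.0]
candidates = {
    "A": [0.0, 1.0, 4.0, 2.0, 0.0, 0.0],
    "B": [1.0, 0.0, 0.0, 3.0, 0.0, 1.0],
}
name, lag, r = best_lead_station(candidates, target, max_lag=2)
```

Only positive lags are searched, so a station is selected only if its rainfall precedes that at the forecasting station.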

Specifically, the MARIMA model estimates the future occurrences of a time series as a linear combination of (a) past occurrences of the underlying time series and of time series that are cross-correlated with it (the autoregressive component) and (b) the present and past occurrences of a random white-noise component (the moving average component). The MARIMA model can be expressed as

$$Z_t = \sum_{j=1}^{p} \Phi_j Z_{t-j} + \varepsilon_t - \sum_{j=1}^{q} \Theta_j \varepsilon_{t-j} \tag{6}$$

where $p$ and $q$ are the autoregressive and the moving average orders, respectively, $Z_t = (I - B)^d X_t$, $X_t$ is the rainfall intensity, $I$ is the identity matrix, $B$ is the backward operator, $d$ is the differencing order of the model, and $\varepsilon_t$ is a normally distributed noise term. Both $Z_t$ and $X_t$ are $n$-dimensional column vectors ($n$ = number of series), and $\Phi$ and $\Theta$ are the $n \times n$ autoregressive and moving average parameter matrices of the model. The number of parameters in (6) becomes large as the orders $p$ and $q$ increase. This is a major limitation for analytical tractability and parameter estimation, especially in cases where only a limited number of observations are available. Accordingly, the values of $p$ and $q$, as well as the number of series $n$, should be selected as a compromise between the conflicting needs of process descriptiveness and mathematical tractability.

Burlando et al. (1996) explored the suitability of the MARIMA(1,1,0) model for a catchment in northern Italy. Parameter estimation was carried out on individual events, as in Burlando et al. (1993), and using the method of moments as

$$\hat{\Phi}_1 = M_1 M_0^{-1}$$

where $M_0$ and $M_1$ denote the lag-0 and lag-1 covariances, respectively. The identification of the pair of stations was carried out either on the basis of historical cross-correlations or from the analysis carried out in real time from radar maps. The latter provided the basis for the analysis of the kinematic characteristics of the storm, thus allowing the identification of a (first) lead station, located downwind of the (second) forecasting station. The lead station is taken as a reference station, and the second station is selected among those located along the direction of the storm movement identified from the radar maps. The MARIMA(1,1,0) model was thus estimated using rain gage data observed at the selected stations, and rainfall forecasts were issued at each station as a function of the current and past occurrences observed at the station itself and at the lead station. Satisfactory results were obtained, as reported in Burlando et al. (1996).
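For a pair of stations, the moment estimation of a MARIMA(1,1,0) model can be sketched as below. The sketch assumes the standard moment estimator of a first-order multivariate autoregression, $\hat{\Phi}_1 = M_1 M_0^{-1}$ with $M_k$ the average of $Z_t Z_{t-k}^T$, applied to first differences treated as zero-mean. It is not taken from Burlando et al. (1996), and the data, the function names, and the truncation of negative forecasts at zero are hypothetical choices.

```python
def difference(series):
    """First differences Z_t = X_t - X_{t-1}."""
    return [b - a for a, b in zip(series[:-1], series[1:])]

def lag_moment(z, lag):
    """M_lag: average of z_t z_{t-lag}^T for a pair of series z = [z_a, z_b]."""
    n = len(z[0])
    return [[sum(z[i][t] * z[j][t - lag] for t in range(lag, n)) / (n - lag)
             for j in range(2)] for i in range(2)]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("lag-0 moment matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def matmul_2x2(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def estimate_phi(x_target, x_lead):
    """Moment estimate Phi = M1 * inverse(M0) on the differenced series."""
    z = [difference(x_target), difference(x_lead)]
    return matmul_2x2(lag_moment(z, 1), inverse_2x2(lag_moment(z, 0)))

def forecast_next(x_target, x_lead, phi):
    """One-step forecast at both stations: X_{t+1} = X_t + Phi Z_t."""
    z_now = [x_target[-1] - x_target[-2], x_lead[-1] - x_lead[-2]]
    z_hat = [phi[i][0] * z_now[0] + phi[i][1] * z_now[1] for i in range(2)]
    # Negative intensities are truncated at zero (a choice of this sketch).
    return [max(0.0, x_target[-1] + z_hat[0]), max(0.0, x_lead[-1] + z_hat[1])]

# Hypothetical hourly intensities (mm/h) at the forecasting and lead stations.
x_target = [0.0, 1.0, 3.0, 4.0, 2.0, 1.0]
x_lead = [1.0, 3.0, 4.0, 2.0, 1.0, 0.0]
phi = estimate_phi(x_target, x_lead)
next_target, next_lead = forecast_next(x_target, x_lead, phi)
```

With only a handful of observations per event, the four entries of the matrix are poorly constrained, which illustrates why the orders and the number of series have to be kept small.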

Artificial Neural Networks. An alternative route to the foregoing stochastic forecasting techniques is the use of artificial neural networks. These are essentially data processing systems that can reproduce, by learning, the relationships between a pair of one- or multidimensional data sets. An artificial neural network (ANN) is made of many simple nonlinear units that mimic human neurons. Each unit collects input from one or more sources and produces an output according to a predefined nonlinear function. In a sense an ANN is a sort of transfer function model that appears suitable for tackling the problem of rainfall forecasting.

Use of ANNs for forecasting weather-related quantities started in the early 1990s. French et al. (1992b) developed a neural network to forecast rainfall intensity fields in time and space, which were generated by a modified version of the stochastic rainfall simulation model proposed by Rodriguez-Iturbe and Eagleson (1987). The network, with input, hidden, and output layers, was trained using the back-propagation technique on a regular grid domain to test the ability of the ANN and to investigate the role of the number of hidden nodes in its performance. The model skill was tested based on a varying number of training sets and the rainfall fields generated by the stochastic model. Real-time learning and off-line learning were additionally tested. Kuligowski and Barros (1998) applied a combination of precipitation data from a number of rain gages and wind directions to forecast rainfall amounts for a target location and a lead time of 6 h. Specifically, rainfall observations at rain gages in a region of radius 300 km centered on the target location, upper level winds at three radiosonde locations, and wind direction data from a number of levels were combined to build the training set of the ANN.

More recently, Luk et al. (2000) adopted ANNs to forecast short-term rainfall for an urban catchment, aiming to investigate the effect of temporal and spatial information on short-term rainfall forecasting. The forecast accuracy of ANNs was evaluated for different configurations of lag orders and numbers of spatial inputs based on historical rainfall patterns. They concluded that the most accurate predictions depend on the identification of an optimum number of spatial inputs, and that the network with lower lag consistently produced better performance. An interesting application of ANNs has been recently shown by Toth et al. (2000), who provided, for a real case study, a comparison of ANN performance with real-time prediction based on ARMA models and a nonparametric nearest-neighbor technique. Multilayer feed-forward network architectures were tested against the one-layer scheme in order to determine the optimal network configuration, both for split-sample application and for adaptive calibration. As one may expect, better performance was obtained for the split-sample application, which makes use of larger training sets, whereas the adaptive calibration gave worse results for short lead times. Compared to other forecasting techniques, ANNs were slightly superior in overall performance because of their ability to account for the nonlinearities that characterize temporal rainfall. Grecu and Krajewski (2000) reported another interesting application of a back-propagation neural network (BPNN) for rainfall forecasting. In this case, rainfall amounts were not directly modeled by means of the BPNN; instead, the BPNN was used to model one component of the statistical radar-based quantitative precipitation forecast procedure.
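As a rough illustration of how a feed-forward network can be trained by back-propagation on lagged rainfall, the sketch below uses one hidden layer of sigmoid units and a linear output unit. It does not reproduce any of the networks cited above: the architecture, the learning rate, the number of epochs, the scaled rainfall series, and all function names are assumptions chosen to keep the example small.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def lagged_samples(series, lags):
    """Pairs (previous `lags` values, next value) built from one series."""
    return [(series[t - lags:t], series[t]) for t in range(lags, len(series))]

def forward(x, w_hidden, w_out):
    """Return (hidden activations, output); each weight row ends with a bias."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row[:-1], x)) + row[-1])
              for row in w_hidden]
    output = sum(w * h for w, h in zip(w_out[:-1], hidden)) + w_out[-1]
    return hidden, output

def train(samples, n_hidden=3, epochs=2000, rate=0.1, seed=0):
    """Stochastic gradient descent on squared error with back-propagation."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    for _ in range(epochs):
        for x, target in samples:
            hidden, output = forward(x, w_hidden, w_out)
            err = output - target  # gradient of 0.5 * err**2 w.r.t. the output
            for k, h in enumerate(hidden):
                # Propagate the error through the sigmoid before w_out[k] changes.
                delta = err * w_out[k] * h * (1.0 - h)
                for i, v in enumerate(x):
                    w_hidden[k][i] -= rate * delta * v
                w_hidden[k][-1] -= rate * delta
                w_out[k] -= rate * err * h
            w_out[-1] -= rate * err
    return w_hidden, w_out

def mse(samples, w_hidden, w_out):
    return sum((forward(x, w_hidden, w_out)[1] - t) ** 2
               for x, t in samples) / len(samples)

# Hypothetical rainfall series scaled to [0, 1]; two lagged values as inputs.
series = [0.0, 0.1, 0.4, 0.8, 0.6, 0.3, 0.1, 0.0]
samples = lagged_samples(series, 2)
w_hidden, w_out = train(samples)
next_value = forward(series[-2:], w_hidden, w_out)[1]
```

Changing the `lags` argument or adding series from neighboring gages to each input vector corresponds to the lag-order and spatial-input configurations that such studies compare.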
