Introduction to Ocean Data Assimilation

Edward D. Zaron

Conventional ocean modeling consists of solving the model equations as accurately as possible, and then comparing the results with observations. While encouraging levels of quantitative agreement have been obtained, as a rule there is significant quantitative disagreement owing to many sources of error: model formulation, model inputs, computation and the data themselves. Computational errors aside, the errors made both in formulating the model and in specifying its inputs usually exceed the errors in the data. Thus it is unsatisfactory to have a model solution which is uninfluenced by the data. Bennett (Inverse Methods in Physical Oceanography, 1st edn. Cambridge University Press, New York, p. 112, 1992)

Abstract Data assimilation is the process of hindcasting, now-casting, and forecasting using information from both observations and ocean dynamics. Modern ocean forecasting systems rely on data assimilation to estimate initial and boundary data, to interpolate and smooth sparse or noisy observations, and to evaluate observing systems and dynamical models. Every data assimilation system implements an optimality criterion which defines how to best combine dynamics and observations, given a hypothesized error model for both. The realization of practical ocean data assimilation systems is challenging due to both the technical issues of implementation and the scientific issues of determining the appropriate set of hypothesized priors. This chapter reviews methodologies and highlights themes common to all approaches.

Department of Civil and Environmental Engineering, Portland State University,

P.O. Box 751, Portland, OR 97207, USA

e-mail: [email protected]


A. Schiller, G. B. Brassington (eds.), Operational Oceanography in the 21st Century, 321

DOI 10.1007/978-94-007-0332-2_13, © Springer Science+Business Media B.V. 2011

13.1 Introduction

There are many technologies for observing the ocean. Examples include instruments for taking measurements at fixed points, such as acoustic Doppler velocimeters; horizontal and vertical profilers, such as towed conductivity-temperature-depth (CTD) sensors; and spatially extensive, nearly instantaneous or synoptic measurements, such as satellite imagery or radiometry. Every measurement system is defined by the physical variables it measures; by its spatial and temporal resolution and averaging characteristics, which determine how high-frequency information is either smoothed or aliased to lower frequencies; and by the noise and bias properties of the instrumentation. Given the large size of the ocean, and the great expense of measurement and observation systems, no practicable observation system completely determines the state of the oceans. Hence, models are necessary to complement the basic observations.

However, the ocean itself is a turbulent fluid, and small changes in initial conditions can have a significant impact on the subsequent evolution of the fluid. Even if it were possible to completely solve the partial differential equations of fluid motion, the prediction of the oceanic state would be limited by the accuracy of initial conditions and boundary data (e.g., the air-sea flux of momentum). In practice, numerical ocean models truncate the degrees of freedom of the continuum equations, and the parameterization of the neglected motion on the resolved scales is a significant source of error in our ability to simulate the fluid flow accurately.

It is these considerations, the relative paucity of observational data and the limitations of models, which provide the impetus for data assimilation. Ocean models are capable of accurately simulating dynamics at resolved scales, with exact or nearly exact conservation of properties such as mass, energy, or potential vorticity, depending on the model. The goal of data assimilative modeling is to produce estimates for the oceanic fields of temperature, salinity, pressure, and three-dimensional velocity, which are maximally consistent with observations and numerical model dynamics, allowing for errors in both.

Progress in ocean data assimilation has been enabled by advances in computing machinery over the last 30 years, but the theory and techniques of data assimilation have a long history with mathematical roots in probability and estimation theory, inverse theory, and the classical calculus of variations. The operational roots of data assimilation are closely tied to the weather prediction community, which has long dealt with the problem of how to smooth and interpolate sparse measurements in order to optimize subsequent weather predictions (Daley 1991).

This introduction to the subject of ocean data assimilation is selective. The goal is to touch on major points of theory and implementation, introducing common themes which are developed in the primary literature. After reading this chapter, the reader should be well-prepared to survey the many textbooks and review articles on ocean data assimilation (Bennett 1992, 2002; Wunsch 1996; Talagrand 1997; Kalnay 2003; Evensen 2006).

The article begins by reviewing the purposes of data assimilation. Then Bayes' Theorem is applied to derive optimal interpolation and the Kalman filter. The first part of the article closes by outlining the basic components common to all data assimilation systems. The second part of the article provides a background for the analysis of data assimilation systems, describing technical issues of implementation, and scientific issues of covariance estimation. Notation and nomenclature vary widely in the primary literature, and an effort has been made to use a consistent but minimal notation which is in accord with recent usage. Two appendices are attached which provide, respectively, annotated definitions of significant terms and pointers to web-resources for data assimilation.

13.2 The Purpose of Data Assimilation

Much like the ancient Indian parable of the seven blind men and the elephant (Strong 2007), there are several different perspectives on the purpose of data assimilation. The parable describes how the men perceive the elephant, each drawing a very different conclusion about its shape or function. One man felt the tail, and concluded an elephant is like a rope; another felt the tusk, and noted its spear-like properties; etc. Similarly, the field of ocean data assimilation has developed in a number of directions, each with a different goal or point of emphasis. The literature is diverse, and the disparate nomenclature can sometimes obscure common themes and methodological approaches.

The main themes and goals of data assimilation may be briefly summarized as follows:

Interpolation, Extrapolation, and Smoothing The purpose of data assimilation is to estimate the state of the ocean using all information available, including dynamics (e.g., the equations of motion) and observations. The end goal of data assimilation is to produce an analysis, an estimate of oceanic fields which are smoothly and consistently gridded from sparse or irregularly distributed data, and in which the dynamical relationships amongst the fields are consistent with prior physical considerations, such as geostrophic balance. Where measurements are sparse, the analysis fields ought to interpolate the measurements, or nearly so, with allowance for the measurement error. Where measurements are absent, they ought to be extrapolated from nearby measurements, consistent with the assumed dynamics. Where measurements are dense, redundant, or particularly inaccurate, the analysis fields ought to be plausibly smooth, containing no more structure than is warranted by the observations and the dynamics.

This view of data assimilation forms the basis for most of the work in ocean data assimilation, some representative works being Oke et al. (2002), Paduan and Shulman (2004), and Moore et al. (2004). Several groups are currently involved in real-time ocean analysis, incorporating diverse forms of data (i.e., ARGO float profiles, XBT data, sea-surface temperature, etc.) into global and regional ocean models. Global real-time analyses and forecasts are produced by the European Centre for Medium-Range Weather Forecasts (Balmaseda et al. 2007), the Australian Bureau of Meteorology (2009), the U.S. National Centers for Environmental Prediction (2009), and others. Retrospective hindcasts, also called reanalyses, are produced by several groups, including the Jet Propulsion Laboratory (2009) and the University of Maryland (2009).
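The interpolation, extrapolation, and smoothing behaviors described above can be illustrated with a minimal one-dimensional objective analysis. The Gaussian background covariance, its length scale, and the error variances below are illustrative assumptions chosen for the sketch, not values from the text:

```python
import numpy as np

def analyze(x_grid, x_obs, y_obs, sigma_b=1.0, sigma_o=0.2, L=0.5):
    """Objective analysis of scattered observations onto a grid.

    The background covariance is Gaussian with length scale L; the
    observation errors are uncorrelated with variance sigma_o**2.
    """
    def cov(a, b):
        return sigma_b**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / L**2)

    B_oo = cov(x_obs, x_obs)               # covariance among observation points
    B_go = cov(x_grid, x_obs)              # grid-to-observation covariance
    R = sigma_o**2 * np.eye(len(x_obs))    # observation-error covariance
    # Analysis = background (zero here) plus covariance-weighted innovations
    weights = np.linalg.solve(B_oo + R, y_obs)
    return B_go @ weights

x_grid = np.linspace(0.0, 2.0, 41)
x_obs = np.array([0.3, 0.9, 1.5])
y_obs = np.sin(2 * np.pi * x_obs / 2.0)
xa = analyze(x_grid, x_obs, y_obs)
```

Near an observation the analysis nearly interpolates the data, up to the assumed observation error; between observations it extrapolates through the covariance; and the covariance length scale controls how smooth the result is.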

Parameter Calibration The purpose of data assimilation is to develop the most accurate model of the ocean, by systematically adjusting unknown or uncertain parameters so that model predictions are maximally congruent with calibration data. The emphasis is on adjusting what may be highly uncertain or difficult-to-measure physical parameters, e.g., scalar parameters involved in turbulence sub-models, or fields, e.g., the sea-bed topography. From the perspective of parameter calibration, the end goal of data assimilation is to produce the best possible model for future prognostic or data assimilative studies, which maximizes the information gained, neither over- nor under-fitting the calibration data. There is a significant oceanographic literature in this area, but parameter estimation generally involves the solution of strongly nonlinear inverse problems, which can be more complex than state estimation (Lardner et al. 1993; Heemink et al. 2002; Losch and Wunsch 2003; Mourre et al. 2004).
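As a toy illustration of parameter calibration (the spin-down model, drag coefficient, and noise level are invented for this sketch and do not come from the references above), a scalar drag parameter can be estimated by minimizing a least-squares misfit to synthetic calibration data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy forward model: linear spin-down, u(t) = u0 * exp(-r * t)
u0, t = 1.0, np.linspace(0.0, 5.0, 20)
r_true = 0.4                          # the "unknown" drag parameter
obs = u0 * np.exp(-r_true * t) + 0.02 * rng.standard_normal(t.size)

def misfit(r):
    """Sum-of-squares misfit between the model prediction and the data."""
    return np.sum((u0 * np.exp(-r * t) - obs) ** 2)

# Calibrate by scanning the parameter; a real system would use a
# gradient-based or adjoint method rather than brute force
r_grid = np.linspace(0.0, 1.0, 1001)
r_hat = r_grid[np.argmin([misfit(r) for r in r_grid])]
```

Even this one-parameter example shows the central trade-off: the estimate is only as good as the assumed forward model and the noise in the calibration data, and with a nonlinear forward model the misfit surface may have multiple minima.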

Hypothesis Testing The purpose of data assimilation is to systematically test or validate an ocean prediction system, which includes as subcomponents a model of hypothesized ocean dynamics, its error model, and an error model for the validation data. The thorough study of analysis increments, model inhomogeneities, data misfits, and their relations to the hypothesized dynamics and error models is emphasized. The end goal from this perspective is a definitive test of the ocean prediction system, and an analysis of the primary flaws in the dynamical model or observing system. Dee and da Silva (1999), Muccino et al. (2004), and Bennett et al. (2006) are representative examples.

Once a prediction system has been validated, by formal hypothesis testing or other means, the data assimilation system can be used to design and predict the performance of future observing systems. For this purpose an observing system simulation experiment (OSSE) may be conducted, using so-called identical twin experiments, to assess the impact of present and future observational assets or data sources (Atlas 1997). A recent application to coupled ocean/atmospheric modeling for the detection of climate change is found in Zhang et al. (2007).

Summary: Operational Ocean Data Assimilation in Practice Probably the most widely used approach to ocean data assimilation involves a sequential assimilation of observations, as depicted in Fig. 13.1. The ocean model is integrated forward in time from initial conditions, providing a first guess or background field at the subsequent analysis time. Data are assimilated to produce an analysis by optimally combining information from the model and observations. The analysis is used as the initial condition for the next prediction cycle, and the process repeats. The process of estimating the ocean state through a series of sequential analysis steps is a type of signal filtering, for which the Kalman Filter is the prototype (Gelb 1974), and most sequential ocean data assimilation methods can be analyzed from this perspective.

Fig. 13.1 Sequential analysis of observations binned in time. The red lines indicate the ocean state trajectory predicted from initial conditions at the analysis times (red dots). Observations obtained within the analysis window (green) are binned and assimilated only at the analysis times

Fig. 13.2 Reanalysis or smoothing of observations. Reanalysis or smoothing finds the ocean state trajectory (red) most consistent with the observations (green) and the dynamical model within a time window
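A scalar Kalman filter makes the forecast/analysis cycle concrete. The damped-persistence dynamics, error variances, and observation values below are illustrative assumptions, not taken from any particular system:

```python
import numpy as np

# Illustrative scalar system: damped persistence, x_{k+1} = a * x_k + noise
a, Q, R = 0.9, 0.1, 0.5   # dynamics, model-error and obs-error variances

def kalman_cycle(x0, P0, obs):
    """Sequential assimilation: forecast to each analysis time, then update."""
    x, P, analyses = x0, P0, []
    for y in obs:
        # Forecast step: propagate the state and its error variance (background)
        x, P = a * x, a * a * P + Q
        # Analysis step: optimally combine the background and the observation
        K = P / (P + R)           # Kalman gain
        x = x + K * (y - x)       # analysis = background + gain * innovation
        P = (1 - K) * P           # analysis-error variance
        analyses.append(x)
    return analyses, P

analyses, P_final = kalman_cycle(x0=0.0, P0=1.0, obs=[1.2, 0.8, 1.0, 0.9])
```

The gain K weights the innovation by the relative sizes of the background and observation error variances, and the analysis-error variance shrinks at each update; this is the prototype from which most sequential ocean data assimilation methods can be analyzed.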

A sequential analysis procedure assimilates the observations, but the ocean state estimate obtained is discontinuous and not consistent with the model dynamics or boundary conditions at the analysis times. To obtain state estimates which are continuous, it is necessary to use the Kalman Smoother or a related method (Fig. 13.2). This mode of data assimilation is often used for hindcasting or reanalysis, although the term reanalysis is also used to denote the sequential analysis of historical data, particularly in operational weather prediction, where such reanalyses are performed using state-of-the-art techniques or more complete data sets than were originally available. For ocean forecasting systems, the ocean state at the end of the smoother time window is the now-cast which is used as initial conditions for the ocean forecast.

Because smoothing algorithms compute an analysis over an entire time window, while filter algorithms compute an analysis at a single time, smoothers are generally more computationally expensive than filters. The development of smoother algorithms which are computationally practicable is a goal of recent efforts in ocean prediction (Powell et al. 2008). In practice, a type of fixed-lag smoother may be used (Fig. 13.3), which assimilates observations over some time window prior to the current now-cast. For example, in 4D-Var assimilation one finds the initial conditions and boundary conditions which are most consistent with the observations during an assimilation interval, and the model integration is carried forward in time to provide predictions over a subsequent forecast interval.
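A minimal sketch of this idea, for an illustrative linear scalar model (the dynamics, error variances, and observation values are invented for the example), finds the initial condition that minimizes a 4D-Var-style cost function combining a background term and the misfit to observations along the trajectory:

```python
import numpy as np

# Illustrative linear scalar model: x_k = a**k * x0 over the window
a, B, R = 0.95, 1.0, 0.25        # dynamics, background and obs error variances
x_b = 0.5                        # background (prior) initial condition
t_obs = np.array([2, 5, 8])      # observation times within the window
y = np.array([0.9, 0.8, 0.7])    # observations (illustrative values)

def cost(x0):
    """4D-Var cost: background misfit plus obs misfit along the trajectory."""
    traj = a ** t_obs * x0
    return 0.5 * (x0 - x_b) ** 2 / B + 0.5 * np.sum((traj - y) ** 2) / R

def grad(x0):
    """Gradient of the cost with respect to the initial condition."""
    traj = a ** t_obs * x0
    return (x0 - x_b) / B + np.sum((traj - y) * a ** t_obs) / R

# Minimize by gradient descent; operational systems use conjugate-gradient
# or quasi-Newton iterations, with an adjoint model supplying the gradient
x0 = x_b
for _ in range(200):
    x0 -= 0.1 * grad(x0)
```

In a real system the control vector also includes boundary conditions and possibly model error, and the gradient is computed by integrating an adjoint model backward in time rather than by the closed-form expression available in this scalar case.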

13.3 Mathematical Formulation

Data assimilation involves the optimal utilization of information from different sources. Bayes' Theorem is a concise foundation for expressing data assimilation methods since it is concerned with the combination of information as expressed in probabilities. Optimization criteria and statistical estimators may be derived by considering the posterior probability of the state to be estimated, conditioned on the values of the observations. A non-rigorous introduction is presented here; details concerning the applicability of probability densities to function spaces are glossed over. Wahba (1990) contains an introduction to the central issues and is a good entry point to the specialized literature.

Fig. 13.3 4D-Var. In the 4D-Var algorithm the initial conditions are found (red dots) to optimize the ocean state trajectory (red line) with respect to observations (green) within the assimilation window

13.3.1 Bayes' Theorem

Let P_X(x) denote the probability density function (pdf) of a random variable X, so that the probability of X lying in the interval (x, x + dx) is given by P_X(x)dx. To be concrete, suppose that X represents an oceanic state, and Y represents a measurement of the state. Measurements contain error, so assume that Y = X + ε, where ε is the measurement error, a random variable.

In principle there is a probability density for the oceanic state P_X(x) which is a function of the forcings on the ocean, taken to be unknown random variables. Likewise, there is a probability density which describes the measurement errors, P_ε(e), which is usually expressed in terms of P_Y(y | x), the pdf of observations conditioned on the oceanic state, x. The joint probability of the state and the measurements P_XY(x, y) (the probability of x and y) and the conditional probability are related by the definition,

P_XY(x, y) = P_Y(y | x) P_X(x).
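In the Gaussian case this combination of information can be written in closed form: the posterior is again Gaussian, with a precision-weighted mean. The following sketch (with illustrative means and variances) checks the closed-form posterior against a brute-force application of Bayes' Theorem on a grid:

```python
import numpy as np

# Gaussian prior on the state and Gaussian observation error (illustrative)
x_b, sigma_b = 10.0, 2.0      # prior (background) mean and standard deviation
y, sigma_o = 13.0, 1.0        # observation and its error standard deviation

# Closed form: the product of two Gaussians is Gaussian, with a
# precision-weighted mean and combined precision
w = sigma_b**2 / (sigma_b**2 + sigma_o**2)       # gain on the innovation
x_a = x_b + w * (y - x_b)                        # posterior (analysis) mean
sigma_a = np.sqrt(1.0 / (1.0 / sigma_b**2 + 1.0 / sigma_o**2))

# Brute-force check on a grid: posterior density ∝ prior * likelihood
x = np.linspace(0.0, 20.0, 20001)
prior = np.exp(-0.5 * (x - x_b)**2 / sigma_b**2)
likelihood = np.exp(-0.5 * (y - x)**2 / sigma_o**2)
post = prior * likelihood
post /= post.sum()                               # normalize on the grid
mean_numeric = (x * post).sum()                  # posterior mean, numerically
```

The analysis mean x_a lies between the prior mean and the observation, pulled toward whichever has the smaller error variance; this is the same gain structure that appears in optimal interpolation and the Kalman filter.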
