Climate Variable Selection

The above example used average growing season temperature, which is indeed a very common measure of growing season weather. However, there are many other defensible variables to use in place of or in addition to this value. We distinguish here between two main choices: variable type and temporal scale. Variable type decisions involve, for instance, whether to include a term related to temperature, one for precipitation, and/or one for solar radiation or some other meteorological variable. Temporal scale decisions include extent (i.e., what length of growing season to consider) and resolution (i.e., how many intervals within the growing season to include). For example, while we defined the growing season as April-September, one could argue that March-August or June-September is a better definition. For resolution, many have argued that intra-seasonal variations in weather can be as important as averages (e.g., Thompson 1986; Hu and Buyanovsky 2003; Porter and Semenov 2005). Heat or rainfall during critical flowering stages, for example, may be as important as, or more important than, average conditions. Again, while this is certainly true to some extent, the key question is how much the final analysis is affected by this decision.

One aspect of intra-seasonal variation is the length of time the crop spends above critical heat thresholds. For maize, it is commonly thought that temperatures above 30°C are particularly bad for crop development and growth (see Chapter 4). With hourly data, one can compute the number of hours spent above some threshold for the entire growing season, in addition to or in lieu of using growing season averages. Such decisions depend a great deal on the availability of fine scale meteorological measurements. In many parts of the world, reliable data are only available for monthly averages (briefly discuss here the approach to deriving degree days).
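As a minimal sketch of the threshold calculation described above, the following Python functions count hours above a threshold and accumulate degree days above it from hourly temperatures. The function names, the convention of dividing degree-hours by 24, and the one-day temperature record are illustrative assumptions, not the procedure used to build the GDD30 variable in this chapter.

```python
def hours_above(hourly_temps_c, threshold_c=30.0):
    """Count the hours in which temperature exceeds the threshold."""
    return sum(1 for t in hourly_temps_c if t > threshold_c)


def degree_days_above(hourly_temps_c, threshold_c=30.0):
    """Accumulate heat exposure above a threshold from hourly temperatures.

    Each hour contributes max(T - threshold, 0) degree-hours; dividing the
    total by 24 expresses the result in degree days (an assumed convention).
    """
    degree_hours = sum(max(t - threshold_c, 0.0) for t in hourly_temps_c)
    return degree_hours / 24.0


# Hypothetical one-day record: 18 hours at 25°C and 6 hours at 33°C.
one_day = [25.0] * 18 + [33.0] * 6

print(hours_above(one_day))        # 6 hours above 30°C
print(degree_days_above(one_day))  # 6 h * 3°C / 24 = 0.75 degree days
```

Summing either quantity over all days in the growing season gives a seasonal exposure measure that can be used alongside, or instead of, the seasonal average.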

Figure 5.3 illustrates three climate variables for US maize: average growing season temperature and precipitation, and degree days above 30°C (GDD30), all plotted against each other and against yield anomalies. The numbers below the diagonal in Fig. 5.3 indicate the correlation coefficient between the pair of variables. In this example, average temperature shows a significant correlation with yields, but less so than GDD30. Precipitation exhibits a slight positive correlation with yields and negative correlations with both temperature measures.

An important point illustrated in Fig. 5.3 is that different climate variables are often highly correlated with each other, such as average temperature and GDD30 in this example. Thus it is impossible to say exactly how much of the observed correlation between yields and average temperatures is due to a real effect of average conditions, and how much is due to a real effect of very hot days or reduced precipitation that happens to be correlated with average temperatures.

Fig. 5.3 Scatterplot of data for the US maize example (panels: Avg Temperature, Avg Precipitation, Degree Days > 30; correlation labels legible in the source: r = -0.31 and r = 0.72)

This problem of colinearity is common in statistical analysis, and often makes it impossible to attribute yield changes to a single climate variable. The obvious risk is that one may attribute yield losses to one variable when in fact another variable is the true culprit. The best approach to minimizing colinearity is to obtain samples where the climate variables are not highly correlated. For example, although growing season daytime and nighttime average temperatures are often very highly correlated, there are some locations in the world where this is not the case. Lobell and Ortiz-Monasterio (2007) focused on three such regions to evaluate the response of wheat yields to night and day temperatures.

A useful method for gauging the effect of colinearity is to evaluate partial correlation coefficients, i.e., the correlation between yield and a climate variable after the correlations with all other variables have been removed. Similarly, one can compute regressions between a variable and the residuals from a regression of yield on all other variables. Comparison of this value with the coefficient from an ordinary multiple regression will provide some measure of the role that colinearity plays.
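The residual-based calculation can be sketched in a few lines of Python. This is a simplified illustration with a single control variable rather than "all other variables", and the data are synthetic: yield is constructed to depend only on GDD30, while average temperature is merely correlated with GDD30. All variable names and coefficients are assumptions made for the example.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """Residuals of y after a simple least-squares regression on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


def partial_correlation(y, x, control):
    """Correlation of y and x after removing the linear effect of control."""
    return pearson(residuals(y, control), residuals(x, control))


# Synthetic, standardized data (hypothetical): yield responds to GDD30 only.
rng = random.Random(1)
n = 500
gdd30 = [rng.gauss(0, 1) for _ in range(n)]
avg_temp = [0.8 * g + 0.6 * rng.gauss(0, 1) for g in gdd30]
yield_anom = [-1.0 * g + 0.5 * rng.gauss(0, 1) for g in gdd30]

raw_r = pearson(yield_anom, avg_temp)
partial_r = partial_correlation(yield_anom, avg_temp, gdd30)
print("raw correlation with avg temperature:    ", round(raw_r, 2))
print("partial correlation, controlling for GDD:", round(partial_r, 2))
```

In this constructed case the raw correlation between yield and average temperature is strongly negative, whereas the partial correlation is close to zero, which is the signature of an apparent effect that arises only through a correlated variable.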

Overall, colinearity is perhaps the biggest obstacle to time series modeling. In some cases it may be possible to distinguish between apparent and true effects on yield with knowledge of biological processes. More likely, this distinction is subjective and subject to disagreement. In an analysis of experimental rice yield responses to warming, for instance, Peng et al. (2004) reported a roughly 10% loss of yield for each degree of nighttime warming based on time series analysis. A subsequent analysis by Sheehy et al. (2006) used the rice process-based model ORYZA2000 to demonstrate that roughly half of the perceived effect of temperature could actually be due to changes in solar radiation, which are negatively correlated with nighttime temperature in this location. Similarly, Lobell and Ortiz-Monasterio (2007) compared statistical models with CERES-Wheat simulations to show that correlations of solar radiation and nighttime temperature can confound interpretation of statistical models. In the end, only controlled experiments can be used to uniquely identify the effect of a single variable when all others are held constant.

A related point illustrated by Fig. 5.3 is that omission of important variables can bias results. Maize yield correlates much more strongly with GDD30 than with average growing season temperature in this region. Yet measures of exposure to extreme heat such as GDD30 have not been widely used, with most studies focused a priori on weekly or monthly averages. The choice of which variables to consider is often dictated by data availability: there are few regions in the world where reliable sub-daily data on temperatures extend back prior to 1980. There are similarly few good datasets on solar radiation, which, as discussed above, can be an important omitted variable because it is often correlated with temperature and rainfall.
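The bias from an omitted variable can be illustrated with a small simulation. The sketch below is loosely motivated by the temperature and solar radiation example discussed earlier, but every number in it is hypothetical: radiation and nighttime temperature are given a correlation of about -0.7, and yield is constructed with a true temperature coefficient of -0.5 and a true radiation coefficient of +0.5.

```python
import random


def mean(values):
    return sum(values) / len(values)


def slope_one(y, x):
    """Slope from a simple regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def slopes_two(y, x1, x2):
    """Slopes of y on x1 and x2 (with intercept) via the 2x2 normal equations."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Synthetic, standardized data with hypothetical coefficients.
rng = random.Random(2)
n = 2000
radiation = [rng.gauss(0, 1) for _ in range(n)]
night_temp = [-0.7 * r + 0.51 ** 0.5 * rng.gauss(0, 1) for r in radiation]
yield_anom = [-0.5 * t + 0.5 * r + 0.3 * rng.gauss(0, 1)
              for t, r in zip(night_temp, radiation)]

temp_only = slope_one(yield_anom, night_temp)
temp_with_rad, rad_slope = slopes_two(yield_anom, night_temp, radiation)
print("temperature slope, radiation omitted: ", round(temp_only, 2))
print("temperature slope, radiation included:", round(temp_with_rad, 2))
```

With radiation omitted, the estimated temperature slope should come out near -0.85 rather than the true -0.5, because temperature absorbs part of the radiation effect through their negative correlation. Including radiation recovers a value close to -0.5.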

Only by comparing results with and without the inclusion of variables such as GDD30 or solar radiation can we estimate the bias that their omission introduces in specific locations. Moreover, only by repeating these studies for a large number of locations can we make more general statements about the importance of these factors for future impacts, although strong claims for the importance of extreme events are frequently heard (Easterling et al. 2007).² It should also be clear that the importance of different variables may depend on the time scale for which projections are being made. For example, GDD30 may initially increase slowly as temperatures rise but more rapidly as average temperatures approach 30°C.
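The nonlinear response of GDD30 to warming can be seen in a simple calculation. The sketch below assumes an idealized day whose hourly temperatures follow a sine curve with an amplitude of 8°C around the daily mean; both the sinusoidal shape and the amplitude are illustrative assumptions, not properties of the US maize data.

```python
import math


def daily_degree_days_above(mean_c, amplitude_c=8.0, threshold_c=30.0):
    """Degree days above a threshold for one day with a sinusoidal cycle."""
    hourly = [mean_c + amplitude_c * math.sin(2 * math.pi * h / 24)
              for h in range(24)]
    return sum(max(t - threshold_c, 0.0) for t in hourly) / 24.0


means = (20, 22, 24, 26, 28)
values = [daily_degree_days_above(m) for m in means]
for m, v in zip(means, values):
    print(m, round(v, 3))
```

Under these assumptions, exposure above 30°C is zero while the daily maximum stays below the threshold, and each additional 2°C of mean warming then adds more degree days than the previous 2°C did.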

To summarize, time series methods are hampered by frequently high correlations between climate variables. In cases where two correlated variables are both included in the model, attribution of yield changes to any single variable is difficult if not impossible. In cases where an important variable is omitted, there is risk of attributing too much importance to a correlated variable included in the model. Even when the omitted variable is not correlated with included variables, there is a risk that its omission will miss an important effect of climate on yields.

² The recent IPCC Fourth Assessment Report states that "Projected changes in the frequency and severity of extreme climate events will have more serious consequences for food and forestry production, and food insecurity, than will changes in projected means of temperature and precipitation (high confidence)."

One may wonder at this point why we do not typically just include all possible climate variables in a regression analysis. As already stated, one common reason for omitting variables is lack of reliable data. More fundamental is the fact that increasing model complexity by adding more and more variables will eventually result in a model that is over-fit to the data, including the noise present in the data, and has worse predictive skill than a model with fewer variables. The balance between including enough but not too many variables is known in statistics as the bias-variance tradeoff, and places a premium on choosing variables wisely. As mentioned, knowledge of the biological processes that control crop growth and reproduction can be of tremendous value in the search for the "right" variables.
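The cost of adding too many variables can be demonstrated with synthetic data. In the sketch below, the yield anomaly depends on a single predictor, and 20 further predictors are pure noise. The sample sizes, coefficients, and helper functions are all assumptions chosen for illustration, and ordinary least squares is solved through the normal equations only to keep the example dependency-free.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) from the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def mse(coefs, rows, y):
    """Mean squared prediction error of a fitted model."""
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    return sum((p - yi) ** 2 for p, yi in zip(preds, y)) / len(y)


def simulate(n, n_noise, rng):
    """Yield depends on the first predictor only; the rest are pure noise."""
    rows = [[rng.gauss(0, 1) for _ in range(1 + n_noise)] for _ in range(n)]
    y = [-1.0 * r[0] + rng.gauss(0, 1) for r in rows]
    return rows, y


rng = random.Random(0)
train_rows, train_y = simulate(40, 20, rng)    # short record, e.g. 40 years
test_rows, test_y = simulate(1000, 20, rng)    # independent data for evaluation

small = fit_ols([r[:1] for r in train_rows], train_y)  # one variable
large = fit_ols(train_rows, train_y)                   # all 21 variables

train_small = mse(small, [r[:1] for r in train_rows], train_y)
train_large = mse(large, train_rows, train_y)
test_small = mse(small, [r[:1] for r in test_rows], test_y)
test_large = mse(large, test_rows, test_y)
print("train MSE (1 var, 21 vars):", round(train_small, 2), round(train_large, 2))
print("test MSE  (1 var, 21 vars):", round(test_small, 2), round(test_large, 2))
```

The larger model always fits the training data at least as well, but in a setup like this it should predict the independent data worse, because it has fitted noise in the short training record.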
