## Trend Removal

Figure 5.1a shows average yields over the study period. The most obvious feature of this time series, and of yield time series for most crops in most regions, is the highly significant positive trend with time. This trend results largely from improvements in technology, such as the adoption of modern hybrid cultivars and increased use of fertilizer. Because so much of the yield variation between different parts of the record arises from technology differences, the effect of climate is difficult to discern from the raw yield data. For that reason, one nearly always detrends the data to remove the influence of technology. There are several ways to do this, none of them clearly optimal. The first is to approximate the technology trend with a polynomial fit and take the yield anomalies from this trend. For most crops the technology trend can be approximated with a first-order polynomial (a linear trend).
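The polynomial detrending described above can be sketched in a few lines. This is a minimal illustration using NumPy, not the procedure used to produce Fig. 5.1: the function name `detrend_polynomial` is hypothetical, and the yield series is synthetic.

```python
import numpy as np

def detrend_polynomial(years, yields, order=1):
    """Return (trend, anomalies) from a least-squares polynomial fit."""
    years = np.asarray(years, dtype=float)
    yields = np.asarray(yields, dtype=float)
    # Centre the years so the fit stays well conditioned for higher orders.
    x = years - years.mean()
    coeffs = np.polyfit(x, yields, deg=order)
    trend = np.polyval(coeffs, x)
    return trend, yields - trend

# Synthetic, purely illustrative data (not the maize series in Fig. 5.1):
# a linear technology trend of 0.1 ton/ha per year plus random variation.
rng = np.random.default_rng(0)
years = np.arange(1950, 2001)
yields = 2.0 + 0.1 * (years - 1950) + rng.normal(0.0, 0.3, years.size)

trend, anomalies = detrend_polynomial(years, yields, order=1)
print(f"fitted trend: {(trend[-1] - trend[0]) / (years[-1] - years[0]):.3f} ton/ha per year")
```

Setting `order=2` or higher gives the higher-order polynomial variant; the anomalies are what would then be compared with climate variables.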

Figure 5.1b illustrates the yield anomalies from a linear trend for the maize time series. The anomalies are much larger in absolute value in the latter part of the record, a common occurrence in yield time series. This change in variance from the beginning to the end of the record, known as heteroskedasticity, violates some of the basic assumptions of many statistical techniques, such as linear regression. To correct for this, yields are often expressed on a log basis, which means that anomalies represent relative (approximately percent) differences from the trend line rather than absolute differences, since log(a) - log(b) = log(a/b). As shown in Fig. 5.1c, use of log yields rather than absolute yields removes most of the problem with heteroskedasticity.¹

¹ However, note that if the yield anomalies in Fig. 5.1b showed no sign of heteroskedasticity, then introducing the log transformation could itself lead to heteroskedasticity by suppressing values at the beginning of the record.
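The effect of the log transformation on the spread of the anomalies can be illustrated with a small sketch. The series below is synthetic and deliberately artificial (deviations fixed at ±10% of the trend), and the helper `spread_ratio` is a hypothetical name introduced only for this example.

```python
import numpy as np

# Synthetic series (an assumption for illustration): a linear technology trend
# with deviations that are always +/-10% of the trend, so that absolute
# anomalies grow over time while relative anomalies do not.
t = np.arange(51)                       # years since start of record
true_trend = 2.0 + 0.1 * t              # ton/ha
relative_dev = 0.1 * (-1.0) ** t        # alternating +10% / -10%
yields = true_trend * (1.0 + relative_dev)

# Fit a linear trend to the raw yields.
slope, intercept = np.polyfit(t, yields, deg=1)
trend = slope * t + intercept

abs_anom = yields - trend                    # absolute anomalies, ton/ha
log_anom = np.log(yields) - np.log(trend)    # equals log(yield / trend)

def spread_ratio(anom):
    """Standard deviation in the second half of the record over the first half."""
    half = anom.size // 2
    return anom[half:].std() / anom[:half].std()

print(f"absolute anomalies, late/early s.d.: {spread_ratio(abs_anom):.2f}")
print(f"log anomalies, late/early s.d.:      {spread_ratio(log_anom):.2f}")

# A log anomaly x corresponds to a percent deviation of 100 * (exp(x) - 1).
pct_dev = 100.0 * np.expm1(log_anom)
```

In this constructed case the absolute anomalies are clearly more variable late in the record, while the log anomalies have a similar spread throughout. Real yield series will not behave this cleanly.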

Fig. 5.1 (a) Time series of maize yields in the US for counties east of 100° W, shown with three common methods of detrending (b-d). [Figure not reproduced. Recovered axis and panel labels: Yield (ton/ha); Yield - Linear Trend; log(Yield), First Difference; time axis 1950 to 1990.]

It is frequently the case that yield trends are obviously not linear, as demonstrated for two cases in Fig. 5.2. In this situation, fitting a linear trend may cause serious errors, and one can resort instead to higher order polynomials. A more flexible approach, and one that is commonly used in time series analysis, is to transform the data to first-differences as shown in Fig. 5.1d, where from each value one subtracts the value in the previous year. In this case, the subsequent analysis focuses only on year-to-year changes so that effects of long-term trends are minimized. Any predictor variables must then also be transformed to first-differences in order to compare with yields.
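A first-difference analysis of this kind can be sketched as follows. The data are synthetic and all parameter values (a temperature effect of -0.3 ton/ha per °C, a curved technology trend) are assumptions chosen for illustration, not values from the source.

```python
import numpy as np

rng = np.random.default_rng(1)
years = np.arange(1950, 2001)
t = years - years[0]

# Synthetic data (assumed for illustration only): a nonlinear technology
# trend, a true temperature effect of -0.3 ton/ha per degC, and small noise.
temperature = rng.normal(20.0, 1.0, years.size)              # degC
technology = 2.0 + 0.05 * t + 0.001 * t**2                   # ton/ha
noise = rng.normal(0.0, 0.05, years.size)
yields = technology - 0.3 * (temperature - 20.0) + noise

# First differences: each year's value minus the previous year's value.
# Both the response and the predictor are differenced.
d_yield = np.diff(yields)
d_temp = np.diff(temperature)

# Simple linear regression of yield changes on temperature changes.
slope, intercept = np.polyfit(d_temp, d_yield, deg=1)
print(f"estimated temperature effect: {slope:.2f} ton/ha per degC")
```

Differencing shortens the series by one observation, and the fitted intercept here absorbs the average year-to-year technology gain rather than any climate signal.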

A final approach to account for technology is not to remove a trend, but rather to include a term for year (and possibly year-squared) in subsequent regression analysis. One could also include explicit technology proxies, such as fertilizer rate or percent of growers using modern cultivars.
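Including year and year-squared as regression terms, rather than detrending first, might look like the sketch below. The data and coefficients are synthetic assumptions, and ordinary least squares via `np.linalg.lstsq` is just one convenient way to fit such a model.

```python
import numpy as np

rng = np.random.default_rng(2)
years = np.arange(1950, 2001)
year_c = years - years.mean()   # centred year, reduces collinearity with year^2

# Synthetic data (assumed for illustration only).
temperature = rng.normal(20.0, 1.0, years.size)              # degC
yields = (4.5 + 0.1 * year_c + 0.001 * year_c**2
          - 0.3 * (temperature - 20.0)
          + rng.normal(0.0, 0.05, years.size))

# Design matrix: intercept, temperature, year, year squared.
# Explicit technology proxies (e.g. fertilizer rate) would be added
# as further columns if such data were available.
X = np.column_stack([np.ones(years.size), temperature, year_c, year_c**2])
coef, *_ = np.linalg.lstsq(X, yields, rcond=None)
b_intercept, b_temp, b_year, b_year2 = coef

print(f"temperature coefficient: {b_temp:.2f} ton/ha per degC")
print(f"year coefficient:        {b_year:.3f} ton/ha per year")
```

Because the technology terms stay in the model, its R² reflects both technology and climate, which is why such models report much higher R² than regressions on anomalies.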

Table 5.1

| Response variable | Predictor variable(s) | Model R² | Yield sensitivity (mean ± 1 s.d., % °C⁻¹) |
|---|---|---|---|
| Yield | Avg. temperature and year | 0.92 | -3.8 ± 2.0 |
| Log(Yield) | Avg. temperature and year | 0.90 | -4.5 ± 2.5 |
| Yield - Trend | Avg. temperature | 0.06 | -3.7 ± 1.9 |
| Log(Yield) - Log(Trend) | Avg. temperature | 0.10 | -4.4 ± 1.9 |
| Yield, first difference | Avg. temperature, first difference | 0.16 | -7.6 ± 2.4 |
| Log(Yield), first difference | Avg. temperature, first difference | 0.16 | -6.8 ± 2.3 |

In summary, for any yield time series of considerable length, accounting for technology trends is essential, and many approaches exist to this end. How important is this decision in the final analysis? Table 5.1 summarizes the results of simple linear regressions with average growing season (April-September) temperature as the predictor variable and various representations of yield as the response variable. The model R² indicates that regressions using first differences tend to have higher explanatory power than those based on anomalies. Models that use raw yields and include a time trend have, of course, much higher R² because the effect of technology has not been removed beforehand but is included in the model.

The key aspect of these models is the predicted response to temperature, which is expressed as the % change in yield for a 1°C increase. The results can vary by a factor of 2, with the smallest effects found when using raw yields, either with a time term or as anomalies from a linear trend, and the biggest effect when using first differences of raw yields. Note that using log rather than absolute yields can either increase or reduce the model R² and the inferred yield sensitivity, while using first differences tends to increase both in this example.
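The text does not state how regression slopes were converted to a % change per °C. One plausible convention, shown here purely as an assumption, is to divide an absolute-yield slope by the mean yield, and to exponentiate a log-yield slope. Both function names and the example numbers are hypothetical and are not taken from Table 5.1.

```python
import math

def pct_per_degC_from_absolute(slope, mean_yield):
    """Slope in yield units per degC, expressed as percent of mean yield per degC.

    Assumed convention: normalise by the mean yield of the record.
    """
    return 100.0 * slope / mean_yield

def pct_per_degC_from_log(slope):
    """Slope of log(yield) per degC, expressed as percent change in yield per degC."""
    return 100.0 * math.expm1(slope)

# Hypothetical slopes, for illustration only.
print(f"{pct_per_degC_from_absolute(-0.3, 6.0):.1f} % per degC")
print(f"{pct_per_degC_from_log(-0.05):.1f} % per degC")
```

For small slopes the log-based conversion is close to 100 times the slope itself, which is why log-yield coefficients are often read directly as percent effects.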
