## 3A.3.2 OVERVIEW ON SAMPLING PRINCIPLES

Sampling infers information about an entire population by observing a fraction of it: the sample (see Figure 3A.3.1). For example, changes of carbon in tree biomass at regional or national levels can be estimated from the growth, mortality and cuttings of trees on a limited number of sample plots. Sampling theory then provides the means for scaling up the information from the sample plots to the selected geographical level. Properly designed sampling can greatly increase efficiency in the use of inventory resources. Furthermore, field sampling is generally needed in developing inventories because, even if remote sensing data provide complete territorial coverage, there will be a need for ground-based data from sample sites for interpretation and verification.
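As a minimal sketch of this scaling-up step, the following assumes a simple random sample of plots, per-hectare plot values, and a known total area. The function name `scale_up`, the plot values and the area are hypothetical and serve only as illustration; the finite-population correction is ignored.

```python
import math
import statistics


def scale_up(plot_values_per_ha, total_area_ha):
    """Estimate a population total and its standard error from plot data.

    Assumes a simple random sample of plots and values expressed per hectare.
    """
    n = len(plot_values_per_ha)
    mean = statistics.fmean(plot_values_per_ha)
    # Standard error of the sample mean (finite-population correction ignored)
    se_mean = statistics.stdev(plot_values_per_ha) / math.sqrt(n)
    # Scaling up: multiply the per-hectare estimate by the total area
    return total_area_ha * mean, total_area_ha * se_mean


# Hypothetical annual carbon stock changes on five plots (t C per ha per year)
changes = [1.2, 0.8, -0.3, 1.5, 0.9]
total, se = scale_up(changes, total_area_ha=10_000)
print(f"Estimated total change: {total:.0f} t C/yr (standard error {se:.0f})")
```

With so few plots the standard error is large relative to the estimate, which is why sample size and design matter for the efficiency of an inventory.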

Figure 3A.3.1 Principle of sampling

Standard sampling theory relies on random selection of a sample from the population; each unit in the population has a specific probability of being included in the sample. This is the case when sample plots have been distributed entirely at random within an area, or when plots have been distributed in a systematic grid system as long as the positioning of the grid is random. Random sampling reduces the risk of bias and allows for an objective assessment of the uncertainty of the estimates. Therefore, randomly sampled data should generally be used where available, and random selection should be applied when setting up new surveys.

Samples may also be taken at subjectively chosen locations that are assumed to be representative of the population. This is called subjective (or purposive) sampling, and data from such surveys are often used in greenhouse gas inventories (i.e., when observations from survey sites that were not selected randomly are used to represent an entire land category or stratum). Under these conditions, observations about, for example, forest type might be extrapolated to areas for which they are not representative. However, due to limited resources, greenhouse gas inventories may also need to make use of data from subjectively selected sites or research plots. In this case, it is good practice to identify, in consultation with the agencies responsible for the sites or plots, the land areas for which the subjective samples can be regarded as representative.

### 3A.3.3 SAMPLING DESIGN

Sampling design determines how the sampling units (the sites or plots) are selected from the population and thus what statistical estimation procedures should be applied to make inferences from the sample. Random sampling designs can be divided into two main groups, depending on whether or not the population is stratified (i.e., subdivided before sampling) using auxiliary information. Stratified surveys will generally be more efficient in terms of the accuracy that can be achieved at a given cost. On the other hand, they tend to be slightly more complex, which increases the risk of non-sampling errors due to incorrect use of the collected data. Sampling designs should aim for a good compromise between simplicity and efficiency, and this can be promoted by attention to the three aspects set out below:

• Use of auxiliary data and stratification;

• Systematic sampling;

• Permanent sample plots and time-series data.

### Use of auxiliary data and stratification

One of the most important sampling designs that incorporate auxiliary information is stratification, whereby the population is divided into subpopulations on the basis of auxiliary data. These data may consist of knowledge of legal or administrative boundaries, or boundaries of forest administrations, which it will be efficient to sample separately, or of maps or remote sensing data distinguishing between upland and lowland areas or between different ecosystem types. Since stratification is intended to increase efficiency, it is good practice to use auxiliary data when such data are available or can be made available at low additional cost.

Stratification increases efficiency in two main ways: (i) by improving the accuracy of the estimate for the entire population; and (ii) by ensuring that adequate results are obtained for certain subpopulations, e.g., for certain administrative regions.

On the first issue, stratification increases sampling efficiency if the population is sub-divided so that the variability between units within a stratum is lower than the variability within the entire population. For example, a country may be divided into a lowland region (with certain features of the land-use categories of interest) and an upland region (with different features of the corresponding categories). If each stratum is homogeneous, a precise overall estimate can be obtained using only a limited sample from each stratum. The second issue is important for providing results at a specific degree of accuracy for all administrative regions of interest, and also when sampled data are to be used together with other existing datasets that have been collected using different protocols with the same administrative or legal boundaries.
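The gain from homogeneous strata can be illustrated with a standard stratified estimator: the overall mean is the area-weighted sum of the stratum means, and its variance is the weighted sum of the within-stratum variances of those means. The sketch below assumes simple random sampling within each stratum and known stratum areas; the function name `stratified_mean` and all numbers are hypothetical.

```python
import math
import statistics


def stratified_mean(strata):
    """Return (mean, standard error) for a list of (area_ha, plot_values) strata.

    Assumes simple random sampling within each stratum and known stratum areas;
    the finite-population correction is ignored.
    """
    total_area = sum(area for area, _ in strata)
    mean = 0.0
    variance = 0.0
    for area, values in strata:
        weight = area / total_area  # share of the total area in this stratum
        mean += weight * statistics.fmean(values)
        variance += weight**2 * statistics.variance(values) / len(values)
    return mean, math.sqrt(variance)


# Hypothetical carbon stocks (t C per ha): strata differ, but each is homogeneous
lowland = (6000, [50.0, 52.0, 48.0, 50.0])
upland = (4000, [20.0, 22.0, 18.0, 20.0])
mean, se = stratified_mean([lowland, upland])

# For comparison only: standard error if the same plots were treated as one
# unstratified simple random sample (stratum areas ignored)
pooled = lowland[1] + upland[1]
se_unstratified = statistics.stdev(pooled) / math.sqrt(len(pooled))
print(f"stratified: {mean:.1f} +/- {se:.2f}; unstratified SE: {se_unstratified:.2f}")
```

In this constructed example the between-stratum difference dominates the total variability, so removing it through stratification reduces the standard error considerably.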

Use of remote sensing or map data for identifying the boundaries of the strata (the land-use class sub-divisions to be included in a sample survey) can introduce errors where some areas may be incorrectly classified as belonging to the stratum whilst other areas that do belong to the specific class are missed. Errors of this kind can lead to substantial bias in the final estimates, since the area identified for sampling will then not correspond to the target population. Whenever there is an obvious risk that errors of this kind may occur, it is good practice to make an assessment of the potential impact of such errors using ground truth data.

When data for the reporting of greenhouse gas emissions or removals are taken from existing large-scale inventories, such as national forest inventories, it is convenient to apply the standard estimation procedures of that inventory, as long as they are based on sound statistical principles. In addition, post-stratification (i.e., defining strata based on remote sensing or map auxiliary data after the field survey has been conducted) means that it may be possible to use new auxiliary data to increase efficiency without changing the basic field design (Dees et al., 1998). Using this estimation principle, the risk of bias pointed out in the previous paragraph can also be reduced.
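A minimal sketch of a post-stratified point estimate follows. It assumes that plots were selected without regard to the strata, that each plot is assigned a stratum label afterwards from map or remote sensing data, and that the area share of each stratum is known. The function name `post_stratified_mean`, the labels and the values are hypothetical, and variance estimation is deliberately left out.

```python
from collections import defaultdict
from statistics import fmean


def post_stratified_mean(plots, stratum_weights):
    """Post-stratified mean from (stratum_label, value) pairs.

    stratum_weights maps each label to its known share of the total area.
    """
    by_stratum = defaultdict(list)
    for label, value in plots:
        by_stratum[label].append(value)
    # Every stratum needs at least one plot, otherwise its mean is undefined
    missing = set(stratum_weights) - set(by_stratum)
    if missing:
        raise ValueError(f"no plots in strata: {sorted(missing)}")
    return sum(w * fmean(by_stratum[label]) for label, w in stratum_weights.items())


# Hypothetical plots: forest is over-represented in the sample (3 of 5 plots)
plots = [("forest", 80.0), ("forest", 90.0), ("forest", 85.0),
         ("other", 10.0), ("other", 20.0)]
weights = {"forest": 0.4, "other": 0.6}  # area shares taken from a map
estimate = post_stratified_mean(plots, weights)
unweighted = fmean(value for _, value in plots)
print(f"post-stratified: {estimate:.1f}, unweighted plot mean: {unweighted:.1f}")
```

Because the weights come from the map rather than from the plot counts, the estimate corrects for a sample that happens to over- or under-represent a stratum.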

### Systematic sampling

Sample-based forest or land-use surveys generally make use of sample points or plots on which the characteristics of interest are recorded. One important issue is the layout of these points or plots. It is often appropriate to allocate the plots in small clusters in order to minimise travel costs when covering large areas with a sample-based survey. With cluster sampling, the distance between plots should be large enough to avoid major between-plot correlation, taking (for forest sampling) stand size into account. Another important issue is whether plots (or clusters of plots) should be laid out entirely at random or systematically, using a regular grid that is randomly located over the area of interest (see Figure 3A.3.2). In general, it is efficient to use systematic sampling, since in most cases this will increase the precision of the estimates. Systematic sampling also simplifies the fieldwork.

Figure 3A.3.2 Simple random layout of plots (left) and systematic layout (right)

Somewhat simplified, the reason why systematic random sampling generally is superior to simple random sampling is that sample plots will be distributed evenly to all parts of the target area.³ With simple random sampling, some parts of an area may have many plots while other parts will not have any plots at all.
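The difference between the two layouts can be sketched for a rectangular target area. The example assumes a square grid whose origin is shifted by a random offset, which is one simple way of positioning a grid at random. The function names, the area dimensions and the grid spacing are hypothetical.

```python
import random


def systematic_grid(width, height, spacing, rng):
    """Plot centres on a square grid whose origin is shifted at random."""
    x0 = rng.uniform(0, spacing)  # random offset of the grid in x
    y0 = rng.uniform(0, spacing)  # random offset of the grid in y
    points = []
    y = y0
    while y < height:
        x = x0
        while x < width:
            points.append((x, y))
            x += spacing
        y += spacing
    return points


def simple_random(width, height, n, rng):
    """n plot centres placed independently and uniformly at random."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


def occupied_cells(points, cell):
    """Set of cell-by-cell squares that contain at least one plot."""
    return {(int(x // cell), int(y // cell)) for x, y in points}


rng = random.Random(42)  # fixed seed so the sketch is reproducible
grid = systematic_grid(10_000, 10_000, 2_000, rng)
srs = simple_random(10_000, 10_000, len(grid), rng)
print(len(occupied_cells(grid, 2_000)), "cells covered by the systematic layout")
print(len(occupied_cells(srs, 2_000)), "cells covered by the simple random layout")
```

The systematic layout places exactly one plot in every 2 km by 2 km cell, whereas the simple random layout typically leaves some cells empty and puts several plots in others.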

### Permanent sample plots and time-series data

Greenhouse gas inventories must assess both current state and changes over time (e.g., in areas of land-use categories and carbon stocks). Assessment of changes is most important and it involves repeated sampling over time. The time interval between measurements should be determined based on the frequency of the events that cause changes, and also on the reporting requirements. Generally, sampling intervals of 5-10 years are adequate, and in many countries data from well designed surveys are already available for many decades, especially in the forest sector. Nevertheless, since estimates for the reporting are required on an annual basis, interpolation and extrapolation methods will need to be applied. Where sufficiently long time-series are not available, it may be necessary to extrapolate backwards in time to capture the dynamics of carbon stock changes.
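As one simple illustration of deriving annual values from periodic surveys, the sketch below interpolates linearly between survey years and extends the trend of the nearest survey interval outside them. Linear interpolation is an assumption made here for illustration, not a method prescribed by this section; the function name `annual_series` and the survey values are hypothetical.

```python
from bisect import bisect_right


def annual_series(surveys, first_year, last_year):
    """Annual estimates from a dict {survey_year: value}.

    Linear interpolation between surveys; outside the surveyed period the
    trend of the nearest survey interval is extended (linear extrapolation).
    """
    years = sorted(surveys)
    if len(years) < 2:
        raise ValueError("at least two surveys are needed")
    series = {}
    for year in range(first_year, last_year + 1):
        # Index of the survey interval to use, clamped to the available ones
        i = min(max(bisect_right(years, year), 1), len(years) - 1)
        y0, y1 = years[i - 1], years[i]
        slope = (surveys[y1] - surveys[y0]) / (y1 - y0)
        series[year] = surveys[y0] + slope * (year - y0)
    return series


# Hypothetical carbon stock estimates (Mt C) from two surveys ten years apart
surveys = {2000: 100.0, 2010: 120.0}
series = annual_series(surveys, 1998, 2012)
print(series[1998], series[2005], series[2012])
```

Extrapolated values depend entirely on the assumption that the trend of the adjacent interval continues, so they carry more uncertainty than interpolated ones.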

When undertaking repeated sampling, the required data regarding the current state of areas or carbon stocks are assessed on each occasion. Changes are then estimated by subtracting the state at time t from the state at time (t + 1). Three common sampling designs can be used for change estimation:

• The same sampling units are used on both occasions (permanent sampling units);

• Different, independent sets of sampling units are used on the two occasions (temporary sampling units);

• Some sampling units can be replaced between occasions while others remain the same (sampling with partial replacement).

Figure 3A.3.3 shows these three approaches.

Figure 3A.3.3 Use of different configurations of permanent and temporary sampling units for estimating changes

Panels: identical set (permanent plots); independent sets (temporary plots); sampling with partial replacement (permanent and temporary plots). Legend: sampling unit measured at occasion 1; sampling unit measured at occasion 2.

³ In unusual cases when there is a regular pattern in the terrain that may coincide with the systematic grid system, systematic sampling may lead to less precise estimates than simple random sampling. However, such potential problems generally can be handled by orienting the grid system in another direction.

Permanent sample plots generally are more efficient in estimating changes than temporary plots because it is easier to distinguish actual trends from differences that are only due to changed plot selection. However, there are also some risks in the use of permanent sample plots. If the locations of permanent sample plots become known to land managers (e.g., because the plots are visibly marked), there is a risk that management of the permanent plots will differ from the management of other areas. If this occurs, the plots will no longer be representative and there is an obvious risk that the results will be biased. If such a risk is perceived, it is good practice to assess some temporary plots as a control sample in order to determine whether the conditions on these plots deviate from the conditions on the permanent plots.
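The efficiency argument can be sketched numerically. With permanent plots, the change is estimated from plot-wise differences, so stable differences between plots cancel out. With independent temporary plots, the variances of both occasions add up. The function names and stock values below are hypothetical, and the same numbers are reused for the temporary case purely to isolate the effect of pairing.

```python
import math
import statistics


def se_change_permanent(t1, t2):
    """Standard error of mean change when the same plots are remeasured."""
    if len(t1) != len(t2):
        raise ValueError("permanent plots need one value per plot and occasion")
    diffs = [after - before for before, after in zip(t1, t2)]
    return statistics.stdev(diffs) / math.sqrt(len(diffs))


def se_change_temporary(t1, t2):
    """Standard error of mean change for two independent sets of plots."""
    return math.sqrt(statistics.variance(t1) / len(t1)
                     + statistics.variance(t2) / len(t2))


# Hypothetical carbon stocks (t C per ha) on five plots at two occasions
stock_t1 = [40.0, 55.0, 70.0, 85.0, 100.0]
stock_t2 = [43.0, 57.0, 74.0, 88.0, 103.0]

se_perm = se_change_permanent(stock_t1, stock_t2)
# Illustration only: the same values treated as if the plots were independent
se_temp = se_change_temporary(stock_t1, stock_t2)
print(f"permanent plots: {se_perm:.2f}, temporary plots: {se_temp:.2f}")
```

The gain depends on how strongly plot values are correlated between occasions; where the correlation is weak, the advantage of permanent plots is correspondingly smaller.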

Sampling with partial replacement can address some of the potential problems of relying on permanent plots, because it is possible to replace sites that are believed to have been treated differently. Sampling with partial replacement may be used, although the estimation procedures are complicated (Scott and Köhl, 1994; Köhl et al., 1995).

When only temporary plots are used, overall changes can still be estimated, but it will no longer be possible to study land-use conversions between different categories unless a time dimension can be introduced into the sample. This can be done by drawing on auxiliary data, for example maps, remote sensing or administrative records about the state of land in the past. Doing so introduces additional uncertainty into the assessment, which may be difficult to quantify other than by expert judgement.