The synthetic relational Gompertz model is an extension of the relational Gompertz method [1] for estimating age-specific and total fertility. It uses two sets of parity data, collected at different points in time, together with estimates of current fertility for the intervening period based on reports of recent births classified by age of mother.
The method explicitly allows changes in fertility to be taken into account and is designed to be applied to censuses or surveys conducted either 5 or 10 years apart. In such circumstances, the survivors of a cohort of women at the first inquiry can be identified at the second, and the change in the average parity of the cohort can be calculated. The resulting sequence of parity increments for different cohorts during the period between the inquiries can then be cumulated to calculate average parities for a hypothetical cohort experiencing the fertility implied by the observed parity increments.
The period fertility rates that are compared with these synthetic cohort estimates should ideally refer to the entire period between the two inquiries that asked about lifetime fertility. One way to ensure this is to use data on registered births classified by age of mother for each calendar year of the period. If such data are available, the total births recorded during the period in each age group can be obtained by summing over calendar years. Average fertility rates for the period between the two inquiries can then be obtained by dividing these births by the number of woman-years lived in each age group, estimated from the female population enumerated at the beginning and end of the period.
Where such data are not readily available, or are not reliable, a simpler, and generally adequate, procedure is to calculate age-specific fertility rates for the first and last years of the period, and to estimate the rates for the entire period as the arithmetic mean of these two sets. If data on registered births are not available, but the two surveys or censuses gathered data on births in the past year, age-specific fertility rates for the period may be approximated in the same way, by averaging the rates observed at the beginning and end of the period. If the births during the 12 months preceding each survey are tabulated by age of mother at the time of the survey, the observed fertility rates will correspond to age groups displaced by six months, and the analysis will need to take this into account.
Once corresponding parities and fertility rates have been calculated for the period between the two inquiries, the cumulation and interpolation of the latter, and their comparison with the average parities, are carried out exactly as described in the presentation of the conventional relational Gompertz model [1].
The data required are:
Most of the assumptions are those associated with the relational Gompertz model, namely:
The calculation of the synthetic cohort mean parities assumes that mortality and migration have no effect on actual parity distributions. In other words, it is assumed that the average parity of those women who die or migrate between the surveys is not significantly different from the average parity at comparable ages of those women who are alive and present at the end of the period.
Before commencing analysis of fertility levels using this method, analysts should investigate the quality of the data in at least the following dimensions:
It is crucially important that the sets of fertility rates being averaged use a consistent age classification. If they do not, because one set refers to age groups displaced by six months and the other does not, the former set should be adjusted (for example, by applying the F-only variant of the relational Gompertz model [1]) before proceeding. In general, estimates of age-specific fertility rates from different sources (e.g. vital registration and census) should not be combined, because the schedules from each source may be distorted in different ways.
If age-specific fertility rates for the end-points of the period are not available, a set of rates referring approximately to the mid-point of the period could be used. It should be remembered that only the pattern of the inter-survey age-specific fertility rates is important in applying the relational Gompertz method, so that if this pattern was more or less constant over the period, the exact reference date of the rates used does not matter.
If data on registered births are used, changes over time in the completeness of registration by age group could distort the pattern of fertility. Where such changes are suspected, the method should be applied with caution.
The method is applied in the following steps.
Calculate the average parities, $P_x(t_1)$ and $P_x(t_2)$, of women in each age group [x, x + 5) for the two inquiries (at times $t_1$ and $t_2$), for x = 15, 20, …, 45. For ease of exposition, we denote the average parity in age group i at time t by $P(i,t)$, where i = x/5 − 2, so that i runs from 1 (ages 15-19) to 7 (ages 45-49). Thus, the average parities obtained from the first census or survey are denoted by P(i,1), and those from the second survey by P(i,2).
The way in which the parities are calculated depends upon the length of the inter-survey interval.
If the interval between the two data series is five years, all the survivors of age group i at the first inquiry are in age group i + 1 at the second inquiry, and the parity increment between the inquiries for the corresponding cohort is equal to P(i+1,2) − P(i,1). Such increments can be calculated for each age group, and the hypothetical-cohort parities are then obtained by successively cumulating them. Thus, if the parity increment for the cohort of age group i at the first inquiry is denoted by $\Delta P(i)$, and the parity of age group i for the hypothetical cohort is denoted by $P(i,s)$ (where the s stands for 'synthetic'), one has

$$\Delta P(i) = P(i+1,2) - P(i,1), \qquad i = 1, \ldots, 6,$$

and hence

$$P(i,s) = \sum_{j=0}^{i-1} \Delta P(j), \qquad i = 1, \ldots, 7.$$

The parity increment $\Delta P(0)$ for the youngest age group (i = 0) is taken as being equal to P(1,2), i.e. assuming that P(0,1), the average parity of women aged 10-14 at the first inquiry, is zero. Because P(1,2) is observed entirely at the second inquiry, if fertility is changing rapidly this value of $\Delta P(0)$ will reflect period rates somewhat closer to the second survey than to the mid-point of the interval, slightly over-allowing for the change in fertility.
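The five-year cumulation can be sketched in Python as follows. This is a minimal illustration, not the implementation in the accompanying workbook; the function name `synthetic_parities_5yr` and the convention of holding P(i,1) and P(i,2) in seven-element lists ordered from 15-19 to 45-49 are assumptions made here.

```python
from itertools import accumulate


def synthetic_parities_5yr(p1: list[float], p2: list[float]) -> list[float]:
    """Hypothetical-cohort parities P(i,s) for a five-year inter-survey interval.

    p1[k] and p2[k] hold P(i,1) and P(i,2) for age group i = k + 1
    (15-19, 20-24, ..., 45-49). Hypothetical helper, for illustration only.
    """
    if len(p1) != 7 or len(p2) != 7:
        raise ValueError("expected seven five-year age groups, 15-19 to 45-49")
    # Delta P(0) = P(1,2): women aged 10-14 at the first inquiry are assumed childless.
    increments = [p2[0]]
    # Delta P(i) = P(i+1,2) - P(i,1) for i = 1..6 (list positions shifted by one).
    increments += [p2[i] - p1[i - 1] for i in range(1, 7)]
    # P(i,s) = sum of Delta P(j) for j = 0..i-1, i.e. a running total.
    return list(accumulate(increments))
```

A useful sanity check: if fertility has been constant, so that the two sets of observed parities coincide, the increments telescope and the synthetic parities reproduce the observed ones.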
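If the intercensal or inter-survey interval is 10 years, then the survivors of the initial cohort of age group i in the first survey will be the women in age group (i + 2) in the second. The hypothetical cohort parities are then obtained by cumulating two parallel sequences of parity increments. Once more, for the youngest age groups, $\Delta P(-1)$ is taken as being equal to P(1,2) and $\Delta P(0)$ to P(2,2), i.e. women aged 5-9 and 10-14 at the first inquiry are assumed to be childless. Other parity increments are calculated as

$$\Delta P(i) = P(i+2,2) - P(i,1), \qquad i = 1, \ldots, 5.$$

Hypothetical cohort parities for even-numbered age groups are obtained by summing the parity increments for even-numbered age groups, whereas those for odd-numbered age groups are obtained by summing parity increments for odd-numbered age groups. Thus,

$$P(1,s) = \Delta P(-1), \qquad P(2,s) = \Delta P(0), \qquad P(i,s) = P(i-2,s) + \Delta P(i-2), \quad i = 3, \ldots, 7,$$

so that, for example, $P(5,s) = \Delta P(-1) + \Delta P(1) + \Delta P(3)$ and $P(6,s) = \Delta P(0) + \Delta P(2) + \Delta P(4)$.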
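The ten-year variant, with its two interleaved chains of increments, can be sketched in the same way. Again, `synthetic_parities_10yr` is a hypothetical helper, and the input lists are assumed to hold the seven five-year age groups from 15-19 to 45-49 in order.

```python
def synthetic_parities_10yr(p1: list[float], p2: list[float]) -> list[float]:
    """Hypothetical-cohort parities P(i,s) for a ten-year inter-survey interval.

    p1 and p2 hold P(i,1) and P(i,2) for i = 1..7 (ages 15-19, ..., 45-49).
    """
    if len(p1) != 7 or len(p2) != 7:
        raise ValueError("expected seven five-year age groups, 15-19 to 45-49")

    def P(i: int, t: int) -> float:
        # Document index i (1..7) sits at list position i - 1.
        return (p1 if t == 1 else p2)[i - 1]

    # Women aged 5-9 (i = -1) and 10-14 (i = 0) at the first inquiry are assumed childless.
    delta = {-1: P(1, 2), 0: P(2, 2)}
    # Delta P(i) = P(i+2,2) - P(i,1) for i = 1..5.
    for i in range(1, 6):
        delta[i] = P(i + 2, 2) - P(i, 1)

    # Odd and even age groups accumulate separate chains of increments:
    # P(1,s) = dP(-1), P(2,s) = dP(0), and P(i,s) = P(i-2,s) + dP(i-2) for i = 3..7.
    synthetic = {1: delta[-1], 2: delta[0]}
    for i in range(3, 8):
        synthetic[i] = synthetic[i - 2] + delta[i - 2]
    return [synthetic[i] for i in range(1, 8)]
```

Applied to the 1989 and 1999 Kenyan parities in Table 1, this sketch reproduces the hypothetical cohort parities shown in the last column of that table to four decimal places.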
The method of calculating the schedule of inter-survey age-specific fertility rates, denoted by f(i), where i indexes the age groups as before, depends upon the data available.
One possible procedure is to calculate age-specific fertility rates referring roughly to the first and last years of the period between the two inquiries using data on the reported number of births during the year preceding each inquiry. In such a case, for each inquiry one would divide the reported births for each five-year age group of mother by the reported number of women in the same age group and then obtain age-specific fertility rates for the intervening period by calculating the arithmetic mean of each pair of end-point rates.
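The end-point procedure amounts to two sets of divisions followed by an average. The sketch below assumes that births and women are supplied as lists ordered by five-year age group; the helper names are hypothetical. Total fertility is computed as five times the sum of the five-year rates.

```python
def rates_from_counts(births: list[float], women: list[float]) -> list[float]:
    """Age-specific rates: births in the year before an inquiry / women in the same age group."""
    if len(births) != len(women):
        raise ValueError("births and women must cover the same age groups")
    return [b / w for b, w in zip(births, women)]


def average_schedule(f_start: list[float], f_end: list[float]) -> list[float]:
    """Inter-survey schedule f(i): arithmetic mean of the two end-point rates."""
    if len(f_start) != len(f_end):
        raise ValueError("schedules must cover the same age groups")
    return [(a + b) / 2 for a, b in zip(f_start, f_end)]


def total_fertility(rates: list[float], width: int = 5) -> float:
    """Total fertility: sum of age-specific rates times the age-group width."""
    return width * sum(rates)
```

Fed the 1989 and 1999 rates of Table 2, `average_schedule` and `total_fertility` reproduce, up to rounding, the averaged column and the total fertility of 4.74 shown there.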
Alternatively, if births classified by age of mother are available from a vital registration system for the whole period, a mean age-specific fertility rate for the period could be used for each age group. Calculating this mean involves summing the births reported for each age group of mother and dividing by the person-years lived in that age group, estimated by averaging the size of the age group at the beginning and end of the interval and multiplying by the number of years in the period.
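A sketch of the registration-based period rates follows. It assumes registered births are supplied as one list per calendar year (each ordered by age group of mother) and that the length of the period equals the number of calendar years supplied; `period_rates_from_registration` is a hypothetical name.

```python
def period_rates_from_registration(
    births_by_year: list[list[float]],
    women_start: list[float],
    women_end: list[float],
) -> list[float]:
    """Mean age-specific fertility rates over the inter-survey period."""
    n_years = len(births_by_year)
    if n_years == 0:
        raise ValueError("need at least one calendar year of births")
    if len(women_start) != len(women_end):
        raise ValueError("start and end populations must cover the same age groups")
    rates = []
    for g, (w0, w1) in enumerate(zip(women_start, women_end)):
        # Sum the births to mothers in age group g over all calendar years.
        births = sum(year[g] for year in births_by_year)
        # Person-years: average size of the age group times the length of the period.
        person_years = n_years * (w0 + w1) / 2
        rates.append(births / person_years)
    return rates
```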
Age-specific fertility rates obtained from vital registration are, by definition, classified by age of mother at the time of the delivery of the child.
If the data on fertility are drawn from women's reports of births in the year before each of the surveys used to derive the average parities, the arithmetic mean of the two fertility schedules is still taken as the estimate of fertility in the intervening period. However, because these births are classified by the mother's age at the census rather than at the birth of the child, and occurred on average six months before the census, the resulting rates refer to age groups displaced by six months (women aged 15-19 at the census were, on average, aged 14.5-19.5 when they gave birth). This age shift in the rates must be taken into account in the application of the relational Gompertz model.
The process of fitting a relational Gompertz model to the data is exactly as described in the section on the model [6]. The only points of difference to note are the following:
The relevant steps are reproduced below.
The default fertility standard is that produced by Booth, modified slightly by Zaba (1981). The standard is appropriate to high- and medium-fertility populations and is simply a normalized cumulated fertility schedule (i.e. with total fertility equal to one). The standard values $Y^{s}(x)$ are the gompits of this cumulated schedule at exact ages x, and the standard parity values, $Y^{s}(i)$, are the gompits of the parities associated with the standard fertility schedule. The choice of standard determines the values of g() and e() used in the regression fitting procedure.
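To make the construction of a standard concrete, the sketch below normalizes a fertility schedule to total fertility one, cumulates it to exact ages 20, 25, …, 50 and takes gompits, assuming the definition gompit(y) = -ln(-ln y) used in Brass's relational Gompertz formulation. The input schedule in the check is hypothetical; the Booth standard values themselves are not reproduced here, and `standard_gompits` is an illustrative name.

```python
import math
from itertools import accumulate


def standard_gompits(asfr: list[float]) -> list[float]:
    """Gompits of a normalized cumulated schedule at exact ages 20, 25, ..., 50.

    asfr holds strictly positive rates for 15-19, ..., 45-49.
    """
    # Cumulated fertility F(20), F(25), ..., F(50) from five-year rates.
    cumulated = list(accumulate(5 * f for f in asfr))
    total = cumulated[-1]
    gompits = []
    for c in cumulated:
        y = c / total  # proportion of lifetime fertility achieved by this age
        # gompit(y) = -ln(-ln y); the completed schedule (y = 1) has no finite gompit.
        gompits.append(math.inf if y >= 1.0 else -math.log(-math.log(y)))
    return gompits
```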
The plots of z(x) - e(x) against g(x), and z(i) - e(i) against g(i) on the same set of axes are then used as a diagnostic for identifying common errors and trends in the data, as discussed in the main article on the relational Gompertz model [6].
Initially, all points should be included in the model. The only exception is if the average parities in one age group are higher than the average parities in the next, in which case the gompit will be undefined and the model cannot be fitted using that point. (Such a situation cannot occur in a real cohort, but could arise in a synthetic cohort, either because of data error or during a time of rapidly changing fertility.)
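Before fitting, it can help to screen the synthetic parities for such pairs. The sketch below assumes that each P-point is based on the ratio P(i)/P(i+1) of successive parities, whose gompit exists only when 0 < P(i) < P(i+1); `undefined_parity_points` and `AGE_GROUPS` are hypothetical names.

```python
AGE_GROUPS = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49"]


def undefined_parity_points(parities: list[float]) -> list[str]:
    """Labels of age groups i whose ratio P(i)/P(i+1) has no finite gompit."""
    flagged = []
    for k in range(len(parities) - 1):
        lower, upper = parities[k], parities[k + 1]
        # -ln(-ln r) requires 0 < r < 1, i.e. 0 < P(i) < P(i+1).
        if not 0 < lower < upper:
            flagged.append(AGE_GROUPS[k])
    return flagged
```

Applied to the Kenyan hypothetical cohort parities in Table 1, it flags the 40-44 age group, because the synthetic parity at 45-49 is lower than that at 40-44.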
If the parity and fertility data are internally consistent, the plots of z() - e() against g() should result in straight lines. Those P-points and F-points that cause each plot to deviate from a straight line should be excluded from the model. Ordinary linear regression (using least squares) is used to fit lines to the P-points and F-points, and to identify, sequentially, those points that do not fit neatly on a straight line. The intention is to seek the largest combination of P- and F-points that lie (almost) on the same line, and to use these to fit the model.
Points are selected for inclusion or exclusion using the following guidelines:
The values of α and β that define the best-fitting line through the remaining P-points and F-points must be checked to confirm that they are not so far from their central values (0 and 1, respectively) as to suggest that the standard chosen is inappropriate. A good fit is indicated if -0.3 < α < 0.3 and 0.8 < β < 1.25.
If the parameters lie outside this range, one or both of the underlying data series are problematic or the standard is inappropriate. Experimentation with another standard or changing the selection of points should be done before proceeding further. If the parameters still lie outside the ranges above, the method should be regarded as inappropriate.
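The acceptance ranges can be encoded as a simple check. The function name below is hypothetical, and the bounds merely restate the guideline above.

```python
def parameters_plausible(alpha: float, beta: float) -> bool:
    """True if alpha and beta lie in the ranges suggesting an acceptable fit."""
    return -0.3 < alpha < 0.3 and 0.8 < beta < 1.25
```

For example, the Kenyan fit reported in the worked example (α = -0.0286, β = 1.0042) passes this check.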
Having estimated the two parameters of the model, apply them to the standard gompits of the parities to obtain fitted values,

$$\hat{Y}(i) = \alpha + \beta\, Y^{s}(i).$$

These are then converted back, using the anti-gompit transformation, into the proportion of total fertility achieved by age group i. Dividing the observed parity in each age group by the corresponding proportion produces a series of estimates of total fertility. Averaging these values across the subset of age groups that were used to estimate α and β gives the fitted estimate of total fertility, $\widehat{TF}$.
Applying the same α and β to the standard gompits for the exact ages that divide conventional age groups (i.e. 20, 25, …, 50), applying the anti-gompit transformation, and multiplying by $\widehat{TF}$ produces a scaled cumulated fertility schedule. Differencing successive estimates of cumulated fertility and dividing by five produces the fitted fertility schedule for conventional age groups (15-19, 20-24, etc.), even if the data were initially classified with a half-year shift.
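The final steps can be sketched as below, assuming gompit(y) = -ln(-ln y) with inverse exp(-exp(-Y)), and β > 0. The standard gompits passed in are placeholders for whichever standard is chosen (the Booth values are not reproduced here); the standard cumulated-fertility gompit at age 50 is +∞ because the standard is normalized to one; and all function names are hypothetical.

```python
import math
from statistics import mean


def gompit(y: float) -> float:
    """-ln(-ln y); y == 1 (a completed schedule) maps to +inf."""
    if y == 1.0:
        return math.inf
    if not 0.0 < y < 1.0:
        raise ValueError("gompit is defined only for 0 < y <= 1 here")
    return -math.log(-math.log(y))


def anti_gompit(value: float) -> float:
    """Inverse of gompit: exp(-exp(-value)); +inf maps back to 1."""
    return math.exp(-math.exp(-value))


def fitted_total_fertility(alpha: float, beta: float, std_parity_gompits: list[float],
                           parities: list[float], used: list[int]) -> float:
    """Mean of P(i) / fitted proportion over the zero-based positions in `used`."""
    estimates = []
    for k in used:
        proportion = anti_gompit(alpha + beta * std_parity_gompits[k])
        estimates.append(parities[k] / proportion)
    return mean(estimates)


def fitted_asfr(alpha: float, beta: float, std_cum_gompits: list[float], tf: float) -> list[float]:
    """Fitted rates for 15-19, ..., 45-49 from standard gompits at ages 20, ..., 50."""
    # Cumulated fertility is zero at exact age 15 and tf * anti-gompit at ages 20, ..., 50.
    cumulated = [0.0] + [tf * anti_gompit(alpha + beta * g) for g in std_cum_gompits]
    # Difference successive cumulated values and divide by the five-year width.
    return [(upper - lower) / 5 for lower, upper in zip(cumulated, cumulated[1:])]
```

Because the fitted cumulated schedule reaches the fitted total fertility at age 50 by construction, five times the sum of the fitted rates always equals that total.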
This example uses data collected in two Kenyan Censuses, a decade apart, in 1989 and 1999. Both censuses asked questions about births in the last year and lifetime fertility. The method has been implemented in an accompanying Excel workbook [7].
An el-Badry correction was applied to the data from the 1989 Census; its application to Kenya is described here [5]. By contrast, the data from the 1999 Census had evidently been edited prior to release, and no missing data were indicated. The average parities from the two censuses are shown in the first two columns of Table 1. From these data, it would appear that the lifetime fertility of older women fell by around 0.6 of a child over the decade (from 7.51 to 6.91 children among women aged 45-49). However, the increase in average parity among women aged 15-19 (from 0.24 to 0.28) is somewhat surprising.
The intercensal interval is 10 years (from 1989 to 1999). We therefore use the routine described in Step 2 (b) to derive the cohort average parities, shown in the last column of Table 1.
Table 1 Average parities by age group, Kenya, 1989 and 1999 Censuses
| Age group | 1989 | 1999 | Hypothetical cohort parity P(i,s) |
| --- | --- | --- | --- |
| 15-19 | 0.2416 | 0.2848 | 0.2848 |
| 20-24 | 1.5247 | 1.3640 | 1.3640 |
| 25-29 | 3.2138 | 2.6073 | 2.6505 |
| 30-34 | 4.7602 | 4.1432 | 3.9825 |
| 35-39 | 6.2390 | 5.3867 | 4.8234 |
| 40-44 | 7.1204 | 6.3818 | 5.6041 |
| 45-49 | 7.5103 | 6.9143 | 5.4987 |
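As described at that step, $\Delta P(-1) = P(1,2) = 0.2848$ and $\Delta P(0) = P(2,2) = 1.3640$, while the remaining parities are built up from the increments $\Delta P(i) = P(i+2,2) - P(i,1)$; for example, $P(3,s) = \Delta P(-1) + \Delta P(1) = 0.2848 + (2.6073 - 0.2416) = 2.6505$.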
It appears that children ever born may have been omitted at older ages, as the hypothetical cohort parity of the oldest age group (5.4987) is lower than that of the hypothetical inter-survey cohort aged 40-44 (5.6041), a decline that cannot occur in a real cohort.
The data available are women's reports of the month and year of their last birth, used to identify births in the year before each census. As described in the section on the evaluation of recent fertility data, these reports can be converted into estimates of age-specific and total fertility by assuming that all births reported in the census month occurred before the census date, and by pro-rating the births reported in the census month one year before the census. Doing so produces the direct estimates of age-specific and total fertility shown in Table 2. The last column, the estimate of inter-survey fertility, is derived by averaging the rates for 1989 and 1999 in each age group.
It is worth noting that the reporting of fertility in the two censuses is of poor quality: the levels of fertility implied by these data (total fertility of 4.82 in 1989 and 4.66 in 1999) are substantially lower than those implied by the synthetic cohort parities, and lower than the total fertility of 5.3 children per woman obtained in the Demographic and Health Survey conducted in Kenya in 1993.
Table 2 Direct estimates of age-specific and total fertility, Kenya, 1989 and 1999 Censuses
| Age group | 1989 | 1999 | Average fertility |
| --- | --- | --- | --- |
| 15-19 | 0.0679 | 0.1107 | 0.0893 |
| 20-24 | 0.2179 | 0.2381 | 0.2280 |
| 25-29 | 0.2309 | 0.2124 | 0.2217 |
| 30-34 | 0.1908 | 0.1728 | 0.1818 |
| 35-39 | 0.1458 | 0.1193 | 0.1326 |
| 40-44 | 0.0764 | 0.0583 | 0.0673 |
| 45-49 | 0.0351 | 0.0203 | 0.0277 |
| Total fertility | 4.82 | 4.66 | 4.74 |
The default fertility standard is that produced by Booth, modified slightly by Zaba (1981). No other peer-reviewed standard for female fertility exists.
We begin by fitting models using all the P- and F-points. The results are shown in the first plot on the Diagnostic plots sheet of the accompanying Excel workbook.
Following the guidelines set out above, points are sequentially removed from the model to achieve a greater congruence of the P-points and the F-points. The best fit is found using the P-points for ages 20-39 and the F-points for ages 20-44 (Figure 1).
In this application, the fitted values of α (-0.0286) and β (1.0042) lie comfortably within the ranges -0.3 < α < 0.3 and 0.8 < β < 1.25.
The total fertility implied by the fitted model is 5.56 children per woman (Table 3). This estimate applies approximately to August 1994, the mid-point of the intercensal period, and the model has accommodated the six-month shift in the data that arises from classifying births by the mother's age at the census. This level of fertility is broadly consistent with the estimate of 5.3 children per woman from the 1993 Kenyan DHS, as well as with estimates arising from the application of the relational Gompertz method to each data set separately.
Table 3 Estimated fertility rates based on hypothetical parity increments, Kenya 1989-1999
| Age group | ASFR |
| --- | --- |
| 15-19 | 0.139 |
| 20-24 | 0.267 |
| 25-29 | 0.261 |
| 30-34 | 0.213 |
| 35-39 | 0.153 |
| 40-44 | 0.070 |
| 45-49 | 0.009 |
| Total fertility | 5.56 |
The method described here is, in effect, a variant of the relational Gompertz model that, instead of using parity and fertility data collected at one point in time, constructs an 'average' fertility schedule based on reports of current and lifetime fertility at two points in time. The mathematics of the relational Gompertz model is described fully here [1].
This method was described initially by Zlotnik and Hill (1981) and re-presented on pages 41-45 of Manual X (UN Population Division 1983). The write-up here remains true to the original formulation, with the exception that it is re-presented as a variant of the relational Gompertz model where the parities used are the inter-censal parities derived from the two surveys, and the fertility rates are the inter-survey estimates.
UN Population Division. 1983. Manual X: Indirect Techniques for Demographic Estimation. New York: United Nations, Department of Economic and Social Affairs, ST/ESA/SER.A/81. http://www.un.org/esa/population/techcoop/DemEst/manual10/manual10.html [9]
Zaba B. 1981. Use of the Relational Gompertz Model in Analysing Fertility Data Collected in Retrospective Surveys. Centre for Population Studies Research Paper 81-2. London: Centre for Population Studies, London School of Hygiene & Tropical Medicine.
Zlotnik H and KH Hill. 1981. "The use of hypothetical cohorts in estimating demographic parameters under conditions of changing fertility and mortality", Demography 18(1):103-122. doi: http://dx.doi.org/10.2307/2061052 [10]
Links:
[1] http://demographicestimation.iussp.org/content/relational-gompertz-model
[2] http://demographicestimation.iussp.org/content/general-assessment-age-and-sex-data
[3] http://demographicestimation.iussp.org/content/assessment-recent-fertility-data
[4] http://demographicestimation.iussp.org/content/assessment-parity-data
[5] http://demographicestimation.iussp.org/content/el-badry-correction
[6] http://demographicestimation.iussp.org/content/relational-gompertz-model
[7] http://demographicestimation.iussp.org/sites/demographicestimation.iussp.org/files/FE_SyntheticRG_2.xlsx
[8] http://demographicestimation.iussp.org/sites/demographicestimation.iussp.org/files/imagecache/wysiwyg_imageupload_lightbox_preset/wysiwyg_imageupload/3/FE_SYNTHRG_01_0.png
[9] http://www.un.org/esa/population/techcoop/DemEst/manual10/manual10.html
[10] http://dx.doi.org/10.2307/2061052
[11] http://demographicestimation.iussp.org/sites/demographicestimation.iussp.org/files/FE_SyntheticRG_2_FR.xlsx