# Synthetic relational Gompertz models

## Description of method

The synthetic relational Gompertz model is an extension of the relational Gompertz method for the estimation of age-specific and total fertility. It makes use of two sets of parity data, collected at different points in time, together with estimates of current fertility for the intervening period based on reports of recent births classified by age.

The method explicitly allows changes in fertility to be taken into account and is designed to be applied to censuses or surveys conducted either 5 or 10 years apart. In such circumstances, the survivors of a cohort of women at the first inquiry can be identified at the second, and the change in the average parity of the cohort can be calculated. The resulting sequence of parity increments for different cohorts during the period between the inquiries can then be cumulated to calculate average parities for a hypothetical cohort experiencing the fertility implied by the observed parity increments.

The period fertility rates that are compared with these synthetic cohort estimates should ideally refer to the entire period between the two inquiries that asked about lifetime fertility. One way to ensure this is to make use of data on registered births classified by age of mother for each calendar year of the period. If such data are available, all births recorded during the period for each age group can be calculated by addition over calendar years. Average fertility rates for the period between the two inquiries can be obtained by dividing the births by the number of woman-years lived in each age group, estimated from the female population enumerated at the beginning and end of the period.

Where such data are not readily available, or are not reliable, a simpler, and generally adequate, procedure is to calculate age-specific fertility rates for the first and last years of the period, and to estimate the rates for the entire period as the arithmetic mean of these two sets. If data on registered births are not available, but the two surveys or censuses gathered data on births in the past year, age-specific fertility rates for the period may be approximated in the same way by averaging the rates observed at the beginning and end of the period. If the births during the 12 months preceding each survey are tabulated by age of mother at the time of the survey, the observed fertility rates will correspond to age groups displaced by six months. The analysis will need to take this fact into account.

Once corresponding parities and fertility rates have been calculated for the period between the two inquiries, the cumulation and interpolation of the latter, and their comparison with the average parities, are carried out exactly as described in the presentation of the conventional relational Gompertz model.

## Data required

The data required are:

• The number of children ever born classified by five-year age group of mother, taken from two surveys or censuses five or 10 years apart.
• EITHER the number of births during the year preceding each survey classified by five-year age group of mother OR registered births by five-year age group of mother for each inter-survey year. If data on births classified by age of mother are not available for the end-points of the inter-survey period, an appropriate age-specific fertility schedule referring approximately to the middle of the period could be used.
• The number of women in each five-year age group from both surveys or censuses.
• If the crude birth rate is to be calculated, or the relative completeness of the data from the vital registration system is to be assessed, the total population recorded by each survey or census.

## Assumptions

Most of the assumptions are those associated with the relational Gompertz model, namely:

• The standard fertility schedule chosen for use in the fitting procedure appropriately reflects the shape of the fertility distribution in the population.
• Any inter-survey changes in fertility have been smooth and gradual and have affected all age groups in a broadly similar way.
• Errors in the pre-adjustment fertility rates are proportionately the same for women in the central age groups (20-39), so that the age pattern of fertility described by reported births in the past year is reasonably accurate.
• The parities reported by younger women in their twenties are accurate.

The calculation of the synthetic cohort mean parities assumes that mortality and migration have no effect on actual parity distributions. In other words, it is assumed that the average parity of those women who die or migrate between the surveys is not significantly different from the average parity at comparable ages of those women who are alive and present at the end of the period.

## Preparatory work and preliminary investigations

Before commencing analysis of fertility levels using this method, analysts should investigate the quality of the data at least in respect of the following dimensions

## Caveats and warnings

It is crucially important that the sets of fertility rates being averaged are consistent with respect to age classification before they are averaged. If they are not consistent initially, because one refers to age groups displaced by six months and the other does not, the former set should be adjusted (for example, by applying the F-only variant of the relational Gompertz model) before proceeding. In general, estimates of age-specific fertility rates from different sources (e.g. vital registration and census) should not be combined because of the different ways in which the schedules may be distorted.

If age-specific fertility rates for the end-points of the period are not available, a set of rates referring approximately to the mid-point of the period could be used. It should be remembered that only the pattern of the inter-survey age-specific fertility rates is important in applying the relational Gompertz method, so that if this pattern was more or less constant over the period, the exact reference date of the rates used does not matter.

If data on registered births are used, changes in completeness of the data by age group over time could distort the pattern of fertility. If this has been the case, the method should be applied with caution.

## Application of the method

The method is applied in the following steps.

#### Step 1: Calculation of reported average parities

Calculate the average parities, ${}_{5}P_{x}(t_{1})$ and ${}_{5}P_{x}(t_{2})$, of women in each age group [x, x + 5) for the two inquiries (t1 and t2), for x = 15, 20, …, 45. For ease of exposition, we denote the average parity in age group i at time t by

$P(i,t)={}_{5}P_{x}(t)$

where i = x/5 − 2. Thus, the average parities obtained from the first census or survey are denoted by P(i,1), and those from the second survey by P(i,2).
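The indexing and the parity calculation can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the counts of children ever born and women are hypothetical and are not taken from any census.

```python
def age_group_index(x):
    """Map the lower bound x of a five-year age group to i = x/5 - 2."""
    if x % 5 != 0 or not 15 <= x <= 45:
        raise ValueError("x must be one of 15, 20, ..., 45")
    return x // 5 - 2


def average_parities(children_ever_born, women):
    """Average parity per age group: children ever born divided by women.

    Both arguments are dicts keyed by the lower bound x of the age group.
    The result is keyed by the age group index i.
    """
    return {
        age_group_index(x): children_ever_born[x] / women[x]
        for x in sorted(women)
    }


# Hypothetical counts for two age groups (illustrative values only).
ceb = {15: 1200, 20: 7500}
women = {15: 5000, 20: 5000}
parities = average_parities(ceb, women)
print(parities)  # {1: 0.24, 2: 1.5}
```

The same function would be called once for each inquiry to give P(i,1) and P(i,2).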

#### Step 2: Calculation of average parities for a hypothetical cohort

The way in which the parities are calculated depends upon the length of the inter-survey interval.

#### a) Interval is of five years’ duration

If the interval between the two data series is five years, all the survivors of age group i at the first inquiry are in age group i + 1 at the second inquiry, and the parity increment between the inquiries for the corresponding cohort is equal to P(i+1,2) − P(i,1). Such increments can be calculated for each age group, and the hypothetical-cohort parities are then obtained by successively cumulating them. Thus, if the parity increment for the cohort of age group i at the first inquiry is denoted by $\Delta P(i+1)$ and the parity of age group i for the hypothetical cohort is denoted by P(i,s) (where the s stands for 'synthetic'), one has

$\Delta P(i+1)=P(i+1,2)-P(i,1)$

for i = 1, …, 6, and hence

$P(i,s)=\sum_{j=1}^{i}\Delta P(j)$

The parity increment $\Delta P(i+1)$ for the youngest age group (i = 0) is taken as being equal to P(1,2), i.e., assuming that P(0,1), the average parity of women aged 10-14 in the first inquiry, is zero. If fertility is changing rapidly, this value of $\Delta P(1)$ will therefore reflect period rates somewhat closer to the second survey than to the mid-point of the interval, slightly over-allowing for the change in fertility.
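The five-year case can be sketched as follows. The function name and the parity values are hypothetical, chosen only so that the arithmetic is easy to follow. List position k (counting from zero) holds age group i = k + 1.

```python
def synthetic_parities_5yr(p1, p2):
    """Hypothetical-cohort parities for inquiries five years apart.

    p1, p2: average parities P(1..7, 1) and P(1..7, 2), youngest group first.
    Returns (increments, synthetic), both lists of the same length.
    """
    increments = [p2[0]]  # Delta P(1) = P(1,2), assuming P(0,1) = 0
    for k in range(1, len(p2)):
        # Delta P(i+1) = P(i+1,2) - P(i,1)
        increments.append(p2[k] - p1[k - 1])

    synthetic, running = [], 0.0
    for delta in increments:
        running += delta  # P(i,s) is the cumulated sum of the increments
        synthetic.append(running)
    return increments, synthetic


# Illustrative values only (not from any census).
p_first = [0.2, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
p_second = [0.2, 0.9, 1.8, 2.7, 3.6, 4.5, 5.4]
increments, synthetic = synthetic_parities_5yr(p_first, p_second)
print([round(v, 2) for v in synthetic])
```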

#### b) Interval is of ten years’ duration

If the intercensal or inter-survey interval is 10 years, then the survivors of the initial cohort of age group i in the first survey will be the women in age group (i + 2) in the second. The hypothetical cohort parities are then obtained by cumulating two parallel sequences of parity increments. Once more, for the youngest age groups, $\Delta P(1)$ is taken as being equal to P(1,2) and $\Delta P(2)$ to P(2,2). Other parity increments are calculated as

$\Delta P(i+2)=P(i+2,2)-P(i,1)$

for i = 1, …, 5.

Hypothetical cohort parities for even-numbered age groups are obtained by summing the parity increments for even-numbered age groups, whereas those for odd-numbered age groups are obtained by summing parity increments for odd-numbered age groups. Thus,

$\begin{aligned}
P(1,s)&=\Delta P(1)=P(1,2)\\
P(2,s)&=\Delta P(2)=P(2,2)\\
P(3,s)&=\Delta P(1)+\Delta P(3)\\
P(4,s)&=\Delta P(2)+\Delta P(4)\\
P(5,s)&=\Delta P(1)+\Delta P(3)+\Delta P(5)\\
P(6,s)&=\Delta P(2)+\Delta P(4)+\Delta P(6)\\
P(7,s)&=\Delta P(1)+\Delta P(3)+\Delta P(5)+\Delta P(7)
\end{aligned}$
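The ten-year case can be sketched in the same way. The function name is hypothetical. The input values are the 1989 and 1999 Kenyan average parities from Table 1 of the worked example, so the result can be compared with the last column of that table.

```python
def synthetic_parities_10yr(p1, p2):
    """Hypothetical-cohort parities for inquiries ten years apart.

    p1, p2: average parities P(1..7, 1) and P(1..7, 2), youngest group first.
    Odd- and even-numbered age groups are cumulated as two separate chains.
    """
    synthetic = []
    for k in range(len(p2)):
        if k < 2:
            # Delta P(1) = P(1,2) and Delta P(2) = P(2,2)
            synthetic.append(p2[k])
        else:
            # Delta P(i+2) = P(i+2,2) - P(i,1), added to P(i,s)
            synthetic.append(synthetic[k - 2] + (p2[k] - p1[k - 2]))
    return synthetic


# Average parities from Table 1 (Kenya, 1989 and 1999 Censuses).
p1989 = [0.2416, 1.5247, 3.2138, 4.7602, 6.2390, 7.1204, 7.5103]
p1999 = [0.2848, 1.3640, 2.6073, 4.1432, 5.3867, 6.3818, 6.9143]

kenya_synthetic = synthetic_parities_10yr(p1989, p1999)
# These agree with the last column of Table 1 to four decimals.
print([round(v, 4) for v in kenya_synthetic])
```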

#### Step 3: Calculation of the current fertility rates

The method of calculating this schedule, denoted by f(i), where i indexes the age groups as before, depends upon the data available.

#### a) Data from a vital registration system

One possible procedure is to calculate age-specific fertility rates referring roughly to the first and last years of the period between the two inquiries using data on the reported number of births during the year preceding each inquiry. In such a case, for each inquiry one would divide the reported births for each five-year age group of mother by the reported number of women in the same age group and then obtain age-specific fertility rates for the intervening period by calculating the arithmetic mean of each pair of end-point rates.

Alternatively, if age-specific fertility rates are available from a vital registration system for the whole period, a mean age-specific fertility rate for the period for each age group could be used. Calculating this mean would involve summing the births reported for each age group of mother, and dividing by the person years lived (by averaging the size of the age groups at the beginning and end of the interval, and multiplying by the number of years in the period).
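A minimal sketch of the registration-based calculation follows, assuming births are available for every calendar year of the interval. The function name, the data layout and the numbers are hypothetical.

```python
def period_asfr(births_by_year, women_start, women_end, years):
    """Mean age-specific fertility rates over an inter-survey period.

    births_by_year: one dict per calendar year, age group -> registered births.
    women_start, women_end: age group -> women enumerated at each inquiry.
    years: length of the period in years (5 or 10).
    """
    rates = {}
    for group in women_start:
        total_births = sum(year[group] for year in births_by_year)
        # Woman-years: average size of the age group times the period length.
        woman_years = years * (women_start[group] + women_end[group]) / 2
        rates[group] = total_births / woman_years
    return rates


# Illustrative values only: 200 registered births a year for five years.
births = [{"20-24": 200} for _ in range(5)]
rates = period_asfr(births, {"20-24": 900}, {"20-24": 1100}, years=5)
print(rates)  # {'20-24': 0.2}
```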

Age-specific fertility rates obtained from vital registration are, by definition, classified by age of mother at the time of the delivery of the child.

#### b) Data from the inquiries giving rise to the average parities in Step 2

If the data on fertility are to be drawn from women’s reports of recent fertility in the year before each of the surveys used to derive the average parities, the arithmetic mean of the two fertility schedules is still taken as the estimate of fertility in the intervening period. However, the schedule of fertility rates derived in this way applies to the six months before each survey, and hence the age classification of the rates must be adjusted to reflect the classification by age of mother at census, and not the birth of the child. This age shift in the rates must be taken into account in the application of the relational Gompertz model.

The process of fitting a relational Gompertz model to the data is exactly as described in the section on the model. The only points of difference to note are the following:

• The estimates apply to the mid-point of the period; that is, either 2½ or 5 years before the second inquiry.
• The spreadsheet only allows for the conventional application of the relational Gompertz model, using the parities to set the level, and using the fertility schedule based on current fertility data for the intervening period to determine the shape of the fertility curve.
• If the data are classified by age of mother at the inquiry date (i.e. when the data on recent fertility are drawn from the census or survey that also provided the average parities, rather than from a vital registration system), the accompanying Excel workbook only allows for recent fertility data to be based on births reported in the 12 months preceding the census or survey.

The relevant steps are reproduced below.

#### Step 4: Choose the fertility standard to be used with the model

The default fertility standard is that produced by Booth, modified slightly by Zaba (1981). The standard is appropriate to high- and medium-fertility populations and is simply a normalized cumulated fertility schedule (i.e. with total fertility equal to one). The standard Ys(x) values are determined by taking the gompits of the schedule. The standard parity values, Ys(i), are the gompits of the parities associated with the standard fertility schedule. The choice of standard determines the values of g() and e() used in the regression fitting procedures.

#### Step 5: Evaluate the plot of P-points and F-points

The plots of z(x) - e(x) against g(x), and z(i) - e(i) against g(i) on the same set of axes are then used as a diagnostic for identifying common errors and trends in the data, as discussed in the main article on the relational Gompertz model.

#### Step 6: Fit the model by selecting the points to be used

Initially, all points should be included in the model. The only exception is if the average parities in one age group are higher than the average parities in the next, in which case the gompit will be undefined and the model cannot be fitted using that point. (Such a situation cannot occur in a real cohort, but could arise in a synthetic cohort, either because of data error or during a time of rapidly changing fertility.)
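The undefined-gompit case can be illustrated with the synthetic cohort parities from Table 1 of the worked example. The sketch assumes, following the conventional relational Gompertz model rather than anything restated in this section, that each P-point is the gompit, −ln(−ln(·)), of the ratio of adjacent parities P(i)/P(i+1). The function names are hypothetical.

```python
import math


def gompit(p):
    """Gompit transform, -ln(-ln(p)), defined only for 0 < p < 1."""
    return -math.log(-math.log(p))


def parity_z_points(parities):
    """Gompits of the ratios P(i)/P(i+1); None where the gompit is undefined."""
    z = []
    for younger, older in zip(parities, parities[1:]):
        ratio = younger / older
        z.append(gompit(ratio) if 0 < ratio < 1 else None)
    return z


# Synthetic cohort parities from Table 1 (Kenya, 1989-1999).
synthetic = [0.2848, 1.3640, 2.6505, 3.9825, 4.8234, 5.6041, 5.4987]
z = parity_z_points(synthetic)
# The last ratio (40-44 over 45-49) exceeds one, so that point is unusable.
print(z[-1])  # None
```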

If the parity and fertility data are internally consistent, the plots of z() - e() against g() should result in straight lines. Those P-points and F-points that cause each plot to deviate from a straight line should be excluded from the model. Ordinary linear regression (using least squares) is used to fit lines to the P-points and F-points, and to identify, sequentially, those points that do not fit neatly on a straight line. The intention is to seek the largest combination of P- and F-points that lie (almost) on the same line, and to use these to fit the model.

Points are selected for inclusion or exclusion using the following guidelines:

• A contiguous series of points must be included in the model. Sequentially, only the end-most points can be excluded. (The reason for this is that each point on the graph is the result of calculations involving the ratio of a pair of adjacent data values. If the analysis leads you to conclude that a data value is unreliable as a denominator of one of these ratios it is not logical to accept it as the numerator of the next ratio).
• P-points should be eliminated in preference to F-points. This is because the average parity data are generally more prone to age-specific errors than the fertility data.
• P-points which deviate clearly from the straight line based only on the other P-points, and F-points which deviate clearly from the straight line based only on the other F-points should be eliminated early on in the fitting process.
• P- and F-points at older ages should be eliminated in preference to those at younger ages since data at these ages are usually the least reliable and show the least consistency between lifetime and recent fertility. The exception to this relates to the data points for women under the age of 20. Small numbers of events, as is usual for these young women, frequently make the estimates of average parities or cumulated fertility unreliable.
• Where only a marginally worse fit is achieved with more points, this is to be preferred to a slightly better fit achieved with fewer points. The spreadsheet calculates the root mean squared error (RMSE)
$RMSE=\sqrt{\frac{\sum\left(\left(z()-e()\right)-\left(\alpha+(\beta-1)^{2}\frac{c}{2}+\beta g()\right)\right)^{2}}{n}}$
from the points used to fit the model. This statistic can assist with determining the optimal number of data points to which to fit if there is uncertainty as to which of two competing models is better. In this case, one should choose the model with the lower RMSE.
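The RMSE formula can be transcribed directly, as in the sketch below. The function name and the data layout are hypothetical. The constant c belongs to the chosen standard and its value is not given in this section, so the value passed here is a placeholder.

```python
import math


def rmse(points, alpha, beta, c):
    """Root mean squared error of the fitted line over the selected points.

    points: list of (z_minus_e, g) pairs for the P- and F-points in the fit.
    c: constant from the standard (placeholder value in the example below).
    """
    squared = [
        (z_minus_e - (alpha + (beta - 1) ** 2 * c / 2 + beta * g)) ** 2
        for z_minus_e, g in points
    ]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative points with residuals of +0.1 and -0.1 about the line y = g.
example_points = [(0.1, 0.0), (0.9, 1.0)]
error = rmse(example_points, alpha=0.0, beta=1.0, c=1.0)
print(round(error, 3))  # 0.1
```

With β equal to one the c term vanishes, which is why the placeholder has no effect on this particular example.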

#### Step 7: Assess the fitted parameters

The values of α and β that represent the best-fitting line joining the remaining P-points and F-points must be checked to confirm that they are not so far from their central values as to suggest that the standard chosen is inappropriate. A good fit is indicated if -0.3 < α < 0.3, and if 0.8 < β < 1.25.

If the parameters lie outside this range, one or both of the underlying data series are problematic or the standard is inappropriate. Experimentation with another standard or changing the selection of points should be done before proceeding further. If the parameters still lie outside the ranges above, the method should be regarded as inappropriate.
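The range check is simple enough to express as a one-line test. The function name is hypothetical. The values of α and β are those reported at Step 7 of the worked example.

```python
def parameters_acceptable(alpha, beta):
    """True if -0.3 < alpha < 0.3 and 0.8 < beta < 1.25."""
    return -0.3 < alpha < 0.3 and 0.8 < beta < 1.25


# Fitted values reported in the worked example.
print(parameters_acceptable(-0.0286, 1.0042))  # True
```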

#### Step 8: Fitted ASFRs and total fertility

Once the two parameters of the model have been estimated, they can be applied to the standard values for the parities to obtain fitted values,

$Y(i)=\alpha+\beta Y^{s}(i)$

These are then converted back into measures of the cumulative proportion of fertility achieved by age group i using the anti-gompit transformation. The anti-gompits based on the parity distributions indicate the proportion of fertility achieved by that age group. Dividing observed parity in each age group by these proportions produces a series of estimates of total fertility. Averaging these values across the sub-set of age groups that were used to estimate α and β gives the fitted estimate of total fertility, $\hat{T}$.

Applying the same α and β to the standard gompits for the ages that divide conventional age groups (i.e. 20, 25, …, 50), applying the anti-gompit transformation, and multiplying by $\hat{T}$ produces a scaled cumulated fertility schedule. Differencing successive estimates of cumulated fertility and dividing by five produces the fitted fertility schedule for conventional age groups (15-19; 20-24 etc.) even if the data were initially classified with a half-year shift.
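The back-transformation can be sketched as below. The function names are hypothetical, and the standard gompits in the example are invented placeholders, not the Booth and Zaba standard, which must be taken from the standard itself. The sketch further assumes that cumulated fertility is zero at age 15 and that the standard gompit at age 50 is infinite, so that the cumulated proportion there is one.

```python
import math


def anti_gompit(y):
    """Inverse of the gompit: exp(-exp(-y))."""
    return math.exp(-math.exp(-y))


def fitted_total_fertility(alpha, beta, ys_parity, parities):
    """Average of P(i) / anti_gompit(alpha + beta * Ys(i)) over the fitted groups."""
    estimates = [
        p / anti_gompit(alpha + beta * ys) for ys, p in zip(ys_parity, parities)
    ]
    return sum(estimates) / len(estimates)


def fitted_asfr(alpha, beta, ys_ages, total):
    """Fitted rates for 15-19, ..., 45-49 from standard gompits at ages 20, ..., 50."""
    cumulated = [0.0]  # assumption: no fertility before age 15
    cumulated += [total * anti_gompit(alpha + beta * ys) for ys in ys_ages]
    return [(b - a) / 5 for a, b in zip(cumulated, cumulated[1:])]


# Placeholder gompits for ages 20, 25, ..., 50 (NOT the real standard).
ys_placeholder = [-1.0, 0.0, 0.8, 1.6, 2.5, 4.0, math.inf]
asfr = fitted_asfr(alpha=0.0, beta=1.0, ys_ages=ys_placeholder, total=5.0)
print(round(5 * sum(asfr), 6))  # 5.0
```

Because the differences telescope, five times the sum of the fitted rates returns the total fertility that was supplied, which gives a quick check on any implementation.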

## Worked example

This example uses data collected in two Kenyan censuses conducted a decade apart, in 1989 and 1999. Both censuses asked questions about births in the last year and lifetime fertility. The method has been implemented in an accompanying Excel workbook.

#### Step 1: Calculation of reported average parities

An el-Badry correction was applied to the data from the 1989 Census – its application to Kenya is described here. By contrast, the data from the 1999 Census had evidently been edited prior to release, and no missing data were indicated. The average parities from the two censuses are shown in the first two columns of Table 1. From these data, it would appear that the lifetime fertility of older women has fallen by around 0.6 of a child over that decade. However, the increase in lifetime fertility among younger women is somewhat surprising.

#### Step 2: Calculation of average parities for a hypothetical cohort

The intercensal interval is 10 years (from 1989 to 1999). We therefore use the routine described in Step 2 (b) to derive the cohort average parities, shown in the last column of Table 1.

Table 1 Average parities by age group, Kenya, 1989 and 1999 Censuses

| Age group | 1989 | 1999 | Hypothetical cohort parity P(i,s) |
|---|---|---|---|
| 15-19 | 0.2416 | 0.2848 | 0.2848 |
| 20-24 | 1.5247 | 1.3640 | 1.3640 |
| 25-29 | 3.2138 | 2.6073 | 2.6505 |
| 30-34 | 4.7602 | 4.1432 | 3.9825 |
| 35-39 | 6.2390 | 5.3867 | 4.8234 |
| 40-44 | 7.1204 | 6.3818 | 5.6041 |
| 45-49 | 7.5103 | 6.9143 | 5.4987 |

As described at that step, $\Delta P(1)=P(1,2)=0.2848$ and $\Delta P(2)=P(2,2)=1.3640$, while

$P(5,s)=\Delta P(1)+\Delta P(3)+\Delta P(5)=0.2848+(2.6073-0.2416)+(5.3867-3.2138)=4.8234$

It appears that omissions of children ever born may have occurred at older ages, as the hypothetical cohort parity at the oldest age group is somewhat lower than that of women in the hypothetical inter-survey cohort aged 40-44.

#### Step 3: Calculation of current fertility rates

The data available are women’s reports of the month and year of their last birth in the year before each census. As described in the section on the evaluation of recent fertility data, these reports can be converted into estimates of age-specific and total fertility by assuming that all births reported in the census month occurred before the census date, and pro-rating the births in the census month one year before the census. Doing so produces the direct estimates of age-specific and total fertility shown in Table 2. The last column, the estimate of inter-survey fertility, is derived by averaging the rates for 1989 and 1999 in each age group.

It is worth noting that the quality of reporting of fertility in the two censuses is poor. The levels of fertility implied by these data are substantially lower than those implied by the synthetic cohort parities, or by the total fertility of 5.3 children per woman obtained in the Demographic and Health Survey conducted in Kenya in 1993.

Table 2 Direct estimates of age-specific and total fertility, Kenya, 1989 and 1999 Censuses

| Age group | 1989 | 1999 | Average fertility |
|---|---|---|---|
| 15-19 | 0.0679 | 0.1107 | 0.0893 |
| 20-24 | 0.2179 | 0.2381 | 0.2280 |
| 25-29 | 0.2309 | 0.2124 | 0.2217 |
| 30-34 | 0.1908 | 0.1728 | 0.1818 |
| 35-39 | 0.1458 | 0.1193 | 0.1326 |
| 40-44 | 0.0764 | 0.0583 | 0.0673 |
| 45-49 | 0.0351 | 0.0203 | 0.0277 |
| Total Fertility | 4.82 | 4.66 | 4.74 |
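The averaging in Table 2 can be checked with a short sketch. The rates are those printed in the table; the variable and function names are hypothetical. Because the printed rates are rounded to four decimals, an average computed from them may differ from the table in the fourth decimal.

```python
# Direct estimates of age-specific fertility from Table 2 (15-19 to 45-49).
asfr_1989 = [0.0679, 0.2179, 0.2309, 0.1908, 0.1458, 0.0764, 0.0351]
asfr_1999 = [0.1107, 0.2381, 0.2124, 0.1728, 0.1193, 0.0583, 0.0203]

# Inter-survey schedule: arithmetic mean of the two end-point schedules.
asfr_average = [(a + b) / 2 for a, b in zip(asfr_1989, asfr_1999)]


def total_fertility(asfr):
    """Total fertility from five-year age-specific rates: 5 times their sum."""
    return 5 * sum(asfr)


print(
    round(total_fertility(asfr_1989), 2),
    round(total_fertility(asfr_1999), 2),
    round(total_fertility(asfr_average), 2),
)  # 4.82 4.66 4.74
```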

#### Step 4: Choose the fertility standard to be used with the model

The default fertility standard is that produced by Booth, modified slightly by Zaba (1981). No other peer-reviewed standard for female fertility exists.

#### Step 5: Evaluate the plot of P-points and F-points

We begin by fitting models using all the P- and F-points. The results are shown in the first plot on the Diagnostic plots sheet of the accompanying Excel workbook.

#### Step 6: Fit the model by selecting the points to be used

Following the guidelines set out above, points are sequentially removed from the model to achieve a greater congruence of the P-points and the F-points. The best fit is found using the P-points for ages 20-39 and the F-points for ages 20-44 (Figure 1).

Figure 1 Plot of z()-e() against g() after elimination of points, synthetic cohorts based on the 1989 and 1999 Kenyan census data

#### Step 7: Assess the fitted parameters

In this application, the fitted values of α (-0.0286) and β (1.0042) lie comfortably within the set range.

#### Step 8: Fitted ASFRs and total fertility

The total fertility implied by the fitted model is 5.56 children per woman (Table 3), and applies, approximately, to August 1994, the model having accommodated the shift in the data arising from the classification of mother’s age. This level of fertility is broadly consistent with the estimate of 5.3 children per woman from the 1993 Kenyan DHS, as well as with estimates arising from the application of the relational Gompertz method to each data set separately.

Table 3 Estimated fertility rates based on hypothetical parity increments, Kenya 1989-1999

| Age group | ASFR |
|---|---|
| 15-19 | 0.139 |
| 20-24 | 0.267 |
| 25-29 | 0.261 |
| 30-34 | 0.213 |
| 35-39 | 0.153 |
| 40-44 | 0.070 |
| 45-49 | 0.009 |
| Total Fertility | 5.56 |

## Detailed description of method

The method described here is, in effect, a variant of the relational Gompertz model that, instead of using parity and fertility data collected at one point in time, constructs an 'average' fertility schedule based on reports of current and lifetime fertility at two points in time. The mathematics of the relational Gompertz model is described fully in the main article on that model.