Direct estimation of fertility from survey data containing birth histories

Description of method

The direct estimation of fertility (age-specific, and total) from survey data containing birth histories is relatively straightforward. If the data are carefully collected with a validated instrument (such as that used by the Demographic and Health Surveys), they can provide reliable and accurate estimates of fertility. However, distortions also frequently occur in birth history data, especially in relation to the shifting of births to more distant years to avoid additional questions on, for example, child health or anthropometry (Cleland 1996). These problems have again been highlighted recently by Schoumaker (2010, 2011). Displacement and omission of births might cause fertility (particularly in the period three to five years before the survey) to be underestimated.

Two approaches can be used to estimate fertility directly from data containing a detailed birth history. The first approach – that used by the DHS in its official reports – produces an estimate covering the one- or three-year period before the survey. (Three-year estimates are frequently used to avoid undesirable fluctuations in the estimates arising from the relatively small number of annual births in the DHS.) This approach is described in detail in the Guide to DHS Statistics (Rutstein and Rojas 2003). It has two disadvantages. First, if the survey is carried out over an extended period, it becomes impossible to locate the measure of fertility precisely in time. Second, the calculation of fertility rates is made more complex both by having to refer to the survey date and by working in five-year age groups and three-year periods of calendar time.

The simpler approach described here produces estimates of fertility for individual ages and calendar years of time. These can be very easily aggregated to produce estimates for wider age groups, or for periods of several years.

As with the DHS approach, initial manipulations have to be performed at a unit record level. For this reason, it makes sense, in almost all circumstances, to estimate fertility directly from birth histories using the built-in survival time functionality of a statistical analysis program such as Stata. A useful routine for performing these calculations in Stata has been produced by Schoumaker (2013). However, the calculations are sufficiently straightforward to carry out using simple cross-tabulations of data. This section describes how.

Data requirements and assumptions

Data required

Two sets of data, both routinely produced at the data processing stage of a survey with detailed birth histories, are required. The first is a data set in which the unit of analysis is the woman – i.e. there is one record per woman. These data are required to estimate the denominator of the fertility rates. The second data set has the child as the unit of analysis – i.e. there is one record per child – but also includes essential information on the mother (crucially, her date of birth) in each record in the data set.

To estimate fertility, the following information must be present in the data.

a) Women’s data set

The month and year of each woman’s birth, derived if necessary from a century-month code (CMC).
The month and year of interview.
Any variables needed to adjust the data for the sampling design and sample weights.
Important covariates by which one might wish to assess differentials in fertility, bearing in mind that covariates at the date of interview may not have applied at the time the events of interest (recent births) took place.

b) Child’s data set

The child’s date of birth – month and year, derived if necessary from a CMC.
The mother’s date of birth – month and year, derived if necessary from a CMC.
Any variables needed to adjust the data for the sampling design and sample weights.
The same covariates by which differentials in fertility are to be assessed.
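Where dates are stored only as century-month codes, the month and year can be recovered with a short helper. The sketch below assumes the usual DHS convention that CMC = 12 × (year − 1900) + month; the function names are illustrative and are not part of any DHS software.

```python
def month_year_to_cmc(month, year):
    """Century-month code, assuming CMC = 12 * (year - 1900) + month."""
    return 12 * (year - 1900) + month


def cmc_to_month_year(cmc):
    """Invert the CMC convention above, returning (month, year)."""
    year = 1900 + (cmc - 1) // 12
    month = (cmc - 1) % 12 + 1  # months run from 1 to 12
    return month, year
```

Under this convention, October 2004 corresponds to a CMC of 1258.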

Caveats and warnings

While single-age fertility rates derived from relatively small-scale surveys provide some indication of the quality of the data, the rates are almost always too erratic to be of direct use. Aggregation into five-year groups (and then – perhaps – smoothing the rates by means of a relational Gompertz model) is almost always called for.
Similarly, rates for a single calendar year derived from survey data may not be reliable. Data for multiple calendar years should be combined to produce a more reliable estimate. However, ideally, one should not combine more than three years’ data to avoid flattening out the trend in fertility.
The rates produced using this approach may be affected materially by omission or displacement of the date of reported births.
The rates produced in this manner will not be the same as those produced by MeasureDHS. In the first place, the estimation of the period exposed to risk is a little different (MeasureDHS works in complete months, while here we work in half-months). Second, the reference period for the rates may differ by up to 11 months. One could, however, calculate rates for years running from July to June (and thus centred on 1 January), or indeed for any other 12-month period, by manipulating the numerator and denominator appropriately.

Application of method

We define the following terms:

$M_{B}^{c}$ - the child’s month of birth

$Y_{B}^{c}$ - the child’s year of birth

$M_{B}^{m}$ - the mother’s month of birth

$Y_{B}^{m}$ - the mother’s year of birth

$M_{I}^{}$ - the month in which the mother is interviewed

$Y_{I}^{}$ - the year in which the mother is interviewed

$B (x, t)$ - the total number of births to mothers aged x at the birth of their child in calendar year t

$E (x, t)$ - the person-years of exposure to risk of women aged x in calendar year t.

The rates are calculated by means of the following steps. To avoid having to make additional assumptions about the exposure to risk in the month of interview, both exposure and births occurring in the month of interview are ignored.

The general case is presented below where not all women are interviewed in the same calendar year. Where all women are interviewed in the same calendar year, the process can be simplified accordingly.

Step 1: Produce a tabulation of the number of births in each calendar year by the age of the mother at the birth of the child

This step produces the numerator of the fertility rates: births of children by calendar year and age of mother at birth.

In principle, the tabulation is relatively straightforward, although care needs to be taken to allocate the mother’s age at the birth of her child appropriately when both mother and child have the same month of birth. If, as is usually the case, information on day of birth is not available, it is necessary to allocate the mother’s day of birth randomly to fall before or after the child’s day of birth. This could be implemented by generating a binary variable, b, using a random number generator, but doing so would have implications for the consistency and replicability of investigations. Instead, b can be generated from a putatively uniform variable that has no bearing on the outcomes being investigated, such as the day of the month on which the mother was interviewed. We therefore define b = 1 if the day of the month on which the mother was interviewed is greater than 15, and b = 0 if it is 15 or less.

The age (at last birthday) of the mother at the birth of a given child, x, is given by

$x = int (\frac{12 (Y_{B}^{c} - Y_{B}^{m}) + (M_{B}^{c} - M_{B}^{m} - b)}{12})$

where int() represents the integer portion of the term in brackets.

Extract a tabulation showing the total number of births, $B (x, t)$, in each cell defined by combinations of $Y_{B}^{c}$ and x, weighting the data as appropriate and making sure to exclude births that occurred in the month in which the mother was interviewed.
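Step 1 can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the record layout (a list of dictionaries with keys such as `child_month` and `weight`) and the function names are assumptions made for the example. Because the number of months between the two births is never negative, integer division gives the same result as int().

```python
from collections import defaultdict


def tie_breaker(interview_day):
    """b = 1 if the interview took place after the 15th of the month, else 0."""
    return 1 if interview_day > 15 else 0


def mother_age_at_birth(child_month, child_year, mother_month, mother_year, b):
    """Age at last birthday of the mother at the birth of the child."""
    months = 12 * (child_year - mother_year) + (child_month - mother_month - b)
    return months // 12


def tabulate_births(children):
    """Weighted births B(x, t), keyed by (age of mother, calendar year)."""
    births = defaultdict(float)
    for c in children:
        # Births in the month of the mother's interview are ignored.
        if (c["child_year"], c["child_month"]) == (c["interview_year"], c["interview_month"]):
            continue
        b = tie_breaker(c["interview_day"])
        x = mother_age_at_birth(c["child_month"], c["child_year"],
                                c["mother_month"], c["mother_year"], b)
        births[(x, c["child_year"])] += c["weight"]
    return dict(births)


# Hypothetical records: two mothers born in August 1984 with a child born in
# August 2003, and one birth in the month of interview (which is dropped).
example_children = [
    dict(child_month=8, child_year=2003, mother_month=8, mother_year=1984,
         interview_month=10, interview_year=2004, interview_day=20, weight=1.5),
    dict(child_month=8, child_year=2003, mother_month=8, mother_year=1984,
         interview_month=10, interview_year=2004, interview_day=10, weight=1.0),
    dict(child_month=10, child_year=2004, mother_month=3, mother_year=1980,
         interview_month=10, interview_year=2004, interview_day=5, weight=1.0),
]
example_births = tabulate_births(example_children)
```

The value of b changes the result only when mother and child share a month of birth; in every other case the integer part is unaffected.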

Step 2: Calculate the age of each woman at the start of the year in which she was interviewed

Working with the women’s data set (i.e. with one record per woman), begin by deriving the age of women on 1 January of the year of interview, x_I, assuming that mothers’ births are uniformly distributed over calendar months (and hence occur, on average, half way through each month):

$x_{I} = int ((Y_{I} - Y_{B}^{m} - 1) + \frac{(12 - M_{B}^{m} + 0.5)}{12})$

It follows that the age of the mother on 1 January of any other year, t, (t ≤ Y_I) will be x_I - (Y_I - t).
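The calculation in Step 2 might be coded as in the sketch below; the function names are illustrative assumptions. The two checks use women who appear in Table 2 of the worked example.

```python
import math


def age_at_start_of_interview_year(mother_birth_month, mother_birth_year, interview_year):
    """x_I: age at last birthday on 1 January of the year of interview."""
    fraction = (12 - mother_birth_month + 0.5) / 12
    return math.floor((interview_year - mother_birth_year - 1) + fraction)


def age_at_start_of_year(x_i, interview_year, t):
    """Age on 1 January of an earlier year t (t <= interview_year)."""
    return x_i - (interview_year - t)


# Born August 1984, interviewed in 2004: aged 19 on 1 January 2004.
x_case_444_3 = age_at_start_of_interview_year(8, 1984, 2004)
# Born January 1970, interviewed in 2005: aged 34 on 1 January 2005.
x_case_528_2 = age_at_start_of_interview_year(1, 1970, 2005)
```

Because the fractional term lies strictly between 0 and 1 for every month of birth, the integer part always equals Y_I minus the mother's year of birth, minus 1.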

Step 3: Calculate the exposure to risk for each woman in the year of her interview

In the calendar year in which she is interviewed, a woman is exposed to the risk of giving birth for only a portion of the year (that is, the portion before the interview takes place). In this case, the computation of exposure to risk depends critically on whether the interview took place before or after the woman’s birthday in that year. If her birth month precedes the interview month, she will be exposed to risk of giving birth at age x_I for

$E (x_{I}, Y_{I}) = \frac{M_{B}^{m} - 0.5}{12}$

years, and for

$E (x_{I} + 1, Y_{I}) = \frac{M_{I}^{} - M_{B}^{m} - 0.5}{12}$

years at age x_I+1. In contrast, if her birth month is the same as, or after, the month of her interview, her exposure to risk of giving birth in the year of interview will be for

$E (x_{I}, Y_{I}) = \frac{M_{I}^{} - 1}{12}$

years at age x_I, and

$E (x_{I} + 1, Y_{I}) = 0$

years at age x_I + 1.

Note that in the last complete year, aggregate exposure per woman is 1 year, whereas in the year of interview, aggregate exposure is (M_I - 1)/12 of a year, regardless of the relative timing of birth month and interview month.

Variables giving each woman’s exposure at ages x_I and x_I + 1 in the year of interview must be derived, and then aggregated (weighting where necessary) to produce a tabulation of aggregate exposure by age in the year of interview.
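A minimal sketch of the Step 3 calculation for a single woman follows; the function name and the returned tuple layout are assumptions made for illustration. The two example calls correspond to the first two women in Table 2 of the worked example.

```python
def exposure_in_interview_year(mother_birth_month, interview_month):
    """Return (exposure at age x_I, exposure at age x_I + 1) in years.

    Exposure in the month of interview itself is ignored.
    """
    if mother_birth_month < interview_month:
        # Birthday falls before the interview month: exposure is split.
        lower = (mother_birth_month - 0.5) / 12
        higher = (interview_month - mother_birth_month - 0.5) / 12
    else:
        # Birthday in or after the interview month: all exposure at age x_I.
        lower = (interview_month - 1) / 12
        higher = 0.0
    return lower, higher


# Born in February, interviewed in October.
exposure_443_4 = exposure_in_interview_year(2, 10)
# Born in October, interviewed in October.
exposure_443_10 = exposure_in_interview_year(10, 10)
```

In both branches the two components sum to (M_I - 1)/12, which gives a quick consistency check on the derived variables.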

Step 4a: Calculate the exposure to risk for each woman in the last complete calendar year before her interview

In the last complete calendar year before each woman is interviewed, i.e. in year t = Y_I - 1, she will be aged x_I - 1 until her birthday, and x_I for the remainder of the year. On the same assumption as above of a uniform distribution of births within calendar months, the fraction of a year from 1 January until each woman’s birthday is given by

$E (x_{I} - 1, Y_{I} - 1) = \frac{M_{B}^{m} - 0.5}{12}$

while for the remaining fraction of the year, she will be aged x_I with exposure

$E (x_{I}, Y_{I} - 1) = 1 - E (x_{I} - 1, Y_{I} - 1) = 1 - \frac{M_{B}^{m} - 0.5}{12}$ .

Using the two formulae above, variables giving each woman’s exposure at ages x_I - 1 and x_I in year Y_I - 1 must be derived, and then aggregated (weighting where necessary) to produce a tabulation of aggregate exposure by age in that year.
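The split of a complete year of exposure between the two ages depends only on the woman's month of birth, as the short sketch below shows. The function name is an assumption; the example call corresponds to a woman born in October, such as case id 443 10 in Table 2 of the worked example.

```python
def exposure_in_last_complete_year(mother_birth_month):
    """Return (exposure at age x_I - 1, exposure at age x_I) in year Y_I - 1."""
    lower = (mother_birth_month - 0.5) / 12  # 1 January to mid-month birthday
    higher = 1 - lower                       # birthday to 31 December
    return lower, higher


# Born in October: roughly 9.5 months at the lower age, 2.5 at the higher.
october_split = exposure_in_last_complete_year(10)
october_split_rounded = tuple(round(e, 3) for e in october_split)
```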

Step 4b: Derive the exposure for earlier complete calendar years

Birth histories are collected retrospectively from all women and each woman provides information for the entire period over which she has been exposed to the risk of childbearing. Some women may have moved between places or changed their other characteristics at some point during this period but, because complete residential and economic histories are seldom collected in fertility surveys, it is usually impossible to allow for this when calculating fertility rates. This means that the interpretation of some results such as fertility by place of residence becomes less clear.

However, since birthdays are immutable, and the population of women being assessed is constant over time, the aggregate exposure of women attaining age x in a year for which all women’s exposure is complete, v, will also equal the exposure of the cohort in earlier years, that is:

$E (x, v - 1) = E (x - 1, v - 2) = ... = E (x - k, v - k - 1)$
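In practice this relationship means that the exposure table for a year in which all women's exposure is complete can simply be shifted down by one year of age for each earlier calendar year. The sketch below assumes the aggregate exposure is held in a dictionary keyed by age (an illustrative layout); the two values are the 2003 entries for ages 13 and 14 from Table 3 of the worked example.

```python
def shift_exposure_back(exposure_by_age, k):
    """Exposure k years earlier: each cohort is k years younger."""
    return {age - k: e for age, e in exposure_by_age.items()}


# Aggregate exposure at ages 13 and 14 in 2003 (Table 3).
exposure_2003 = {13: 198.291, 14: 468.833}
# The same cohorts one year earlier, at ages 12 and 13 in 2002.
exposure_2002 = shift_exposure_back(exposure_2003, 1)
```

The shift applies only to years of complete exposure; the partial exposure in the year of interview has to be tabulated separately, as in Step 3.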

Step 5: Derive the age-specific fertility rates

The total exposure at each age in each calendar year, E(x,t), is derived by summing the tabulations derived in steps 3 and 4 for each age and for each calendar year (complete and incomplete). Note that if fieldwork extends over two calendar years, Y_I-1 will refer to two different years, as will Y_I. Total exposure in the final calendar year for which exposure might be derived will be based on only the partial exposure of women interviewed in the final calendar year of fieldwork, whereas total exposure in the immediately preceding year will be comprised of the partial exposure of women interviewed in the first year of fieldwork and the full exposure in that year of women interviewed in the final year of fieldwork.

The age-specific fertility rates for age x in year t are given by

$f_{x} (t) = \frac{B_{x} (t)}{E_{x} (t)}$

Age-specific fertility rates for conventional five-year age groups are derived by summing the births to women across each age group, and dividing by the sum of the exposure in that age group. Thus, if i=(x/5)-2 for x = 15, 20, …, 45, then

$f (1) = {}_{5}f_{15}; f (2) = {}_{5}f_{20}; ...; f (7) = {}_{5}f_{45}$

and

$f (i, t) = \frac{\sum_{a = 5 i + 10}^{5 i + 14} B_{a} (t)}{\sum_{a = 5 i + 10}^{5 i + 14} E_{a} (t)}$

To combine data for multiple years, the numerators and denominators are summed separately before dividing to produce the rate:

$f (i, (t_{1}, t_{2})) = \frac{\sum_{z = t_{1}}^{t_{2}} \sum_{a = 5 i + 10}^{5 i + 14} B_{a} (z)}{\sum_{z = t_{1}}^{t_{2}} \sum_{a = 5 i + 10}^{5 i + 14} E_{a} (z)}$
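The aggregation can be sketched as below, assuming births and exposure are held in dictionaries keyed by (age, year); this layout and the function name are illustrative assumptions. The figures are the entries for ages 15 to 19 in 2002 from Tables 1 and 3 of the worked example.

```python
def grouped_asfr(births, exposure, i, years):
    """f(i, (t1, t2)): sum births and exposure separately, then divide."""
    ages = range(5 * i + 10, 5 * i + 15)  # i = 1 gives ages 15 to 19
    numerator = sum(births.get((a, z), 0.0) for z in years for a in ages)
    denominator = sum(exposure.get((a, z), 0.0) for z in years for a in ages)
    return numerator / denominator


births = {(15, 2002): 12.74, (16, 2002): 41.40, (17, 2002): 88.79,
          (18, 2002): 133.70, (19, 2002): 148.18}
exposure = {(15, 2002): 490.890, (16, 2002): 522.245, (17, 2002): 597.259,
            (18, 2002): 606.502, (19, 2002): 594.975}

rate_15_19_2002 = grouped_asfr(births, exposure, i=1, years=[2002])
```

Rounded to three decimal places, this gives the 0.151 reported for ages 15-19 in 2002 in Table 5. Passing several years in `years` combines them in the manner of the last formula above.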

Worked example

This example uses data from the 2004 Malawi DHS. Fieldwork in this survey began in earnest in October 2004 and ran through to February 2005.

Step 1: Produce a tabulation of the number of births in each calendar year by the age of the mother at the birth of the child

After random allocation of mother’s age at birth in cases where the mother and child’s month of birth are the same, the full cross tabulation of children’s year of birth by age of mother at the birth of her child is shown in Table 1. It would appear that there has been extreme shifting or omission of births in 2001 and 2002 in that the number of births reported in those years is some 20 per cent lower than that reported in 2003. Reported births in 2004 are lower than in 2003 in part because many women were not exposed for the full calendar year, and because births occurring in the month of interview are excluded from the analysis.

Table 1 Classification of births since 2001 by age of mother at birth, Malawi, 2004 DHS

	Year of birth
Age	2001	2002	2003	2004	2005
13	1.11	0.96	0.00	0.00	0.00
14	6.44	3.26	2.00	4.02	0.00
15	19.70	12.74	17.21	14.65	0.00
16	49.84	41.40	49.87	39.00	0.00
17	93.45	88.79	93.36	61.67	0.00
18	113.79	133.70	153.38	110.40	0.00
19	145.63	148.18	162.51	162.48	0.00
20	146.03	166.63	177.72	155.24	0.00
21	159.60	137.76	179.68	174.46	0.00
22	137.50	128.60	147.12	148.44	0.00
23	115.15	110.30	173.94	138.36	2.12
24	109.24	96.07	144.74	149.19	0.00
25	113.58	93.61	105.37	117.68	0.00
26	82.08	69.68	107.11	105.36	0.00
27	74.37	77.16	129.50	105.48	0.00
28	66.31	66.14	73.87	91.96	0.00
29	62.92	63.28	75.42	80.13	0.00
30	55.93	55.44	76.98	68.16	0.00
31	55.89	42.38	59.05	56.76	0.00
32	55.11	72.47	59.85	61.36	0.00
33	34.74	54.08	72.14	41.23	0.00
34	28.09	44.41	67.04	52.00	0.00
35	50.00	25.28	41.26	48.16	0.00
36	41.61	33.88	27.42	33.56	0.00
37	30.57	25.46	48.50	30.46	0.00
38	24.47	32.07	31.55	36.85	0.00
39	23.05	16.87	39.64	22.38	0.00
40	16.95	20.66	12.56	26.47	0.00
41	19.67	9.72	17.17	9.87	0.00
42	12.44	7.72	9.79	8.89	0.00
43	9.43	10.35	17.32	9.15	0.00
44	4.17	10.98	7.11	11.11	0.00
45	4.94	4.86	3.63	4.29	0.00
46	4.02	9.07	14.65	4.96	0.00
47	0.00	0.82	3.96	2.35	0.00
48	0.00	0.00	2.16	0.00	0.00
49	0.00	0.00	0.00	0.00	0.00
TOTAL	1967.84	1914.75	2404.58	2186.55	2.12

Step 2: Calculate the age of each woman at the start of the year in which she is interviewed

The age of each woman at the start of the year in which she is interviewed is derived using the expression for x_I given in Step 2 of the method. A sample extract is shown in Table 2. In the third line, the woman (case id 444 3) was born in August 1984 and interviewed in October 2004. On 1 January 2004 she would have been aged 19 (column 4). The woman with case id 528 2, in the ninth (penultimate) line of data, was born in January 1970 and interviewed in January 2005, and would have been aged 34 on 1 January 2005.

Table 2 Data showing derivation of exposure to risk, Malawi, 2004 DHS

				Exposure in year of interview		Exposure in last complete year
caseid	Date of birth	Date of interview	Age at start of year of interview	Lower age	Higher age	Lower age	Higher age
(1)	(2)	(3)	(4)	(5)	(6)	(7)	(8)
443 4	February 1976	October 2004	27	0.125	0.625	0.125	0.875
443 10	October 1974	October 2004	29	0.750	0.000	0.792	0.208
444 3	August 1984	October 2004	19	0.625	0.125	0.625	0.375
445 2	June 1983	October 2004	20	0.458	0.292	0.458	0.542
519 7	May 1989	January 2005	15	0.000	0.000	0.375	0.625
522 2	March 1979	January 2005	25	0.000	0.000	0.208	0.792
526 4	December 1989	January 2005	15	0.000	0.000	0.958	0.042
526 7	September 1979	January 2005	25	0.000	0.000	0.708	0.292
528 2	January 1970	January 2005	34	0.000	0.000	0.042	0.958
529 2	October 1972	January 2005	32	0.000	0.000	0.792	0.208

Step 3: Calculate the exposure to risk for each woman in the year of her interview

Columns (5) and (6) of Table 2 show the derivation of the exposure to risk for each woman in the year of her interview. The woman in the first line (case id 443 4) had her 28th birthday in February 2004. On the assumption that birthdays occur, on average, half-way through each month, she would have spent 0.125 (1.5 /12) aged 27 in 2004, and a further 0.625 of a year (7.5 months from the middle of February to the end of September, the month before she was interviewed) aged 28 in 2004.

The woman in the second line (case id 443 10) had her birthday in the same month she was interviewed. As a result, she experiences a full 9 months (0.75 of a year) exposure aged 29 in 2004, and has no exposure thereafter.

All women interviewed in January 2005 have no exposure in the year of interview, as we do not consider exposure (or births) that occur in that month.

Step 4a: Calculate the exposure to risk for each woman in the last complete calendar year before her interview

Columns (7) and (8) of Table 2 show the derivation of exposure to risk in the last complete year for which women were exposed to risk of giving birth in the survey data. For women interviewed in 2004, this would have been in 2003. For women interviewed in 2005, this would have been in 2004.

In the second case (case id 443 10), exposure in 2003 – her last complete year of exposure – would have been 9.5 months at age 28 and 2.5 months at age 29. As implied by the cohort relationship in Step 4b of the method, in previous years her exposure would have been distributed similarly, at commensurately younger ages: in 2002, exposure would have been 9.5 months at age 27 and 2.5 months at age 28.

In the last case presented (case id 529 2), the woman would have spent approximately 9.5 months (0.792 of a year) aged 31 in 2004, and 2.5 months (0.208 of a year) aged 32 in 2004.

Aggregating exposure by single year of age and calendar year from Steps 3 and 4 produces the exposure to risk shown in Table 3.

Table 3 Aggregate exposure by single year of age and calendar year, Malawi, 2004 DHS

Age	2002	2003	2004	2005
11	0.063	0.000	0.000	0.000
12	198.291	0.063	0.000	0.000
13	468.833	198.291	0.063	0.000
14	432.083	468.833	197.506	0.000
15	490.890	432.083	409.831	0.049
16	522.245	490.890	370.078	0.402
17	597.259	522.245	431.191	0.216
18	606.502	597.259	444.050	0.337
19	594.975	606.502	528.989	0.622
20	573.166	594.975	514.654	0.674
21	480.330	573.166	521.777	0.354
22	574.521	480.330	489.303	1.172
23	486.871	574.521	422.082	0.166
24	405.933	486.871	503.468	0.939
25	405.592	405.933	416.489	0.729
26	407.569	405.592	350.520	0.000
27	346.264	407.569	354.229	0.425
28	313.426	346.264	349.949	0.265
29	286.749	313.426	300.703	0.337
30	308.209	286.749	262.300	0.177
31	252.422	308.209	252.010	0.000
32	309.337	252.422	256.686	0.166
33	267.239	309.337	217.728	0.000
34	183.176	267.239	271.954	0.000
35	185.172	183.176	226.209	0.868
36	222.879	185.172	151.012	0.000
37	217.592	222.879	166.838	0.000
38	236.389	217.592	192.603	0.110
39	177.195	236.389	194.856	0.363
40	161.461	177.195	195.769	0.591
41	142.134	161.461	155.461	0.000
42	173.338	142.134	133.356	0.166
43	168.616	173.338	126.403	0.000
44	148.788	168.616	147.170	0.088
45	140.768	148.788	143.087	0.088
46	138.297	140.768	125.995	0.000
47	72.711	138.297	124.497	0.000
48	0.606	72.711	117.910	1.027
49	0.000	0.606	53.140	0.000
TOTAL	11697.89	11697.89	10119.87	10.330

Step 5: Derive the age-specific fertility rates

Single-year age-specific fertility rates for each calendar year are derived by dividing the births in Table 1 by the person-years exposed-to-risk in Table 3. The results are shown in Table 4.

Table 4 Age-specific fertility rates by single years of age and calendar year, Malawi, 2004 DHS

Age	2001	2002	2003	2004
11	0.000	0.000	0.000	0.000
12	0.000	0.000	0.000	0.000
13	0.003	0.002	0.000	0.000
14	0.013	0.008	0.004	0.020
15	0.038	0.026	0.040	0.036
16	0.083	0.079	0.102	0.105
17	0.154	0.149	0.179	0.143
18	0.191	0.220	0.257	0.249
19	0.254	0.249	0.268	0.307
20	0.304	0.291	0.299	0.302
21	0.278	0.287	0.313	0.334
22	0.282	0.224	0.306	0.303
23	0.284	0.227	0.303	0.328
24	0.269	0.237	0.297	0.296
25	0.279	0.231	0.260	0.283
26	0.237	0.171	0.264	0.301
27	0.237	0.223	0.318	0.298
28	0.231	0.211	0.213	0.263
29	0.204	0.221	0.241	0.266
30	0.222	0.180	0.268	0.260
31	0.181	0.168	0.192	0.225
32	0.206	0.234	0.237	0.239
33	0.190	0.202	0.233	0.189
34	0.152	0.242	0.251	0.191
35	0.224	0.137	0.225	0.213
36	0.191	0.152	0.148	0.222
37	0.129	0.117	0.218	0.183
38	0.138	0.136	0.145	0.191
39	0.143	0.095	0.168	0.115
40	0.119	0.128	0.071	0.135
41	0.114	0.068	0.106	0.064
42	0.074	0.045	0.069	0.067
43	0.063	0.061	0.100	0.072
44	0.030	0.074	0.042	0.075
45	0.036	0.035	0.024	0.030
46	0.055	0.066	0.104	0.039
47	0.000	0.011	0.029	0.019
48	0.000	0.000	0.030	0.000
49	0.000	0.000	0.000	0.000
Total Fertility	5.61	5.20	6.32	6.36

The estimates vary considerably between calendar years, with estimates of total fertility differing by more than a child per woman between 2002 and 2003. The estimate of total fertility in 2004, despite being derived from only partial exposure in that year for most women, is highly consistent with the estimate for 2003. The shape of the distribution (as can be seen in Figure 1) is consistent across the three years, even when measured in single years of age. This is true despite a high degree of variability in the estimates by single years of age, even if they are aggregated over the three years from 2002 to 2004.

Figure 1 Age-specific fertility rates by single years of age and calendar year, Malawi 2004 DHS

Further aggregating the data into conventional five-year age groups produces the results shown in Table 5.

Table 5 Age-specific fertility rates by grouped year of age and calendar year, Malawi, 2004 DHS

Age group	2002	2003	2004	2002-4	DHS
15-19	0.151	0.180	0.178	0.169	0.162
20-24	0.254	0.304	0.312	0.290	0.293
25-29	0.210	0.261	0.283	0.252	0.254
30-34	0.204	0.235	0.222	0.221	0.222
35-39	0.129	0.180	0.184	0.164	0.163
40-44	0.075	0.078	0.086	0.080	0.080
45-49	0.042	0.049	0.021	0.036	0.035
Total Fertility	5.32	6.44	6.43	6.05	6.05
Note: 3 year rates as presented in the 2004 DHS report. Source: DHS StatCompiler
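The total fertility figures in the last row of Table 5 follow from the grouped rates by the standard relationship that total fertility is the sum of the age-specific rates multiplied by the width of the age groups. A minimal sketch, using the rounded 2002 column of Table 5 (the function name is an assumption):

```python
def total_fertility(asfrs, width=5):
    """Total fertility from age-specific rates for age groups of equal width."""
    return width * sum(asfrs)


# Rates for 2002 from Table 5, ages 15-19 to 45-49.
asfr_2002 = [0.151, 0.254, 0.210, 0.204, 0.129, 0.075, 0.042]
tf_2002 = total_fertility(asfr_2002)
```

The result is about 5.3 children per woman. Small differences from the tabulated value are to be expected because the rates in Table 5 are rounded to three decimal places.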

The differences in the last two columns between the ASFRs derived here and those reported in the DHS survey are very small. However, the much lower fertility rates for 2002 (and 2001, not shown in Table 5) should give cause for concern about possible reference period errors and shifting of births.

References

Cleland J. 1996. "Demographic data collection in less developed countries", Population Studies 50(3):433-450. doi: https://dx.doi.org/10.1080/0032472031000149556

Rutstein S and G Rojas. 2003. Guide to DHS Statistics. Calverton, MD: ORC Macro.

Schoumaker B. 2010. "Reconstructing fertility trends in sub-Saharan Africa by combining multiple surveys affected by data quality problems". Paper presented at Population Association of America 2010 Annual Meeting. Dallas, TX, April 15-17, 2010.

Schoumaker B. 2011. "Omissions of births in DHS birth histories in sub-Saharan Africa: Measurement and determinants". Paper presented at Population Association of America 2011 Annual Meeting. Washington D.C., March 31 - April 2, 2011.

Schoumaker B. 2013. "A Stata module for computing fertility rates and TFRs from birth histories: tfr2", Demographic Research 28(Article 38):1093–1144. doi: https://doi.org/10.4054/DemRes.2013.28.38

Author

Moultrie TA

Suggested citation

Moultrie TA. 2013. Direct estimation of fertility from survey data containing birth histories. In Moultrie TA, Dorrington RE, Hill AG, Hill K, Timæus IM and Zaba B (eds). Tools for Demographic Estimation. Paris: International Union for the Scientific Study of Population. https://demographicestimation.iussp.org/content/direct-estimation-fertility-survey-data-containing-birth-histories. Accessed 2025-08-08.