The el-Badry correction
Description of the method
The el-Badry correction is a method for correcting errors in data on children ever born caused by the enumerator or respondent failing to record answers of ‘zero’ to questions on lifetime fertility and, instead, leaving the response blank. When this occurs, during data processing the response is coded as ‘missing’ or ‘unknown’, even though it was evident to the enumerator at the time of data collection that the correct answer was ‘zero’. The method apportions the number of women whose parity is recorded as ‘missing’ between those whose parity is regarded as being truly unknown, and those women who should have been recorded as childless but whose responses were left blank. It does this apportionment at an aggregate level and not on an individual basis.
Data required and assumptions
The method requires the number of children ever born, classified by age group of mother, including the count of women with missing data (i.e., where the field was left blank or contained an out-of-range code or a code for not answered or refused).
The method assumes that a constant proportion of women at each age truly did not state their lifetime fertility (i.e. parity) at the time of data collection. The balance of the women with unreported parities is assumed to be erroneously recorded as not stated when the women are, in fact, childless.
Caveats and warnings
The method relies on the existence of a linear relationship between the proportions of women whose parity is not stated, and that of women reported to be childless. If such a linear relationship is observed, the adjusted denominator used to calculate average parities should exclude those women whose parity (after correction) is still regarded as unknown. This reflects the implicit assumption that these women’s parity distribution is no different from those of women of the same age whose parity is known.
Where the data indicate that a correction is needed because of the large proportion of missing parity information but the method cannot be applied (for example, due to unavailability of data by age, or violation of the assumption of linearity), women of unknown parity should be included in the denominator used to determine average parities. This implicitly assumes that the parity of all such women is zero (i.e. that all women of unknown parity are childless). This will, of course, result in underestimated average parities, as not all women of unknown parity are indeed childless.
Application of method
We define
$${N}_{i}={}_{5}{N}_{a}$$
for a = 15, 20, …, 45 and i = a/5 - 2, to be the number of women in age group i in the population. Thus, N_{1} represents the number of women aged 15-19 in the population. Denote N_{i,j} to be the number of women in age group i of parity j, and N_{i,u} to be the number of women in age group i whose parity is unknown.
Step 1: Determine the proportion of women in each age group whose parity is a) unknown; and b) reported as zero
Extract a table of reported children ever born (j) by women’s age group (i) from the census data to obtain N_{i,j}. Missing data on parity (i.e. blank fields and invalid codes) should be combined with codes for parity not stated for each age group to produce N_{i,u}. The proportion of women in age group i with parity unknown is then
$${U}_{i}=\frac{{N}_{i,u}}{{N}_{i}}$$
The proportion of women in age group i who are reportedly childless (i.e. are of parity zero) is given by
$${Z}_{i}=\frac{{N}_{i,0}}{{N}_{i}}$$
If the U_{i} are small (less than 2 per cent in each age group), it is not worth applying the correction. In such a situation, average parities should be determined by assuming that the parity distribution of women with not stated parity is the same as that of women whose parity is known, by omitting the women with unstated parities from the denominator of the calculation. Thus, if P_{i} is the average parity of women in age group i,
$${P}_{i}=\frac{{\displaystyle \sum _{j=0}^{\omega}j.{N}_{i,j}}}{{\displaystyle \sum _{j=0}^{\omega}{N}_{i,j}}}$$
If the proportions of women with parity not stated exceed 2 per cent, it is worth assessing whether the correction can be applied.
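Step 1 can be sketched in a few lines of Python. This is only an illustration: the function names and the counts are hypothetical, and the 2 per cent threshold is the rule of thumb given above.

```python
def parity_proportions(n_zero, n_unknown, n_total):
    """Return (U, Z): per-age-group proportions of women with parity
    unknown (U_i) and with reported parity zero (Z_i)."""
    u = [nu / n for nu, n in zip(n_unknown, n_total)]
    z = [n0 / n for n0, n in zip(n_zero, n_total)]
    return u, z


def correction_worth_assessing(u, threshold=0.02):
    """True if any age group has more than `threshold` of women
    with parity unknown."""
    return any(ui > threshold for ui in u)


# Hypothetical counts for three age groups (not taken from any census)
n_zero = [500, 200, 80]
n_unknown = [300, 100, 40]
n_total = [1000, 1000, 1000]

u, z = parity_proportions(n_zero, n_unknown, n_total)
print(u, z, correction_worth_assessing(u))
```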
Step 2: Plot the points (Z_{i}, U_{i}) and evaluate the data
For the method to work correctly, the series of points (Z_{i}, U_{i}) should lie on, or very close to, a straight line. In some cases, curvature may be observed in the data points corresponding to either the oldest or the youngest ages. If the curvature affects the older ages only, even if it is quite extreme, it is acceptable to exclude the oldest, or two oldest, age groups from the fitting process and fit a straight line to the remaining points since the method has the greatest absolute impact on the proportions not stated at the youngest ages. If the curvature is most noticeable among the younger women, the method should not be used as exclusion of the data points relating to women aged 15-24 would result in the regression performing an out-of-sample extrapolation, the results of which could suggest illogical adjustments in these age groups.
If a strongly linear relationship cannot be identified, even after excluding one or two data points from older women, the method cannot be applied. In this situation, it is preferable to assume that all women of not stated parity are childless, and to include them in the denominator of the average parity calculation:
$${P}_{i}=\frac{{\displaystyle \sum _{j=0}^{\omega}j.{N}_{i,j}}}{{N}_{i}}$$
The analytical report should note that this has been done, and that, therefore, the average parity values are liable to be underestimated.
Step 3: Determine the slope and intercept of the best straight line fit to the data
The slope (γ) and intercept (β) of the fitted line are found by means of a linear regression of U_{i} on Z_{i}, applied to those data points selected for inclusion, that is,
$${U}_{i}=\beta +\gamma {Z}_{i}$$
The intercept (β), which is independent of age (i), is the estimate of the proportion of those women in each age group with unknown parity whose parity is deemed to be truly unknown, and not misreported.
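The fit itself needs nothing more than ordinary least squares. The sketch below is one possible implementation, not the spreadsheet's; `fit_line` is a hypothetical helper and the three points are invented so that they lie exactly on a line with intercept 0.03 and slope 0.6.

```python
def fit_line(z, u):
    """Ordinary least-squares fit of u = beta + gamma * z.
    Returns (beta, gamma), i.e. (intercept, slope)."""
    n = len(z)
    z_mean = sum(z) / n
    u_mean = sum(u) / n
    sxx = sum((zi - z_mean) ** 2 for zi in z)
    sxy = sum((zi - z_mean) * (ui - u_mean) for zi, ui in zip(z, u))
    gamma = sxy / sxx
    beta = u_mean - gamma * z_mean
    return beta, gamma


# Hypothetical points lying exactly on U = 0.03 + 0.6 * Z
z = [0.5, 0.2, 0.1]
u = [0.33, 0.15, 0.09]

beta, gamma = fit_line(z, u)
print(round(beta, 4), round(gamma, 4))
```

To drop the oldest one or two age groups from the fit, as Step 2 allows, pass only the retained points to the function.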
Step 4: Estimation of the revised numbers of childless women, and women whose parity is not stated
The adjusted proportion of women in age group i that is estimated to be truly childless is given by
$${Z}_{i}^{*}={Z}_{i}+{U}_{i}-\beta$$
That is, the revised proportion of women of zero parity in any age group is the proportion actually recorded as being of zero parity together with the proportion of women in that age group of not stated parity less the estimated proportion of women whose parity is regarded as being truly unknown. The revised estimate of the number of childless women in age group i is given by
$${N}_{i,0}^{*}={N}_{i}\times {Z}_{i}^{*}$$
Thus, the estimated true number of women in each age group whose parity is unknown is given by
$${N}_{i,u}^{*}={N}_{i}\times \beta$$
The N^{*}_{i,j} for other parities (j > 0) are unchanged.
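As a sketch of Step 4, the reallocation can be written directly in terms of counts, since N_{i} × (Z_{i} + U_{i} - β) equals the reported childless count plus the reported unknown count minus βN_{i}. The function name and the numbers below are hypothetical.

```python
def revise_counts(n_zero, n_unknown, n_total, beta):
    """Reallocate women of unknown parity.

    Returns (revised_zero, revised_unknown), where for each age group
    revised_unknown = beta * N_i and
    revised_zero    = N_i0 + N_iu - beta * N_i.
    """
    revised_zero = []
    revised_unknown = []
    for n0, nu, n in zip(n_zero, n_unknown, n_total):
        truly_unknown = beta * n
        revised_unknown.append(truly_unknown)
        revised_zero.append(n0 + nu - truly_unknown)
    return revised_zero, revised_unknown


# Hypothetical counts for two age groups, with an assumed beta of 0.03
revised_zero, revised_unknown = revise_counts(
    n_zero=[500, 200], n_unknown=[300, 100], n_total=[1000, 1000], beta=0.03
)
print(revised_zero, revised_unknown)
```

The total for each age group is preserved: women removed from the unknown category are added to the childless category and nowhere else.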
Step 5: Calculation of average parities
If an el-Badry correction has been applied to the data, the average parities are given by
$${P}_{i}=\frac{{\displaystyle \sum _{j=0}^{\omega}j.{N}_{i,j}^{*}}}{(1-\beta ){N}_{i}} \tag{2}$$
embodying the assumption that the remaining women in age group i of unknown parity, βN_{i}, who are omitted from the denominator, have the same average parity as the women in age group i whose parity is known.
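A minimal sketch of the corrected average parity follows. The helper name and the data are hypothetical; the counts are chosen so that the women of known parity after correction (770 + 150 + 50) add up to (1 - β)N_{i} = 970.

```python
def corrected_average_parity(counts_by_parity, n_total, beta):
    """Average parity after the correction for one age group.

    counts_by_parity[j] is the (corrected) number of women of parity j.
    Women still regarded as of unknown parity, beta * n_total, are left
    out of the denominator.
    """
    children = sum(j * women for j, women in enumerate(counts_by_parity))
    return children / ((1 - beta) * n_total)


# Hypothetical age group: 1,000 women, assumed beta of 0.03
p = corrected_average_parity([770, 150, 50], n_total=1000, beta=0.03)
print(round(p, 4))
```

Because parity zero contributes nothing to the numerator, the revised childless count affects the result only through the denominator.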
Interpretation and checks
The value of β shows the estimated proportion of women whose parity is truly not stated. Larger values of β are therefore associated with poorer quality data.
Occasionally, the method may have a contrary effect and suggest that the number of women with not-stated parity is understated, and that the number of women of reported parity zero should be reduced. Such a situation will arise if β > U_{i}. If this is so, the correction should not be applied to that age group.
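This check is easy to automate. The snippet below is a sketch with a hypothetical function name and invented proportions; it flags the age groups in which the correction can be applied.

```python
def groups_to_correct(u, beta):
    """Return one boolean per age group: True where beta <= U_i,
    so that the correction does not reduce the childless count."""
    return [beta <= ui for ui in u]


# Hypothetical proportions not stated, with an assumed beta of 0.03
flags = groups_to_correct([0.30, 0.10, 0.02], beta=0.03)
print(flags)  # the third group has beta > U_i and is left uncorrected
```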
Worked example
The accompanying spreadsheet implements the method using data from the 1989 Kenya Census, obtained from IPUMS. The original data are presented in Table 1.
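Table 1 Children ever born, by age group of mother at census date, Kenya, 1989 Census

| Parity | 15-19 (i = 1) | 20-24 (i = 2) | 25-29 (i = 3) | 30-34 (i = 4) | 35-39 (i = 5) | 40-44 (i = 6) | 45-49 (i = 7) |
|---|---|---|---|---|---|---|---|
| 0 | 597,560 | 198,600 | 59,400 | 23,120 | 14,580 | 11,040 | 9,560 |
| 1 | 134,700 | 224,660 | 83,140 | 26,140 | 13,620 | 9,460 | 7,740 |
| 2 | 38,120 | 202,300 | 120,940 | 38,340 | 19,180 | 13,240 | 9,280 |
| 3 | 11,120 | 126,500 | 150,500 | 53,880 | 28,020 | 17,000 | 12,440 |
| 4 | 6,820 | 59,700 | 146,500 | 73,280 | 37,340 | 21,400 | 14,800 |
| 5 | 1,740 | 33,720 | 102,300 | 87,720 | 48,140 | 28,980 | 18,560 |
| 6 | 0 | 12,480 | 58,980 | 83,580 | 56,520 | 35,260 | 26,280 |
| 7 | 0 | 0 | 57,180 | 91,800 | 56,240 | 41,260 | 28,640 |
| 8 | 0 | 0 | 0 | 64,740 | 56,560 | 42,700 | 32,920 |
| 9 | 0 | 0 | 0 | 0 | 40,780 | 39,480 | 33,000 |
| 10 | 0 | 0 | 0 | 0 | 26,840 | 32,240 | 27,920 |
| 11 | 0 | 0 | 0 | 0 | 14,920 | 22,840 | 21,920 |
| 12 | 0 | 0 | 0 | 0 | 8,280 | 14,660 | 14,720 |
| 13 | 0 | 0 | 0 | 0 | 3,740 | 7,900 | 8,920 |
| 14 | 0 | 0 | 0 | 0 | 2,180 | 4,080 | 4,900 |
| 15 | 0 | 0 | 0 | 0 | 1,260 | 2,100 | 2,860 |
| 16 | 0 | 0 | 0 | 0 | 960 | 1,200 | 1,540 |
| 17 | 0 | 0 | 0 | 0 | 520 | 680 | 1,000 |
| 18 | 0 | 0 | 0 | 0 | 420 | 520 | 620 |
| 19 | 0 | 0 | 0 | 0 | *140* | 340 | 380 |
| 20 | 0 | 0 | 0 | 0 | *160* | 300 | 280 |
| 21 | 0 | 0 | 0 | 0 | *240* | 160 | 280 |
| 22 | 0 | 0 | 0 | 0 | *40* | 100 | 60 |
| 23 | 0 | 0 | 0 | 0 | *20* | *20* | 80 |
| 24 | 0 | 0 | 0 | 0 | *60* | *20* | 80 |
| 25 | 0 | 0 | 0 | 0 | *60* | *40* | 0 |
| 26 | 0 | 0 | 0 | 0 | *60* | *40* | *80* |
| 27 | 0 | 0 | 0 | 0 | *80* | *40* | *60* |
| 28 | 0 | 0 | 0 | 0 | *20* | *40* | *40* |
| 29 | 0 | 0 | 0 | 0 | *20* | *0* | *40* |
| 30 | 0 | 0 | 0 | 0 | *340* | *440* | *360* |
| Not Stated | 402,780 | 147,540 | 61,920 | 31,580 | 20,240 | 15,420 | 12,960 |
| TOTAL | 1,192,840 | 1,005,500 | 840,860 | 574,180 | 451,580 | 363,000 | 292,320 |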
Inspection of the data reveals that they have been edited to disallow the recording of high parities in women aged less than 35. The editing rule applied at the preparatory stage would appear to be stricter than the one suggested in the section on evaluation of parity data. Thus reports of 20-24 year old women have been restricted to parity 6 or less (rather than parity 8), reports for those aged 25-29 are truncated at parity 7 (rather than parity 12) and those of 30-34 year olds at parity 8 (rather than 15). However, implausibly high parities have been allowed to remain at ages 35 and more. Therefore, further light editing of the data highlighted in italics in Table 1 could be undertaken by reassigning to the unknown category reports of parity 19 and over for age group 35-39, parity 23 and over in the age group 40-44, and parity 26 and over in the last age group, 45-49.
An option can be selected on the Introduction tab of the spreadsheet to set implausible parities to ‘not stated’ prior to the application of the method.
Step 1: Determine the proportion of women in each age group whose parity is a) not stated; and b) equal to zero
Table 2 presents the revised data, together with the calculation of the proportions of women of parity zero, and parity not stated in each age group.
Table 2 Correction of parity data, and calculation of proportion of women of parity zero, and parity not stated, Kenya, 1989 Census

| Parity | 15-19 (i = 1) | 20-24 (i = 2) | 25-29 (i = 3) | 30-34 (i = 4) | 35-39 (i = 5) | 40-44 (i = 6) | 45-49 (i = 7) |
|---|---|---|---|---|---|---|---|
| 0 | 597,560 | 198,600 | 59,400 | 23,120 | 14,580 | 11,040 | 9,560 |
| 1 | 134,700 | 224,660 | 83,140 | 26,140 | 13,620 | 9,460 | 7,740 |
| 2 | 38,120 | 202,300 | 120,940 | 38,340 | 19,180 | 13,240 | 9,280 |
| 3 | 11,120 | 126,500 | 150,500 | 53,880 | 28,020 | 17,000 | 12,440 |
| 4 | 6,820 | 59,700 | 146,500 | 73,280 | 37,340 | 21,400 | 14,800 |
| 5 | 1,740 | 33,720 | 102,300 | 87,720 | 48,140 | 28,980 | 18,560 |
| 6 | 0 | 12,480 | 58,980 | 83,580 | 56,520 | 35,260 | 26,280 |
| 7 | 0 | 0 | 57,180 | 91,800 | 56,240 | 41,260 | 28,640 |
| 8 | 0 | 0 | 0 | 64,740 | 56,560 | 42,700 | 32,920 |
| 9 | 0 | 0 | 0 | 0 | 40,780 | 39,480 | 33,000 |
| 10 | 0 | 0 | 0 | 0 | 26,840 | 32,240 | 27,920 |
| 11 | 0 | 0 | 0 | 0 | 14,920 | 22,840 | 21,920 |
| 12 | 0 | 0 | 0 | 0 | 8,280 | 14,660 | 14,720 |
| 13 | 0 | 0 | 0 | 0 | 3,740 | 7,900 | 8,920 |
| 14 | 0 | 0 | 0 | 0 | 2,180 | 4,080 | 4,900 |
| 15 | 0 | 0 | 0 | 0 | 1,260 | 2,100 | 2,860 |
| 16 | 0 | 0 | 0 | 0 | 960 | 1,200 | 1,540 |
| 17 | 0 | 0 | 0 | 0 | 520 | 680 | 1,000 |
| 18 | 0 | 0 | 0 | 0 | 420 | 520 | 620 |
| 19 | 0 | 0 | 0 | 0 | 0 | 340 | 380 |
| 20 | 0 | 0 | 0 | 0 | 0 | 300 | 280 |
| 21 | 0 | 0 | 0 | 0 | 0 | 160 | 280 |
| 22 | 0 | 0 | 0 | 0 | 0 | 100 | 60 |
| 23 | 0 | 0 | 0 | 0 | 0 | 0 | 80 |
| 24 | 0 | 0 | 0 | 0 | 0 | 0 | 80 |
| 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| U | 402,780 | 147,540 | 61,920 | 31,580 | 21,480 | 16,060 | 13,540 |
| TOTAL | 1,192,840 | 1,005,500 | 840,860 | 574,180 | 451,580 | 363,000 | 292,320 |
| U_{i} | 0.338 | 0.147 | 0.074 | 0.055 | 0.048 | 0.044 | 0.046 |
| Z_{i} | 0.501 | 0.198 | 0.071 | 0.040 | 0.032 | 0.030 | 0.033 |
The data include high proportions of women with parity not stated at ages 15-19 (402,780 / 1,192,840 = 0.338), 20-24 (0.147) and, to a lesser extent, the older age groups. The proportion of women reported as childless (Z_{i}) falls rapidly, from around 50 per cent in the first age group down to around 3 per cent at the end of the childbearing period. On these grounds, it is worth investigating whether an el-Badry correction can be applied to the data.
Step 2: Plot the points (Z_{i}, U_{i}) on a set of axes and evaluate the data
The Z_{i} and U_{i} are plotted against each other (shown by the blue diamonds) in Figure 1. The straight line fitted to the points is shown by the red line. If a point is excluded from the fitting process, the figure in the spreadsheet represents it with an open diamond.
There is a clear linear relationship between the plotted points, and all points can be included in the application of an el-Badry correction.
Step 3: Determine the slope and intercept of the best straight line fit
Performing a linear regression of the U_{i} on the Z_{i} for the selected points gives a value for the intercept (β) of 0.02745. This suggests that around 2.7 per cent of the data on women’s parities can be regarded as truly missing.
Step 4: Estimation of the revised numbers of childless women, and women whose parity is not stated
The revised number of women of zero parity is given by
$${N}_{i,0}^{*}={N}_{i}({Z}_{i}+{U}_{i}-\beta )$$
while the revised numbers with parity unknown are calculated by multiplying the total number of women in each age group by β as shown in Table 3. For example, the number of women aged 20–24 estimated to be truly of an unknown parity is given by 0.02745 × 1,005,500 = 27,603. The corrected estimate of the number of childless women aged 15–19 is derived from 1,192,840 × (0.501 + 0.338 – 0.027) = 967,594. The proportions are shown rounded here; the results quoted are computed from the unrounded values of β, Z_{i} and U_{i}, so multiplying the rounded figures gives slightly different answers.
Table 3 Revised estimates of numbers of women with parity not stated and childless women by age, Kenya, 1989 Census

| | 15-19 | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 |
|---|---|---|---|---|---|---|---|
| Revised parity not stated | 32,746 | 27,603 | 23,084 | 15,763 | 12,397 | 9,965 | 8,025 |
| Revised zero parity | 967,594 | 318,537 | 98,236 | 38,937 | 23,663 | 17,135 | 15,075 |
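Steps 1 to 4 of the worked example can be checked with a short script. The sketch below is not the spreadsheet's implementation: it refits the line by ordinary least squares from the counts in Table 2 (rows 0, U and TOTAL) and should give an intercept close to 0.02745 and counts close to those in Table 3. The variable names are illustrative.

```python
# Counts from Table 2 for age groups 15-19, 20-24, ..., 45-49
n_zero = [597560, 198600, 59400, 23120, 14580, 11040, 9560]
n_unknown = [402780, 147540, 61920, 31580, 21480, 16060, 13540]
n_total = [1192840, 1005500, 840860, 574180, 451580, 363000, 292320]

# Step 1: proportions childless (Z_i) and parity unknown (U_i)
z = [n0 / n for n0, n in zip(n_zero, n_total)]
u = [nu / n for nu, n in zip(n_unknown, n_total)]

# Step 3: least-squares fit of U_i = beta + gamma * Z_i over all seven points
k = len(z)
z_mean = sum(z) / k
u_mean = sum(u) / k
sxy = sum((zi - z_mean) * (ui - u_mean) for zi, ui in zip(z, u))
sxx = sum((zi - z_mean) ** 2 for zi in z)
gamma = sxy / sxx
beta = u_mean - gamma * z_mean

# Step 4: revised counts, using the unrounded beta
revised_unknown = [round(beta * n) for n in n_total]
revised_zero = [
    round(n0 + nu - beta * n)
    for n0, nu, n in zip(n_zero, n_unknown, n_total)
]

print(round(beta, 5))
print(revised_unknown)
print(revised_zero)
```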
Step 5: Calculation of average parities
Since an el-Badry correction has been applied, corrected average parities, presented in Table 4, are then derived using Equation 2.
Table 4 Corrected average parities by age group, Kenya, 1989 Census

| | 15-19 | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 |
|---|---|---|---|---|---|---|---|
| Average parity | 0.242 | 1.525 | 3.214 | 4.760 | 6.239 | 7.120 | 7.510 |
Note that, relative to the average parities produced if the correction is not applied (and assuming therefore that all women with not stated parity are of parity zero), the correction increases the parities in each age group by a constant factor, $\frac{1}{1-\beta}$.
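The first two entries of Table 4 can be reproduced from the Table 2 counts, as a rough check. The sketch below uses the rounded intercept 0.02745 quoted in Step 3 and illustrative variable names; it also computes the uncorrected value (all women of unknown parity treated as childless) so that the constant factor can be seen.

```python
BETA = 0.02745  # rounded intercept quoted in Step 3

# Women by parity (list index = parity) and total women, from Table 2
counts = {
    "15-19": ([597560, 134700, 38120, 11120, 6820, 1740], 1192840),
    "20-24": ([198600, 224660, 202300, 126500, 59700, 33720, 12480], 1005500),
}


def children_ever_born(by_parity):
    """Total children ever born implied by a parity distribution."""
    return sum(parity * women for parity, women in enumerate(by_parity))


results = {}
for age, (by_parity, total) in counts.items():
    children = children_ever_born(by_parity)
    uncorrected = children / total               # unknown parity counted as zero
    corrected = children / ((1 - BETA) * total)  # Equation 2
    results[age] = (uncorrected, corrected)
    print(age, round(uncorrected, 3), round(corrected, 3))
```

In each age group the ratio of the corrected to the uncorrected value is 1/(1 - β), because only the denominator changes.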
Detailed description of the method
The method is fully described in el-Badry (1961). El-Badry’s fundamental insight was that, if it could be assumed that:
1) there is a linear relationship between the proportions of childless women of a given age in a population, and the proportion of women whose parity is not stated; and
2) the true, unknown, proportion of women whose parity is not known is a constant and independent of age, then
$${U}_{i}=\alpha {Z}_{i}^{*}+\beta \tag{3}$$
where αZ^{*}_{i} is the proportion of truly childless women reported as parity not stated, and β is the true, constant, proportion of women with parity not stated.
Hence, if αZ^{*}_{i} have been misclassified as not stated when they are truly childless, then
$${Z}_{i}={Z}_{i}^{*}-\alpha {Z}_{i}^{*}=(1-\alpha ){Z}_{i}^{*}$$
and therefore:
$${Z}_{i}^{*}=\frac{{Z}_{i}}{(1-\alpha )} \tag{4}$$
and substituting this into Equation 3,
$${U}_{i}=\frac{\alpha}{1-\alpha}{Z}_{i}+\beta =\gamma {Z}_{i}+\beta$$
where γ can be thought of as the odds of a childless woman being classified as being of unknown parity.
Thus, a regression of U_{i} on Z_{i} will give estimates of β (as well as γ and α).
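Since the regression returns γ rather than α, it may help to spell out how α is recovered from the fitted slope. Rearranging the definition of γ:

$$\gamma =\frac{\alpha}{1-\alpha}\;\Rightarrow\;\gamma (1-\alpha )=\alpha \;\Rightarrow\;\alpha =\frac{\gamma}{1+\gamma}$$

As a purely hypothetical illustration, a fitted slope of γ = 0.6 would imply α = 0.6 / 1.6 = 0.375, that is, 37.5 per cent of truly childless women recorded as parity not stated.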
From Equation 3, we then obtain
$${U}_{i}-\beta =\alpha {Z}_{i}^{*}={Z}_{i}^{*}-{Z}_{i}$$
and hence that
$${Z}_{i}^{*}=\frac{{N}_{i,0}^{*}}{{N}_{i}}={U}_{i}-\beta +{Z}_{i}$$
and
$${U}_{i}^{*}=\frac{{N}_{i,u}^{*}}{{N}_{i}}=\beta$$
Note that, even though we have two identities involving Z_{i}, they will only give the same answer when the fit is exact. Convention dictates that we prefer to use Equation 3 rather than Equation 4, on the grounds that it relies on the fitted value of β (the estimated proportion of truly not stated parities) rather than on the value of α, which lacks intuitive interpretability.
After deriving corrected values of Z^{*}_{i} and U^{*}_{i} , average parities can be calculated using Equation 2.
Having applied the correction, care should be taken to ensure that, in every age group, the adjusted number of childless women (that is, of parity zero) is less than the number of women reporting no births in the reference period in response to the question on recent fertility. Hence the revised Z^{*}_{i} can be used to determine the minimum number of women who could not have had a birth in the reference period before the census.
A version of the correction designed for (the now-rare) situations where questions on children ever born are asked only of married women is described in Annex II of Manual X (UN Population Division 1983).
References
el-Badry MA. 1961. “Failure of enumerators to make entries of zero: errors in recording childless cases in population censuses”, Journal of the American Statistical Association 56(296):909–924. doi: http://dx.doi.org/10.1080/01621459.1961.10482134
UN Population Division. 1983. Manual X: Indirect Techniques for Demographic Estimation. New York: United Nations, Department of Economic and Social Affairs, ST/ESA/SER.A/81. http://www.un.org/esa/population/techcoop/DemEst/manual10/manual10.html