The el-Badry correction
Description of the method
The el-Badry correction is a method for correcting errors in data on children ever born caused by the enumerator or respondent failing to record answers of ‘zero’ to questions on lifetime fertility and, instead, leaving the response blank. When this occurs, during data processing the response is coded as ‘missing’ or ‘unknown’, even though it was evident to the enumerator at the time of data collection that the correct answer was ‘zero’. The method apportions the number of women whose parity is recorded as ‘missing’ between those whose parity is regarded as being truly unknown, and those women who should have been recorded as childless but whose responses were left blank. It does this apportionment at an aggregate level and not on an individual basis.
Data required and assumptions
The method requires the numbers of women classified by age group and by number of children ever born (parity), including, for each age group, the count of women with missing parity data (i.e., where the field was left blank or contained an out-of-range code, or a code for not answered or refused).
The method assumes that a constant proportion of women at each age truly did not state their lifetime fertility (i.e. parity) at the time of data collection. The balance of the women with unreported parities is assumed to be erroneously recorded as not stated when the women are, in fact, childless.
Caveats and warnings
The method relies on the existence of a linear relationship between the proportion of women whose parity is not stated and the proportion of women reported to be childless. If such a linear relationship is observed, the adjusted denominator used to calculate average parities should exclude those women whose parity (after correction) is still regarded as unknown. This reflects the implicit assumption that the parity distribution of these women is no different from that of women of the same age whose parity is known.
Where the data indicate that a correction is needed because of the large proportion of missing parity information but the method cannot be applied (for example, because data by age are unavailable, or because the assumption of linearity is violated), women of unknown parity should be included in the denominator used to determine average parities. This implicitly assumes that all such women are of parity zero (i.e., childless). It will therefore underestimate average parities, since not all women of unknown parity are in fact childless.
Application of method
We define

$$N_i = {}_5N_a, \qquad a = 15, 20, \ldots, 45, \quad i = a/5 - 2,$$

to be the number of women in age group $i$ in the population. Thus, $N_1$ represents the number of women aged 15-19 in the population. Denote by $N_{i,j}$ the number of women in age group $i$ of parity $j$, and by $N_{i,u}$ the number of women in age group $i$ whose parity is unknown.
Step 1: Determine the proportion of women in each age group whose parity is a) unknown; and b) reported as zero
Extract a table of reported children ever born ($j$) by women's age group ($i$) from the census data to obtain $N_{i,j}$. Missing data on parity (i.e., blank fields and invalid codes) should be combined with codes for parity not stated for each age group to produce $N_{i,u}$. The proportion of women in age group $i$ with parity unknown is then

$$U_i = \frac{N_{i,u}}{N_i}.$$

The proportion of women in age group $i$ who are reportedly childless (i.e., of parity zero) is given by

$$Z_i = \frac{N_{i,0}}{N_i}.$$

If the $U_i$ are small (less than 2 per cent in each age group), it is not worth applying the correction. In that case, average parities should be calculated on the assumption that women whose parity is not stated have the same parity distribution as women whose parity is known, that is, by omitting women with unstated parity from the denominator. Thus, if $P_i$ is the average parity of women in age group $i$,

$$P_i = \frac{\sum_{j} j\,N_{i,j}}{N_i - N_{i,u}}.$$
If the proportions of women with parity not stated exceed 2 per cent, it is worth assessing whether the correction can be applied.
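Step 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the spreadsheet's implementation: the function names are hypothetical, counts for one age group are passed as a list indexed by parity (`by_parity[j]` is $N_{i,j}$), and the 2 per cent rule is read as "worth assessing unless every age group is below 2 per cent".

```python
from typing import List, Sequence, Tuple


def step1_proportions(by_parity: Sequence[int], not_stated: int) -> Tuple[float, float]:
    """Return (U_i, Z_i) for one age group.

    by_parity[j] holds N_{i,j}, the number of women of parity j = 0, 1, 2, ...
    not_stated holds N_{i,u}: blanks, invalid codes and 'not stated' combined.
    """
    n_total = sum(by_parity) + not_stated  # N_i
    return not_stated / n_total, by_parity[0] / n_total


def average_parity_excluding_unknown(by_parity: Sequence[int]) -> float:
    """Average parity with women of unknown parity left out of the denominator."""
    children = sum(j * n for j, n in enumerate(by_parity))
    return children / sum(by_parity)  # denominator is N_i - N_{i,u}


def correction_worth_assessing(u_props: List[float], threshold: float = 0.02) -> bool:
    """False only if U_i is below the threshold in every age group."""
    return any(u >= threshold for u in u_props)


# Women aged 15-19, Kenya 1989: parities 0-5 and the not-stated count from Table 1
women_15_19 = [597_560, 134_700, 38_120, 11_120, 6_820, 1_740]
u1, z1 = step1_proportions(women_15_19, 402_780)
print(f"U1 = {u1:.3f}, Z1 = {z1:.3f}")
print(f"P1, unknowns excluded = {average_parity_excluding_unknown(women_15_19):.3f}")
```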
Step 2: Plot the points (Zi, Ui) and evaluate the data
For the method to work correctly, the series of points $(Z_i, U_i)$ should lie on, or very close to, a straight line. In some cases, curvature may be observed in the data points corresponding to either the oldest or the youngest ages. If the curvature affects only the older ages, even if it is quite extreme, it is acceptable to exclude the oldest one or two age groups from the fitting process and fit a straight line to the remaining points, since the method has its greatest absolute impact on the proportions not stated at the youngest ages. If the curvature is most noticeable among younger women, the method should not be used: excluding the data points for women aged 15-24 would force the regression to extrapolate beyond the range of the fitted data, and the resulting adjustments for these age groups could be illogical.
If a strongly linear relationship cannot be identified, even after excluding one or two data points for older women, the method cannot be applied. In this situation, it is preferable to assume that all women of not stated parity are childless, and to include them in the denominator of the average parity calculation:

$$P_i = \frac{\sum_{j} j\,N_{i,j}}{N_i}.$$
The analytical report should note that this has been done, and that, therefore, the average parity values are liable to be underestimated.
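The method judges linearity by eye from the plot of the points. As a supplementary numerical summary, one could compute the squared correlation (R²) of the points, optionally after dropping the one or two oldest age groups as Step 2 permits. The R² measure and the function names below are our own illustrative choices; the method itself sets no numerical threshold. The fallback average parity, with women of unknown parity counted as childless, covers the case where the correction cannot be applied.

```python
from typing import List, Sequence, Tuple


def r_squared(z: Sequence[float], u: Sequence[float]) -> float:
    """Squared Pearson correlation of the (Z_i, U_i) points; 1.0 means exactly linear."""
    n = len(z)
    mz, mu = sum(z) / n, sum(u) / n
    sxy = sum((a - mz) * (b - mu) for a, b in zip(z, u))
    sxx = sum((a - mz) ** 2 for a in z)
    syy = sum((b - mu) ** 2 for b in u)
    return sxy * sxy / (sxx * syy)


def points_for_fit(z: Sequence[float], u: Sequence[float],
                   drop_oldest: int = 0) -> Tuple[List[float], List[float]]:
    """Keep all age groups (youngest first), or drop only the one or two oldest."""
    if drop_oldest not in (0, 1, 2):
        raise ValueError("only the oldest one or two age groups may be excluded")
    keep = len(z) - drop_oldest
    return list(z[:keep]), list(u[:keep])


def average_parity_unknown_as_childless(by_parity: Sequence[int], not_stated: int) -> float:
    """Fallback when the correction cannot be applied: unknowns stay in the denominator."""
    children = sum(j * n for j, n in enumerate(by_parity))
    return children / (sum(by_parity) + not_stated)
```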
Step 3: Determine the slope and intercept of the best straight line fit to the data
The slope ($\gamma$) and intercept ($\beta$) of the fitted line are found by a linear regression of $U_i$ on $Z_i$, applied to those data points selected for inclusion, that is,

$$U_i = \beta + \gamma Z_i.$$

The intercept ($\beta$), which does not depend on age ($i$), is the estimated proportion of women in each age group whose parity is truly unknown, rather than misreported.
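A minimal sketch of the Step 3 fit, assuming ordinary least squares with $U_i$ as the dependent variable and $Z_i$ as the explanatory variable. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def fit_el_badry_line(z: Sequence[float], u: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of U_i = beta + gamma * Z_i.

    Returns (beta, gamma): beta is the intercept, read as the proportion of
    women whose parity is truly unknown; gamma is the slope.
    """
    n = len(z)
    if n < 2 or n != len(u):
        raise ValueError("need at least two (Z_i, U_i) points of matching length")
    mz, mu = sum(z) / n, sum(u) / n
    sxx = sum((a - mz) ** 2 for a in z)
    sxy = sum((a - mz) * (b - mu) for a, b in zip(z, u))
    gamma = sxy / sxx
    beta = mu - gamma * mz
    return beta, gamma


# Points lying exactly on U = 0.03 + 0.6 Z recover beta = 0.03 and gamma = 0.6
beta, gamma = fit_el_badry_line([0.50, 0.20, 0.05], [0.33, 0.15, 0.06])
print(f"beta = {beta:.4f}, gamma = {gamma:.4f}")
```

On Python 3.10 or later, `statistics.linear_regression(z, u)` returns the same slope and intercept.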
Step 4: Estimation of the revised numbers of childless women, and women whose parity is not stated
The adjusted proportion of women in age group $i$ that is estimated to be truly childless is given by

$$Z_i^* = Z_i + U_i - \beta.$$

That is, the revised proportion of women of zero parity in any age group is the proportion actually recorded as being of zero parity, plus the proportion of women in that age group of not stated parity, less the estimated proportion of women whose parity is regarded as truly unknown. The revised estimate of the number of childless women in age group $i$ is given by

$$N_{i,0}^* = Z_i^* N_i = N_{i,0} + N_{i,u} - \beta N_i.$$

Thus, the estimated true proportion of women in each age group whose parity is unknown is given by

$$U_i^* = \beta, \qquad \text{so that} \qquad N_{i,u}^* = \beta N_i.$$

The numbers of women of other parities, $N_{i,j}$ for $j > 0$, are unchanged.
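Step 4 for a single age group can be sketched as below. The function name is hypothetical, and $\beta$ is assumed to come from the Step 3 fit. The correction only moves women between the childless and not-stated categories, so their combined total is unchanged.

```python
from typing import Tuple


def revised_counts(n_total: float, n_zero: float, n_not_stated: float,
                   beta: float) -> Tuple[float, float]:
    """Return (N*_{i,0}, N*_{i,u}) for one age group.

    Z*_i = Z_i + U_i - beta, N*_{i,0} = Z*_i N_i, and N*_{i,u} = beta N_i.
    """
    z = n_zero / n_total
    u = n_not_stated / n_total
    z_star = z + u - beta
    return z_star * n_total, beta * n_total


# Women aged 15-19, Kenya 1989, with beta rounded to 0.02745
n0_star, nu_star = revised_counts(1_192_840, 597_560, 402_780, 0.02745)
print(round(n0_star), round(nu_star))
```

Because $\beta$ is rounded here, the results differ from Table 3 by a few women.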
Step 5: Calculation of average parities
If an el-Badry correction has been applied to the data, the average parities are given by

$$P_i = \frac{\sum_{j \ge 1} j\,N_{i,j}}{N_i - \beta N_i} = \frac{\sum_{j \ge 1} j\,N_{i,j}}{(1 - \beta)\,N_i}, \tag{2}$$

embodying the assumption that the women in age group $i$ whose parity remains unknown, $\beta N_i$, who are omitted from the denominator, have the same average parity as the women in age group $i$ whose parity is known.
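Equation 2 can be sketched as a single function (hypothetical name). Since the parity-0 term adds nothing to the numerator, it makes no difference whether the list carries the reported or the revised number of childless women.

```python
from typing import Sequence


def corrected_average_parity(by_parity: Sequence[int], n_total: float, beta: float) -> float:
    """P_i = sum_{j>=1} j N_{i,j} / ((1 - beta) N_i), with by_parity[j] = N_{i,j}."""
    children = sum(j * n for j, n in enumerate(by_parity))  # j = 0 contributes nothing
    return children / ((1.0 - beta) * n_total)


# Women aged 20-24, Kenya 1989: parities 0-6 from Table 1, N_i = 1,005,500
women_20_24 = [198_600, 224_660, 202_300, 126_500, 59_700, 33_720, 12_480]
print(f"{corrected_average_parity(women_20_24, 1_005_500, 0.02745):.3f}")
```

With $\beta = 0.02745$ this prints 1.525, the value for 20-24 in Table 4.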
Interpretation and checks
The value of β shows the estimated proportion of women whose parity is truly not stated. Larger values of β are therefore associated with poorer quality data.
Occasionally, the method may have the opposite effect and suggest that the number of women with not stated parity is understated, and that the number of women reported as of parity zero should be reduced. This happens whenever $\beta > U_i$, since then $Z_i^* = Z_i + U_i - \beta < Z_i$. In that case, the correction should not be applied to that age group.
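A sketch of the per-age-group guard just described, with a hypothetical function name. It leaves the reported figures untouched whenever $\beta > U_i$ and returns a flag; how such a group's average parity is then computed is left to the caller, as the text does not prescribe it here.

```python
from typing import Tuple


def correct_age_group(n_total: float, n_zero: float, n_not_stated: float,
                      beta: float) -> Tuple[float, float, bool]:
    """Apply the Step 4 correction to one age group unless beta exceeds U_i.

    Returns (childless, not_stated, corrected). When beta > U_i the correction
    would reduce the childless count, so the reported figures are kept.
    """
    u = n_not_stated / n_total
    if beta > u:
        return n_zero, n_not_stated, False
    return n_zero + n_not_stated - beta * n_total, beta * n_total, True


# Illustrative (not census) figures: U_i = 0.01 < beta, so nothing changes
print(correct_age_group(1000, 100, 10, 0.02))
```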
Worked example
The accompanying spreadsheet implements the method using data from the 1989 Kenya Census, obtained from IPUMS. The original data are presented in Table 1.
Table 1 Children ever born, by age group of mother at census date, Kenya, 1989 Census

| Parity (j) | 15-19 (i = 1) | 20-24 (i = 2) | 25-29 (i = 3) | 30-34 (i = 4) | 35-39 (i = 5) | 40-44 (i = 6) | 45-49 (i = 7) |
|---|---:|---:|---:|---:|---:|---:|---:|
| 0 | 597,560 | 198,600 | 59,400 | 23,120 | 14,580 | 11,040 | 9,560 |
| 1 | 134,700 | 224,660 | 83,140 | 26,140 | 13,620 | 9,460 | 7,740 |
| 2 | 38,120 | 202,300 | 120,940 | 38,340 | 19,180 | 13,240 | 9,280 |
| 3 | 11,120 | 126,500 | 150,500 | 53,880 | 28,020 | 17,000 | 12,440 |
| 4 | 6,820 | 59,700 | 146,500 | 73,280 | 37,340 | 21,400 | 14,800 |
| 5 | 1,740 | 33,720 | 102,300 | 87,720 | 48,140 | 28,980 | 18,560 |
| 6 | 0 | 12,480 | 58,980 | 83,580 | 56,520 | 35,260 | 26,280 |
| 7 | 0 | 0 | 57,180 | 91,800 | 56,240 | 41,260 | 28,640 |
| 8 | 0 | 0 | 0 | 64,740 | 56,560 | 42,700 | 32,920 |
| 9 | 0 | 0 | 0 | 0 | 40,780 | 39,480 | 33,000 |
| 10 | 0 | 0 | 0 | 0 | 26,840 | 32,240 | 27,920 |
| 11 | 0 | 0 | 0 | 0 | 14,920 | 22,840 | 21,920 |
| 12 | 0 | 0 | 0 | 0 | 8,280 | 14,660 | 14,720 |
| 13 | 0 | 0 | 0 | 0 | 3,740 | 7,900 | 8,920 |
| 14 | 0 | 0 | 0 | 0 | 2,180 | 4,080 | 4,900 |
| 15 | 0 | 0 | 0 | 0 | 1,260 | 2,100 | 2,860 |
| 16 | 0 | 0 | 0 | 0 | 960 | 1,200 | 1,540 |
| 17 | 0 | 0 | 0 | 0 | 520 | 680 | 1,000 |
| 18 | 0 | 0 | 0 | 0 | 420 | 520 | 620 |
| 19 | 0 | 0 | 0 | 0 | *140* | 340 | 380 |
| 20 | 0 | 0 | 0 | 0 | *160* | 300 | 280 |
| 21 | 0 | 0 | 0 | 0 | *240* | 160 | 280 |
| 22 | 0 | 0 | 0 | 0 | *40* | 100 | 60 |
| 23 | 0 | 0 | 0 | 0 | *20* | *20* | 80 |
| 24 | 0 | 0 | 0 | 0 | *60* | *20* | 80 |
| 25 | 0 | 0 | 0 | 0 | *60* | *40* | 0 |
| 26 | 0 | 0 | 0 | 0 | *60* | *40* | *80* |
| 27 | 0 | 0 | 0 | 0 | *80* | *40* | *60* |
| 28 | 0 | 0 | 0 | 0 | *20* | *40* | *40* |
| 29 | 0 | 0 | 0 | 0 | *20* | *0* | *40* |
| 30 | 0 | 0 | 0 | 0 | *340* | *440* | *360* |
| Not stated | 402,780 | 147,540 | 61,920 | 31,580 | 20,240 | 15,420 | 12,960 |
| TOTAL | 1,192,840 | 1,005,500 | 840,860 | 574,180 | 451,580 | 363,000 | 292,320 |

Inspection of the data reveals that they have been edited to disallow the recording of high parities among women younger than 35. The editing rule applied at the data-preparation stage appears to be stricter than the one suggested in the section on evaluation of parity data: reports for women aged 20-24 have been restricted to parity 6 or less (rather than parity 8), those for women aged 25-29 to parity 7 or less (rather than parity 12), and those for women aged 30-34 to parity 8 or less (rather than parity 15). However, implausibly high parities have been allowed to remain at ages 35 and over. Therefore, the data highlighted in italics in Table 1 could be edited lightly by reassigning to the unknown category reports of parity 19 and over in age group 35-39, parity 23 and over in age group 40-44, and parity 26 and over in the last age group, 45-49.
An option can be selected on the Introduction tab of the spreadsheet to set implausible parities to ‘not stated’ prior to the application of the method.
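The light editing described above can be sketched as follows. The function name is hypothetical and this only mimics, rather than reproduces, the spreadsheet option. The maximum plausible parities (18, 22 and 25) follow directly from the reassignment rule in the text.

```python
from typing import List, Sequence, Tuple


def reassign_implausible(by_parity: Sequence[int], not_stated: int,
                         max_plausible: int) -> Tuple[List[int], int]:
    """Move reports of parity above max_plausible into the not-stated count."""
    kept = list(by_parity[: max_plausible + 1])
    moved = sum(by_parity[max_plausible + 1:])
    return kept, not_stated + moved


# Highest parity retained in each older age group under the editing rule
MAX_PLAUSIBLE = {"35-39": 18, "40-44": 22, "45-49": 25}

# Women aged 35-39, parities 0-30 from Table 1; 20,240 not stated
women_35_39 = [14_580, 13_620, 19_180, 28_020, 37_340, 48_140, 56_520, 56_240,
               56_560, 40_780, 26_840, 14_920, 8_280, 3_740, 2_180, 1_260, 960,
               520, 420, 140, 160, 240, 40, 20, 60, 60, 60, 80, 20, 20, 340]
kept, not_stated = reassign_implausible(women_35_39, 20_240, MAX_PLAUSIBLE["35-39"])
print(not_stated)
```

The printed count, 21480, is the not-stated figure for 35-39 in Table 2.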
Step 1: Determine the proportion of women in each age group whose parity is a) not stated; and b) equal to zero
Table 2 presents the revised data, together with the calculation of the proportions of women of parity zero, and parity not stated in each age group.
Table 2 Correction of parity data, and calculation of proportion of women of parity zero, and parity not stated, Kenya, 1989 Census

| Parity (j) | 15-19 (i = 1) | 20-24 (i = 2) | 25-29 (i = 3) | 30-34 (i = 4) | 35-39 (i = 5) | 40-44 (i = 6) | 45-49 (i = 7) |
|---|---:|---:|---:|---:|---:|---:|---:|
| 0 | 597,560 | 198,600 | 59,400 | 23,120 | 14,580 | 11,040 | 9,560 |
| 1 | 134,700 | 224,660 | 83,140 | 26,140 | 13,620 | 9,460 | 7,740 |
| 2 | 38,120 | 202,300 | 120,940 | 38,340 | 19,180 | 13,240 | 9,280 |
| 3 | 11,120 | 126,500 | 150,500 | 53,880 | 28,020 | 17,000 | 12,440 |
| 4 | 6,820 | 59,700 | 146,500 | 73,280 | 37,340 | 21,400 | 14,800 |
| 5 | 1,740 | 33,720 | 102,300 | 87,720 | 48,140 | 28,980 | 18,560 |
| 6 | 0 | 12,480 | 58,980 | 83,580 | 56,520 | 35,260 | 26,280 |
| 7 | 0 | 0 | 57,180 | 91,800 | 56,240 | 41,260 | 28,640 |
| 8 | 0 | 0 | 0 | 64,740 | 56,560 | 42,700 | 32,920 |
| 9 | 0 | 0 | 0 | 0 | 40,780 | 39,480 | 33,000 |
| 10 | 0 | 0 | 0 | 0 | 26,840 | 32,240 | 27,920 |
| 11 | 0 | 0 | 0 | 0 | 14,920 | 22,840 | 21,920 |
| 12 | 0 | 0 | 0 | 0 | 8,280 | 14,660 | 14,720 |
| 13 | 0 | 0 | 0 | 0 | 3,740 | 7,900 | 8,920 |
| 14 | 0 | 0 | 0 | 0 | 2,180 | 4,080 | 4,900 |
| 15 | 0 | 0 | 0 | 0 | 1,260 | 2,100 | 2,860 |
| 16 | 0 | 0 | 0 | 0 | 960 | 1,200 | 1,540 |
| 17 | 0 | 0 | 0 | 0 | 520 | 680 | 1,000 |
| 18 | 0 | 0 | 0 | 0 | 420 | 520 | 620 |
| 19 | 0 | 0 | 0 | 0 | 0 | 340 | 380 |
| 20 | 0 | 0 | 0 | 0 | 0 | 300 | 280 |
| 21 | 0 | 0 | 0 | 0 | 0 | 160 | 280 |
| 22 | 0 | 0 | 0 | 0 | 0 | 100 | 60 |
| 23 | 0 | 0 | 0 | 0 | 0 | 0 | 80 |
| 24 | 0 | 0 | 0 | 0 | 0 | 0 | 80 |
| 25 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Not stated ($N_{i,u}$) | 402,780 | 147,540 | 61,920 | 31,580 | 21,480 | 16,060 | 13,540 |
| TOTAL ($N_i$) | 1,192,840 | 1,005,500 | 840,860 | 574,180 | 451,580 | 363,000 | 292,320 |
| $U_i$ | 0.338 | 0.147 | 0.074 | 0.055 | 0.048 | 0.044 | 0.046 |
| $Z_i$ | 0.501 | 0.198 | 0.071 | 0.040 | 0.032 | 0.030 | 0.033 |

The data include high proportions of women with parity not stated at ages 15-19 (0.338) and 20-24 (0.147) and, to a lesser extent, in the older age groups. The proportion of women reported as childless ($Z_i$) falls rapidly, from around 50 per cent in the first age group to around 3 per cent at the end of the childbearing period. On these grounds, it is worth investigating whether an el-Badry correction can be applied to the data.
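The $U_i$ and $Z_i$ rows of Table 2 can be reproduced from the counts as below. This is an illustrative check; the constant names are our own.

```python
AGE_GROUPS = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49"]
# Counts from Table 2 (after reassigning implausible parities to 'not stated')
CHILDLESS = [597_560, 198_600, 59_400, 23_120, 14_580, 11_040, 9_560]        # N_{i,0}
NOT_STATED = [402_780, 147_540, 61_920, 31_580, 21_480, 16_060, 13_540]      # N_{i,u}
TOTAL = [1_192_840, 1_005_500, 840_860, 574_180, 451_580, 363_000, 292_320]  # N_i

U = [nu / n for nu, n in zip(NOT_STATED, TOTAL)]
Z = [n0 / n for n0, n in zip(CHILDLESS, TOTAL)]

for age, u_i, z_i in zip(AGE_GROUPS, U, Z):
    print(f"{age}: U_i = {u_i:.3f}  Z_i = {z_i:.3f}")
```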
Step 2: Plot the points (Zi, Ui) on a set of axes and evaluate the data
The Zi and Ui are plotted against each other (shown by the blue diamonds) in Figure 1. The straight line fitted to the points is shown by the red line. If a point is excluded from the fitting process, the figure in the spreadsheet represents it with an open diamond.
There is a clear linear relationship between the plotted points, and all points can be included in the application of an el-Badry correction.
Step 3: Determine the slope and intercept of the best straight line fit
Performing a linear regression of the $U_i$ on the $Z_i$ for the selected points gives a value for the intercept ($\beta$) of 0.02745. This suggests that around 2.7 per cent of women in each age group can be regarded as having truly missing parity data.
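The regression on the Kenyan data can be sketched as an ordinary least-squares fit of $U_i$ on $Z_i$ using all seven age groups (our assumption, matching the statement above that all points are included). The helper name is hypothetical.

```python
from typing import List, Tuple

CHILDLESS = [597_560, 198_600, 59_400, 23_120, 14_580, 11_040, 9_560]
NOT_STATED = [402_780, 147_540, 61_920, 31_580, 21_480, 16_060, 13_540]
TOTAL = [1_192_840, 1_005_500, 840_860, 574_180, 451_580, 363_000, 292_320]


def fit_line(z: List[float], u: List[float]) -> Tuple[float, float]:
    """OLS fit of U_i = beta + gamma * Z_i; returns (beta, gamma)."""
    n = len(z)
    mz, mu = sum(z) / n, sum(u) / n
    sxx = sum((a - mz) ** 2 for a in z)
    sxy = sum((a - mz) * (b - mu) for a, b in zip(z, u))
    gamma = sxy / sxx
    return mu - gamma * mz, gamma


z = [n0 / n for n0, n in zip(CHILDLESS, TOTAL)]
u = [nu / n for nu, n in zip(NOT_STATED, TOTAL)]
beta, gamma = fit_line(z, u)
print(f"beta = {beta:.5f}, gamma = {gamma:.4f}")
```

This reproduces the intercept of 0.02745 quoted above.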
Step 4: Estimation of the revised numbers of childless women, and women whose parity is not stated
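The revised number of women of zero parity is given by

$$N_{i,0}^* = N_i\,(Z_i + U_i - \beta),$$

while the revised numbers with parity unknown are calculated by multiplying the total number of women in each age group by $\beta$, as shown in Table 3. For example, the number of women aged 20-24 estimated to be truly of unknown parity is $\beta$ × 1,005,500 = 27,603, and the corrected estimate of the number of childless women aged 15-19 is 1,192,840 × ($Z_1 + U_1 - \beta$) = 597,560 + 402,780 - 32,746 = 967,594. These figures use the unrounded values of $\beta$, $Z_i$ and $U_i$; repeating the arithmetic with the rounded values shown ($\beta$ = 0.02745, $Z_1$ = 0.501, $U_1$ = 0.338) gives slightly different results.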
Table 3 Revised estimates of numbers of women with parity not stated and childless women by age, Kenya, 1989 Census

| Age group | 15-19 | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 |
|---|---:|---:|---:|---:|---:|---:|---:|
| Revised parity not stated | 32,746 | 27,603 | 23,084 | 15,763 | 12,397 | 9,965 | 8,025 |
| Revised zero parity | 967,594 | 318,537 | 98,236 | 38,937 | 23,663 | 17,135 | 15,075 |

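Table 3 can be reproduced as sketched below, recomputing $\beta$ from the Table 2 counts so that no rounding is introduced. The helper and variable names are illustrative.

```python
from typing import List, Tuple

AGE_GROUPS = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49"]
CHILDLESS = [597_560, 198_600, 59_400, 23_120, 14_580, 11_040, 9_560]
NOT_STATED = [402_780, 147_540, 61_920, 31_580, 21_480, 16_060, 13_540]
TOTAL = [1_192_840, 1_005_500, 840_860, 574_180, 451_580, 363_000, 292_320]


def fit_line(z: List[float], u: List[float]) -> Tuple[float, float]:
    """OLS fit of U_i = beta + gamma * Z_i; returns (beta, gamma)."""
    n = len(z)
    mz, mu = sum(z) / n, sum(u) / n
    sxx = sum((a - mz) ** 2 for a in z)
    sxy = sum((a - mz) * (b - mu) for a, b in zip(z, u))
    gamma = sxy / sxx
    return mu - gamma * mz, gamma


z = [n0 / n for n0, n in zip(CHILDLESS, TOTAL)]
u = [nu / n for nu, n in zip(NOT_STATED, TOTAL)]
beta, _ = fit_line(z, u)

# N*_{i,u} = beta N_i and N*_{i,0} = N_{i,0} + N_{i,u} - beta N_i
revised_not_stated = [beta * n for n in TOTAL]
revised_childless = [n0 + nu - beta * n for n0, nu, n in zip(CHILDLESS, NOT_STATED, TOTAL)]

for age, ns, c in zip(AGE_GROUPS, revised_not_stated, revised_childless):
    print(f"{age}: not stated {ns:,.0f}  childless {c:,.0f}")
```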
Step 5: Calculation of average parities
Since an el-Badry correction has been applied, corrected average parities, presented in Table 4, are then derived using Equation 2.
Table 4 Corrected average parities by age group, Kenya, 1989 Census

| Age group | 15-19 | 20-24 | 25-29 | 30-34 | 35-39 | 40-44 | 45-49 |
|---|---:|---:|---:|---:|---:|---:|---:|
| Average parity | 0.242 | 1.525 | 3.214 | 4.760 | 6.239 | 7.120 | 7.510 |

Note that, relative to the average parities produced if the correction is not applied (which assumes that all women with not stated parity are of parity zero), the correction increases the average parity in every age group by the same factor, $1/(1-\beta)$, here about 1.028 (i.e., by roughly 2.8 per cent).
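The constant-factor relationship can be checked numerically for the two youngest age groups, as sketched below with $\beta$ rounded to 0.02745 as reported in Step 3. The dictionary layout is an illustrative choice.

```python
BETA = 0.02745  # intercept from Step 3, rounded as reported in the text

# Parity counts N_{i,j} (j = 0, 1, ...) and totals N_i from Table 2
WOMEN = {
    "15-19": ([597_560, 134_700, 38_120, 11_120, 6_820, 1_740], 1_192_840),
    "20-24": ([198_600, 224_660, 202_300, 126_500, 59_700, 33_720, 12_480], 1_005_500),
}

results = {}
for age, (by_parity, n_total) in WOMEN.items():
    children = sum(j * n for j, n in enumerate(by_parity))
    uncorrected = children / n_total               # not stated treated as childless
    corrected = children / ((1 - BETA) * n_total)  # Equation 2
    results[age] = (uncorrected, corrected)
    print(f"{age}: uncorrected {uncorrected:.3f}, corrected {corrected:.3f}, "
          f"ratio {corrected / uncorrected:.4f}")
```

The corrected values agree with Table 4 to three decimals, and the ratio is the same in both groups.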
Detailed description of the method
The method is fully described in el-Badry (1961). El-Badry’s fundamental insight was that, if it could be assumed that:
1) there is a linear relationship between the proportions of childless women of a given age in a population, and the proportion of women whose parity is not stated; and
2) the true, unknown, proportion of women whose parity is not known is a constant and independent of age, then
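
$$U_i = \alpha Z_i^* + \beta, \tag{3}$$

where $Z_i^*$ is the true proportion of childless women in age group $i$, $\alpha Z_i^*$ is the proportion of women in the age group who are truly childless but reported as of parity not stated, and $\beta$ is the true, constant, proportion of women with parity not stated.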
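Hence, if a proportion $\alpha Z_i^*$ of women has been misclassified as of parity not stated when they are truly childless, then

$$Z_i = Z_i^* - \alpha Z_i^* = (1 - \alpha) Z_i^*,$$

and therefore:

$$Z_i^* = \frac{Z_i}{1 - \alpha}, \tag{4}$$

and substituting this into Equation 3,

$$U_i = \beta + \frac{\alpha}{1 - \alpha} Z_i = \beta + \gamma Z_i,$$

where $\gamma = \alpha/(1-\alpha)$ can be thought of as the odds of a childless woman being classified as being of unknown parity.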
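Thus, a regression of $U_i$ on $Z_i$ gives estimates of $\beta$ and $\gamma$, and hence of $\alpha = \gamma/(1+\gamma)$.
From Equation 3, we then obtain

$$\alpha Z_i^* = U_i - \beta,$$

and hence that

$$Z_i^* = Z_i + \alpha Z_i^* = Z_i + U_i - \beta$$

and

$$U_i^* = U_i - \alpha Z_i^* = \beta.$$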
Note that, even though we have two expressions for $Z_i^*$ (Equation 4, and the expression derived from Equation 3), they will only give the same answer when the fit is exact. Convention dictates that we prefer the expression derived from Equation 3 to Equation 4, on the grounds that it relies on the fitted value of $\beta$ (the estimated proportion of truly not stated parities) rather than on the value of $\alpha$, which lacks intuitive interpretability.
After deriving corrected values of $Z_i^*$ and $U_i^*$, average parities can be calculated using Equation 2.
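The point about the two expressions for $Z_i^*$ can be illustrated with synthetic numbers (not census data), as sketched below. The function names are hypothetical. Since $1/(1-\alpha) = 1+\gamma$, the gap between the two expressions in each age group equals the regression residual $U_i - \beta - \gamma Z_i$, which vanishes only when the points lie exactly on the line.

```python
from typing import List, Tuple


def fit_line(z: List[float], u: List[float]) -> Tuple[float, float]:
    """OLS fit of U_i = beta + gamma * Z_i; returns (beta, gamma)."""
    n = len(z)
    mz, mu = sum(z) / n, sum(u) / n
    sxx = sum((a - mz) ** 2 for a in z)
    sxy = sum((a - mz) * (b - mu) for a, b in zip(z, u))
    gamma = sxy / sxx
    return mu - gamma * mz, gamma


def two_estimates_of_z_star(z: List[float],
                            u: List[float]) -> Tuple[float, List[float], List[float]]:
    """Return (alpha, Z* via beta, Z* via alpha) from the fitted line."""
    beta, gamma = fit_line(z, u)
    alpha = gamma / (1 + gamma)                          # inverts gamma = alpha / (1 - alpha)
    via_beta = [zi + ui - beta for zi, ui in zip(z, u)]  # route from Equation 3 (preferred)
    via_alpha = [zi / (1 - alpha) for zi in z]           # Equation 4
    return alpha, via_beta, via_alpha


# Synthetic data built from beta = 0.03, alpha = 0.4 and true Z* = 0.6, 0.3, 0.1
exact_z = [0.36, 0.18, 0.06]
exact_u = [0.27, 0.15, 0.07]
# The same points with one U_i perturbed, so the fit is no longer exact
noisy_u = [0.27, 0.16, 0.07]

for label, u_values in (("exact", exact_u), ("noisy", noisy_u)):
    a, via_beta, via_alpha = two_estimates_of_z_star(exact_z, u_values)
    gaps = [round(b - c, 4) for b, c in zip(via_beta, via_alpha)]
    print(label, round(a, 3), gaps)
```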
Having applied the correction, care should be taken to ensure that, in every age group, the adjusted number of childless women (that is, of parity zero) does not exceed the number of women reporting no births in the reference period in response to the question on recent fertility. This is because childless women cannot have had a recent birth, so the revised $Z_i^*$ determines the minimum number of women, $Z_i^* N_i$, who could not have had a birth in the reference period before the census.
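A sketch of this consistency check, with a hypothetical function name. The counts of women reporting no birth in the reference period would come from the census question on recent fertility, which is not tabulated in the worked example above, so the usage line uses illustrative numbers only.

```python
from typing import List, Sequence


def groups_failing_check(revised_childless: Sequence[float],
                         no_recent_birth: Sequence[float]) -> List[int]:
    """Return indices of age groups where N*_{i,0} exceeds the number of women
    reporting no birth in the reference period.

    Childless women cannot have had a recent birth, so N*_{i,0} should be a
    lower bound on the number of women reporting no recent birth.
    """
    return [i for i, (c, nb) in enumerate(zip(revised_childless, no_recent_birth))
            if c > nb]


# Illustrative figures only: the second age group fails the check
print(groups_failing_check([900.0, 300.0], [950.0, 280.0]))
```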
A version of the correction designed for (the now-rare) situations where questions on children ever born are asked only of married women is described in Annex II of Manual X (UN Population Division 1983).
References
el-Badry MA. 1961. “Failure of enumerators to make entries of zero: errors in recording childless cases in population censuses”, Journal of the American Statistical Association 56(296):909–924. doi: https://dx.doi.org/10.1080/01621459.1961.10482134
UN Population Division. 1983. Manual X: Indirect Techniques for Demographic Estimation. New York: United Nations, Department of Economic and Social Affairs, ST/ESA/SER.A/81. https://www.un.org/development/desa/pd/sites/www.un.org.development.desa.pd/files/files/documents/2020/Jan/un_1983_manual_x_-_indirect_techniques_for_demographic_estimation.pdf