The el-Badry correction

Method

Description of the method

The el-Badry correction is a method for correcting errors in data on children ever born caused by the enumerator or respondent failing to record answers of ‘zero’ to questions on lifetime fertility and, instead, leaving the response blank. When this occurs, during data processing the response is coded as ‘missing’ or ‘unknown’, even though it was evident to the enumerator at the time of data collection that the correct answer was ‘zero’. The method apportions the number of women whose parity is recorded as ‘missing’ between those whose parity is regarded as being truly unknown, and those women who should have been recorded as childless but whose responses were left blank. It does this apportionment at an aggregate level and not on an individual basis.

Data required and assumptions

The method requires the number of children ever born, classified by age group of mother, including the count of women with missing data (i.e., where the field was left blank or contained an out-of-range code or a code for not answered or refused).

The method assumes that a constant proportion of women at each age truly did not state their lifetime fertility (i.e. parity) at the time of data collection. The balance of the women with unreported parities is assumed to be erroneously recorded as not stated when the women are, in fact, childless.

Caveats and warnings

The method relies on the existence of a linear relationship between the proportions of women whose parity is not stated and the proportions of women reported to be childless. If such a linear relationship is observed, the adjusted denominator used to calculate average parities should exclude those women whose parity (after correction) is still regarded as unknown. This reflects the implicit assumption that these women’s parity distribution is no different from that of women of the same age whose parity is known.

Where the data indicate that a correction is needed because of the large proportion of missing parity information but the method cannot be applied (for example, because data by age are unavailable, or because the assumption of linearity is violated), women of unknown parity should be included in the denominator used to determine average parities. This implicitly assumes that all such women are childless (i.e. of parity zero). This will, of course, result in underestimated average parities, as not all women of unknown parity are in fact childless.

Application of method

We define $N_{i} = {}_{5}N_{a}$, for $a = 15, 20, \ldots, 45$ and $i = a/5 - 2$, to be the number of women in age group i in the population. Thus, $N_{1}$ represents the number of women aged 15-19 in the population. Denote by $N_{i, j}$ the number of women in age group i of parity j, and by $N_{i, u}$ the number of women in age group i whose parity is unknown.

Step 1: Determine the proportion of women in each age group whose parity is a) unknown; and b) reported as zero

Extract a table of reported children ever born (j) by women’s age group (i) from the census data to obtain $N_{i, j}$. Missing data on parity (i.e. blank fields and invalid codes) should be combined with codes for parity not stated in each age group to produce $N_{i, u}$. The proportion of women in age group i with parity unknown is then

$U_{i} = \frac{N_{i, u}}{N_{i}}$

The proportion of women in age group i who are reportedly childless (i.e. are of parity zero) is given by

$Z_{i} = \frac{N_{i, 0}}{N_{i}}$

If the $U_{i}$ are small (less than 2 per cent in every age group), it is not worth applying the correction. In such a situation, average parities should be calculated on the assumption that women of not stated parity have the same parity distribution as women whose parity is known, that is, by omitting the women with unstated parities from the denominator. Thus, if $P_{i}$ is the average parity of women in age group i,

$P_{i} = \frac{\sum_{j = 0}^{ω} j \, N_{i, j}}{\sum_{j = 0}^{ω} N_{i, j}}$

If the proportions of women with parity not stated exceed 2 per cent, it is worth assessing whether the correction can be applied.
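The Step 1 calculations for a single age group can be sketched in Python as below. This is a minimal illustration, not the implementation used in the accompanying spreadsheet: the function name `step1_summary`, the dictionary layout (parity j mapped to $N_{i, j}$, with a hypothetical key `"u"` holding $N_{i, u}$) and the toy counts are assumptions made for the example. The 2 per cent threshold is the rule of thumb given above.

```python
def step1_summary(counts, threshold=0.02):
    """Step 1 of the el-Badry correction for a single age group.

    counts maps parity j (an int) to the number of women N_ij; the
    hypothetical key "u" holds N_iu, the women whose parity is missing,
    invalid or not stated.
    """
    n_unknown = counts.get("u", 0)
    known = {j: n for j, n in counts.items() if j != "u"}
    n_total = sum(known.values()) + n_unknown   # N_i
    u = n_unknown / n_total                     # U_i, proportion not stated
    z = known.get(0, 0) / n_total               # Z_i, proportion childless
    # Average parity with not-stated women omitted from the denominator,
    # i.e. assuming they share the parity distribution of the others.
    p = sum(j * n for j, n in known.items()) / sum(known.values())
    return {"U": u, "Z": z, "P": p, "worth_correcting": u > threshold}


# Toy age group (hypothetical counts): 1,000 women, 50 of unknown parity.
print(step1_summary({0: 300, 1: 250, 2: 200, 3: 200, "u": 50}))
```

Here 5 per cent of the toy age group is of unknown parity, above the 2 per cent threshold, so the group would be flagged for further assessment.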

Step 2: Plot the points (Z_i, U_i) and evaluate the data

For the method to work correctly, the series of points $(Z_{i}, U_{i})$ should lie on, or very close to, a straight line. In some cases, curvature may be observed in the data points corresponding to either the oldest or the youngest ages. If the curvature affects only the older ages, even if it is quite extreme, it is acceptable to exclude the oldest, or two oldest, age groups from the fitting process and fit a straight line to the remaining points, since the method has the greatest absolute impact on the proportions not stated at the youngest ages. If the curvature is most noticeable among younger women, the method should not be used: excluding the data points for women aged 15-24 would force the regression to extrapolate beyond the range of the fitted data, and the results could imply illogical adjustments in these age groups.

If a strongly linear relationship cannot be identified, even after excluding one or two data points for older women, the method cannot be applied. In this situation, it is preferable to assume that all women of not stated parity are childless, and to include them in the denominator of the average parity calculation:

$P_{i} = \frac{\sum_{j = 0}^{ω} j \, N_{i, j}}{N_{i}}$ (Equation 1)

The analytical report should note that this has been done, and that, therefore, the average parity values are liable to be underestimated.
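Equation 1 can be sketched in Python as follows; the function name and the `"u"` key for women of not stated parity are illustrative assumptions. With the toy counts used here, omitting the not-stated women from the denominator would give 1250/950 ≈ 1.316, whereas Equation 1 gives 1.25, which illustrates the downward bias the report should mention.

```python
def average_parity_unknown_as_zero(counts):
    """Equation 1: average parity treating all not-stated women as childless.

    counts maps parity j to N_ij, with the hypothetical key "u" for women
    of not-stated parity. Those women stay in the denominator N_i but add
    nothing to the numerator, so the result is biased downwards.
    """
    n_total = sum(counts.values())                              # N_i
    births = sum(j * n for j, n in counts.items() if j != "u")  # sum of j * N_ij
    return births / n_total


print(average_parity_unknown_as_zero({0: 300, 1: 250, 2: 200, 3: 200, "u": 50}))  # 1.25
```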

Step 3: Determine the slope and intercept of the best straight line fit to the data

The slope (γ) and intercept (β) of the fitted line are found by a linear regression of $U_{i}$ on $Z_{i}$, applied to those data points selected for inclusion, that is,

$U_{i} = β + γ Z_{i}$

The intercept (β), which is independent of age (i), is the estimate of the proportion of women in each age group whose parity is truly unknown, as opposed to childless women whose response was left blank.
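Steps 2 and 3 reduce to an ordinary least-squares fit of $U_{i}$ on $Z_{i}$. The sketch below is plain Python with no dependencies; the function name `fit_el_badry` and its `drop_oldest` parameter are hypothetical, and the visual check of linearity in Step 2 is left to the analyst. Only the oldest one or two groups may be dropped, following the guidance above.

```python
def fit_el_badry(z, u, drop_oldest=0):
    """Least-squares fit of U_i = beta + gamma * Z_i.

    z, u: proportions childless and not stated, ordered from the youngest
    age group (15-19) to the oldest. drop_oldest removes the last one or
    two groups from the fit. Returns (beta, gamma).
    """
    if drop_oldest not in (0, 1, 2):
        raise ValueError("only the one or two oldest groups may be dropped")
    keep = len(z) - drop_oldest
    xs, ys = z[:keep], u[:keep]
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    gamma = sxy / sxx              # slope
    beta = y_bar - gamma * x_bar   # intercept: proportion truly not stated
    return beta, gamma


# Toy points lying exactly on U = 0.03 + 0.5 Z (hypothetical data).
beta, gamma = fit_el_badry([0.5, 0.2, 0.08, 0.05], [0.28, 0.13, 0.07, 0.055])
print(round(beta, 4), round(gamma, 4))  # 0.03 0.5
```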

Step 4: Estimation of the revised numbers of childless women, and women whose parity is not stated

The adjusted proportion of women in age group i that is estimated to be truly childless is given by

$Z_{i}^{*} = Z_{i} + U_{i} - β$

That is, the revised proportion of women of zero parity in any age group is the proportion actually recorded as being of zero parity together with the proportion of women in that age group of not stated parity less the estimated proportion of women whose parity is regarded as being truly unknown. The revised estimate of the number of childless women in age group i is given by

$N_{i, 0}^{*} = N_{i} \times Z_{i}^{*}$

Similarly, the estimated number of women in each age group whose parity is truly unknown is given by

$N_{i, u}^{*} = N_{i} \times β$

The $N_{i, j}^{*}$ for the other parities ($j > 0$) are unchanged, that is, $N_{i, j}^{*} = N_{i, j}$.
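Step 4 for one age group can be sketched as follows (Python; the function name and argument order are hypothetical). Computing $N_{i} Z_{i}^{*}$ is algebraically the same as $N_{i, 0} + N_{i, u} - βN_{i}$, so the revised childless and revised unknown counts always add up to the originally reported $N_{i, 0} + N_{i, u}$.

```python
def revise_counts(n_total, n_zero, n_unknown, beta):
    """Return (N*_i0, N*_iu) for one age group after an el-Badry correction.

    n_total is N_i, n_zero the reported N_i0, n_unknown the reported N_iu
    and beta the fitted intercept. Counts for parities j > 0 are left
    unchanged and so are not needed here.
    """
    z = n_zero / n_total                      # Z_i
    u = n_unknown / n_total                   # U_i
    revised_zero = n_total * (z + u - beta)   # N_i * Z*_i
    revised_unknown = n_total * beta          # N_i * beta
    return revised_zero, revised_unknown


# Hypothetical age group: 1,000 women, 300 reported childless, 50 not stated.
print(revise_counts(1000, 300, 50, beta=0.03))  # approximately (320.0, 30.0)
```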

Step 5: Calculation of average parities

If an el-Badry correction has been applied to the data, the average parities are given by

$P_{i} = \frac{\sum_{j = 0}^{ω} j \, N_{i, j}^{*}}{(1 - β) N_{i}}$ (Equation 2)

embodying the assumption that the remaining women in age group i of unknown parity, βN_i, who are omitted from the denominator, have the same average parity as the women in age group i whose parity is known.
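Equation 2 can be sketched as below (Python; the function name and the `"u"` key are hypothetical). Because women of parity zero contribute nothing to the numerator, only the denominator changes: $(1 - β)N_{i}$ is the number of women whose parity is regarded as known after the correction.

```python
def average_parity_corrected(counts, beta):
    """Equation 2: average parity after an el-Badry correction.

    counts maps parity j to the reported N_ij and must include the
    hypothetical key "u" (not stated), so that its values sum to N_i.
    Revising N_i0 does not alter the numerator, since j * N_i0 = 0.
    """
    n_total = sum(counts.values())                              # N_i
    births = sum(j * n for j, n in counts.items() if j != "u")  # sum of j * N_ij
    return births / ((1 - beta) * n_total)


# Hypothetical age group with beta = 0.03: 1250 / 970, about 1.289.
print(average_parity_corrected({0: 300, 1: 250, 2: 200, 3: 200, "u": 50}, 0.03))
```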

Interpretation and checks

The value of β shows the estimated proportion of women whose parity is truly not stated. Larger values of β are therefore associated with poorer quality data.

Occasionally, the method may have a contrary effect and suggest that the number of women with not-stated parity is understated, and that the number of women of reported parity zero should be reduced. Such a situation will arise if β > U_i. If this is so, the correction should not be applied to that age group.
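The check above is easy to automate. In this Python sketch the function name and the toy proportions are hypothetical; it simply lists the age groups in which β exceeds $U_{i}$ and where, therefore, the correction should not be applied.

```python
def groups_to_leave_uncorrected(labels, u, beta):
    """Return the age groups in which beta > U_i.

    There the correction would move women out of parity zero into the
    not-stated category, so it should not be applied to those groups.
    """
    return [label for label, u_i in zip(labels, u) if beta > u_i]


# Hypothetical proportions not stated for three age groups.
print(groups_to_leave_uncorrected(["15-19", "20-24", "45-49"], [0.30, 0.10, 0.025], 0.03))
# ['45-49']
```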

Worked example

The accompanying spreadsheet implements the method using data from the 1989 Kenya Census, obtained from IPUMS. The original data are presented in Table 1.

Table 1 Children ever born, by age group of mother at census date, Kenya, 1989 Census

	Age group (i)
	15-19	20-24	25-29	30-34	35-39	40-44	45-49
Parity	1	2	3	4	5	6	7
0	597,560	198,600	59,400	23,120	14,580	11,040	9,560
1	134,700	224,660	83,140	26,140	13,620	9,460	7,740
2	38,120	202,300	120,940	38,340	19,180	13,240	9,280
3	11,120	126,500	150,500	53,880	28,020	17,000	12,440
4	6,820	59,700	146,500	73,280	37,340	21,400	14,800
5	1,740	33,720	102,300	87,720	48,140	28,980	18,560
6	0	12,480	58,980	83,580	56,520	35,260	26,280
7	0	0	57,180	91,800	56,240	41,260	28,640
8	0	0	0	64,740	56,560	42,700	32,920
9	0	0	0	0	40,780	39,480	33,000
10	0	0	0	0	26,840	32,240	27,920
11	0	0	0	0	14,920	22,840	21,920
12	0	0	0	0	8,280	14,660	14,720
13	0	0	0	0	3,740	7,900	8,920
14	0	0	0	0	2,180	4,080	4,900
15	0	0	0	0	1,260	2,100	2,860
16	0	0	0	0	960	1,200	1,540
17	0	0	0	0	520	680	1,000
18	0	0	0	0	420	520	620
19	0	0	0	0	140	340	380
20	0	0	0	0	160	300	280
21	0	0	0	0	240	160	280
22	0	0	0	0	40	100	60
23	0	0	0	0	20	20	80
24	0	0	0	0	60	20	80
25	0	0	0	0	60	40	0
26	0	0	0	0	60	40	80
27	0	0	0	0	80	40	60
28	0	0	0	0	20	40	40
29	0	0	0	0	20	0	40
30	0	0	0	0	340	440	360
Not Stated	402,780	147,540	61,920	31,580	20,240	15,420	12,960
TOTAL	1,192,840	1,005,500	840,860	574,180	451,580	363,000	292,320

Inspection of the data reveals that they have been edited to disallow the recording of high parities among women aged under 35. The editing rule applied at the preparatory stage appears to be stricter than the one suggested in the section on evaluation of parity data: reports for women aged 20-24 have been restricted to parity 6 or less (rather than parity 8), those for women aged 25-29 are truncated at parity 7 (rather than parity 12), and those for women aged 30-34 at parity 8 (rather than 15). However, implausibly high parities have been allowed to remain at ages 35 and over. Further light editing of the data in Table 1 could therefore be undertaken by re-assigning to the unknown category reports of parity 19 and over in the age group 35-39, parity 23 and over in the age group 40-44, and parity 26 and over in the age group 45-49.
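The light editing described above can be expressed as a small function that moves reports above an age-specific maximum into the not-stated category. The function name and the dictionary layout (with a hypothetical key `"u"` for not stated) are assumptions; the cut-offs are those given in the text. The example uses only the upper tail of the 35-39 column of Table 1, taking the row labelled 30 at face value.

```python
# Highest parity treated as plausible in each age group, from the text:
# parities 19+, 23+ and 26+ are re-assigned at ages 35-39, 40-44 and 45-49.
MAX_PLAUSIBLE = {"35-39": 18, "40-44": 22, "45-49": 25}


def reassign_implausible(counts, max_parity):
    """Return a copy of counts with parities above max_parity moved to "u".

    counts maps parity j to the number of women N_ij; the key "u" holds
    the women whose parity is not stated.
    """
    revised = {"u": counts.get("u", 0)}
    for j, n in counts.items():
        if j == "u":
            continue
        if j > max_parity:
            revised["u"] += n   # implausible report: treat as not stated
        else:
            revised[j] = n
    return revised


# Upper tail of the 35-39 column of Table 1, plus its not-stated count.
tail_35_39 = {18: 420, 19: 140, 20: 160, 21: 240, 22: 40, 23: 20, 24: 60,
              25: 60, 26: 60, 27: 80, 28: 20, 29: 20, 30: 340, "u": 20_240}
edited = reassign_implausible(tail_35_39, MAX_PLAUSIBLE["35-39"])
print(edited["u"])  # 21480, the not-stated count for 35-39 in Table 2
```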

An option can be selected on the Introduction tab of the spreadsheet to set implausible parities to ‘not stated’ prior to the application of the method.

Step 1: Determine the proportion of women in each age group whose parity is a) not stated; and b) equal to zero

Table 2 presents the revised data, together with the calculation of the proportions of women of parity zero, and parity not stated in each age group.

Table 2 Correction of parity data, and calculation of proportion of women of parity zero, and parity not stated, Kenya, 1989 Census

	Age group (i)
	15-19	20-24	25-29	30-34	35-39	40-44	45-49
Parity	1	2	3	4	5	6	7
0	597,560	198,600	59,400	23,120	14,580	11,040	9,560
1	134,700	224,660	83,140	26,140	13,620	9,460	7,740
2	38,120	202,300	120,940	38,340	19,180	13,240	9,280
3	11,120	126,500	150,500	53,880	28,020	17,000	12,440
4	6,820	59,700	146,500	73,280	37,340	21,400	14,800
5	1,740	33,720	102,300	87,720	48,140	28,980	18,560
6	0	12,480	58,980	83,580	56,520	35,260	26,280
7	0	0	57,180	91,800	56,240	41,260	28,640
8	0	0	0	64,740	56,560	42,700	32,920
9	0	0	0	0	40,780	39,480	33,000
10	0	0	0	0	26,840	32,240	27,920
11	0	0	0	0	14,920	22,840	21,920
12	0	0	0	0	8,280	14,660	14,720
13	0	0	0	0	3,740	7,900	8,920
14	0	0	0	0	2,180	4,080	4,900
15	0	0	0	0	1,260	2,100	2,860
16	0	0	0	0	960	1,200	1,540
17	0	0	0	0	520	680	1,000
18	0	0	0	0	420	520	620
19	0	0	0	0	0	340	380
20	0	0	0	0	0	300	280
21	0	0	0	0	0	160	280
22	0	0	0	0	0	100	60
23	0	0	0	0	0	0	80
24	0	0	0	0	0	0	80
25	0	0	0	0	0	0	0
U	402,780	147,540	61,920	31,580	21,480	16,060	13,540
TOTAL	1,192,840	1,005,500	840,860	574,180	451,580	363,000	292,320
U_i	0.338	0.147	0.074	0.055	0.048	0.044	0.046
Z_i	0.501	0.198	0.071	0.040	0.032	0.030	0.033

The data include high proportions of women with parity not stated at ages 15-19 ($402{,}780 / 1{,}192{,}840 = 0.338$), 20-24 (0.147) and, to a lesser extent, the older age groups. The proportion of women reported as childless ($Z_{i}$) falls rapidly, from around 50 per cent in the first age group to around 3 per cent at the end of the childbearing period. On these grounds, it is worth investigating whether an el-Badry correction can be applied to the data.
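The last two rows of Table 2 can be reproduced with a few lines of Python. The counts are copied from Table 2; the variable names are illustrative only.

```python
# Counts from Table 2 (after re-assigning implausible parities to not stated).
AGES = ["15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49"]
N_TOTAL = [1_192_840, 1_005_500, 840_860, 574_180, 451_580, 363_000, 292_320]
N_ZERO = [597_560, 198_600, 59_400, 23_120, 14_580, 11_040, 9_560]
N_UNKNOWN = [402_780, 147_540, 61_920, 31_580, 21_480, 16_060, 13_540]

U = [nu / n for nu, n in zip(N_UNKNOWN, N_TOTAL)]  # U_i, proportion not stated
Z = [n0 / n for n0, n in zip(N_ZERO, N_TOTAL)]     # Z_i, proportion childless

for age, u_i, z_i in zip(AGES, U, Z):
    print(f"{age}: U_i = {u_i:.3f}, Z_i = {z_i:.3f}")
```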

Step 2: Plot the points (Z_i, U_i) on a set of axes and evaluate the data

The $Z_{i}$ and $U_{i}$ are plotted against each other (shown by the blue diamonds) in Figure 1. The straight line fitted to the points is shown by the red line. If a point is excluded from the fitting process, the figure in the spreadsheet represents it with an open diamond.

Figure 1 Fitting of el-Badry correction, Kenya 1989 census

There is a clear linear relationship between the plotted points, and all points can be included in the application of an el-Badry correction.

Step 3: Determine the slope and intercept of the best straight line fit

Performing a linear regression of the $U_{i}$ on the $Z_{i}$ for the selected points gives a value for the intercept (β) of 0.02745. This suggests that the parity of around 2.7 per cent of the women in each age group can be regarded as truly missing.

Step 4: Estimation of the revised numbers of childless women, and women whose parity is not stated

The revised number of women of zero parity is given by

$N_{i, 0}^{*} = N_{i} (Z_{i} + U_{i} - β)$

while the revised numbers with parity unknown are calculated by multiplying the total number of women in each age group by β, as shown in Table 3. For example, the number of women aged 20-24 estimated to be truly of unknown parity is β × 1,005,500 = 27,603 (Table 3 uses the unrounded value of β; with β rounded to 0.02745 the product is 27,601). The corrected estimate of the number of childless women aged 15-19 is $N_{1}(Z_{1} + U_{1} - β)$, which equals 597,560 + 402,780 - 32,746 = 967,594.

Table 3 Revised estimates of numbers of women with parity not stated and childless women by age, Kenya, 1989 Census

	15-19	20-24	25-29	30-34	35-39	40-44	45-49
Revised parity not stated	32,746	27,603	23,084	15,763	12,397	9,965	8,025
Revised zero parity	967,594	318,537	98,236	38,937	23,663	17,135	15,075

Step 5: Calculation of average parities

Since an el-Badry correction has been applied, corrected average parities, presented in Table 4, are then derived using Equation 2.

Table 4 Corrected average parities by age group, Kenya, 1989 Census

	15-19	20-24	25-29	30-34	35-39	40-44	45-49
Average parity	0.242	1.525	3.214	4.760	6.239	7.120	7.510

Note that, relative to the average parities obtained without the correction (that is, assuming that all women of not stated parity are of parity zero, as in Equation 1), the correction multiplies the average parity in every age group by the same constant factor, $\frac{1}{1 - β}$.
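The first two columns of Table 4, and the constant factor $\frac{1}{1 - β}$, can be checked with the Python sketch below. It uses the counts from Table 2 and β = 0.02745 from Step 3; women of parity zero are omitted from the distributions because they add nothing to the numerator. Variable names are illustrative.

```python
BETA = 0.02745  # intercept reported in Step 3 of the worked example

# Parity distributions (parity: count, parity 0 omitted) and N_i from Table 2.
COUNTS = {
    "15-19": {1: 134_700, 2: 38_120, 3: 11_120, 4: 6_820, 5: 1_740},
    "20-24": {1: 224_660, 2: 202_300, 3: 126_500, 4: 59_700, 5: 33_720, 6: 12_480},
}
N_TOTAL = {"15-19": 1_192_840, "20-24": 1_005_500}

results = {}
for age, dist in COUNTS.items():
    births = sum(j * n for j, n in dist.items())       # children ever born
    uncorrected = births / N_TOTAL[age]                # Equation 1
    corrected = births / ((1 - BETA) * N_TOTAL[age])   # Equation 2
    results[age] = (uncorrected, corrected)
    print(f"{age}: Eq. 1 = {uncorrected:.3f}, Eq. 2 = {corrected:.3f}, "
          f"ratio = {corrected / uncorrected:.4f}")
```

The ratio printed for each age group is the same, since only the denominator differs between the two equations.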

Detailed description of the method

The method is fully described in el-Badry (1961). El-Badry’s fundamental insight was that, if it could be assumed that:

1) there is a linear relationship between the proportions of childless women of a given age in a population, and the proportion of women whose parity is not stated; and

2) the true, unknown, proportion of women whose parity is not known is a constant and independent of age, then

$U_{i} = α Z_{i}^{*} + β$ (Equation 3)

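where α is the proportion of truly childless women who are reported as of parity not stated (so that $αZ_{i}^{*}$ is the proportion of all women in age group i who are misreported in this way), and β is the true, constant, proportion of women with parity not stated.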

Hence, if a proportion $αZ_{i}^{*}$ of women have been misclassified as not stated when they are truly childless, then

$Z_{i} = Z_{i}^{*} - α Z_{i}^{*} = (1 - α) Z_{i}^{*} .$

and therefore:

$Z_{i}^{*} = \frac{Z_{i}}{(1 - α)}$ (Equation 4)

and substituting this into Equation 3,

$U_{i} = \frac{α}{1 - α} Z_{i} + β = γ Z_{i} + β$

where $γ = \frac{α}{1 - α}$ can be thought of as the odds of a truly childless woman being classified as of unknown parity.

Thus, a regression of $U_{i}$ on $Z_{i}$ gives estimates of β and γ, and hence of $α = \frac{γ}{1 + γ}$.

From Equation 3, we then obtain

$U_{i} - β = α Z_{i}^{*} = Z_{i}^{*} - Z_{i}$

and hence that

$Z_{i}^{*} = U_{i} - β + Z_{i}$, so that $N_{i, 0}^{*} = N_{i} Z_{i}^{*}$

and

$U_{i}^{*} = β$, so that $N_{i, u}^{*} = β N_{i}$

Note that, even though we have two expressions for $Z_{i}^{*}$ (Equation 4, and the one derived from Equation 3), they will give the same answer only when the fit is exact. Convention dictates that we use the expression derived from Equation 3 rather than Equation 4, on the grounds that it relies on the fitted value of β (the estimated proportion of truly not stated parities) rather than on the value of α, which lacks an intuitive interpretation.

After deriving corrected values of $Z_{i}^{*}$ and $U_{i}^{*}$, average parities can be calculated using Equation 2.

Having applied the correction, care should be taken to ensure that, in every age group, the adjusted number of childless women (that is, of parity zero) does not exceed the number of women reporting no births in the reference period in response to the question on recent fertility, since a childless woman cannot have had a recent birth. Hence the revised $Z_{i}^{*}$ can be used to determine the minimum number of women who could not have had a birth in the reference period before the census.
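This consistency check can be sketched in Python as below. The function name is hypothetical, and the counts of women reporting no birth in the reference period are made up for illustration, since those data are not given here.

```python
def groups_failing_recent_fertility_check(revised_zero, no_recent_birth, labels):
    """List age groups where N*_i0 exceeds the number of women reporting
    no birth in the reference period. That would be impossible, because
    every childless woman must also have had no recent birth."""
    return [
        label
        for label, n0, nb in zip(labels, revised_zero, no_recent_birth)
        if n0 > nb
    ]


# Hypothetical inputs for two age groups (not census data).
print(groups_failing_recent_fertility_check([900, 300], [950, 280], ["15-19", "20-24"]))
# ['20-24']
```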

A version of the correction designed for (the now-rare) situations where questions on children ever born are asked only of married women is described in Annex II of Manual X (UN Population Division 1983).

References

el-Badry MA. 1961. “Failure of enumerators to make entries of zero: errors in recording childless cases in population censuses”, Journal of the American Statistical Association 56(296):909–924. doi: https://dx.doi.org/10.1080/01621459.1961.10482134

UN Population Division. 1983. Manual X: Indirect Techniques for Demographic Estimation. New York: United Nations, Department of Economic and Social Affairs, ST/ESA/SER.A/81. https://www.un.org/development/desa/pd/sites/www.un.org.development.desa.pd/files/files/documents/2020/Jan/un_1983_manual_x_-_indirect_techniques_for_demographic_estimation.pdf

Author

Moultrie TA
