Estimation of migration from census data

Description of the methods

Estimating migration from census data is not technically complicated. Provided that the census(es) gather the appropriate information and are reasonably accurate it is possible to produce estimates of net immigration (i.e. immigration less emigration) of the foreign-born population (people born outside a particular country) and internal migration between (to and from) sub-national regions of a country, over the period between two censuses.

To estimate net immigration of foreigners one essentially subtracts from the number of foreign-born people enumerated in a census, the number of foreigners expected to have survived since being enumerated in the previous census.

In a similar way, if the censuses record the sub-national region of birth one can estimate net in-migration (i.e. net in-migration of those born outside the region less net out-migration of those born in the region) between sub-national regions of a country. However, if the census asks of people where they were living at some prior point in time, say at the time of the previous census, one is able to estimate directly the number of surviving migrants (i.e. migrants still alive at the time of the latest census) into and out of each sub-national region of the country since that prior point in time.

In order to estimate the number of migrants from the number of surviving migrants at the time of the second census one needs to add to these figures an estimate of the number of migrants who are expected to have died between moving and the time of the latest census.

If the latest census records other information such as year in which the migrant moved to the place at which the person was counted in the census, it is possible also to establish a trend of migration over time.

Migration differs from fertility and mortality in two respects. First, migrating is not final in the sense that a birth or a death is. Second, we are concerned not only with the population of origin, from which the migrant moved (which corresponds to a population exposed to risk, from which rates of migration akin to those of fertility and mortality can be calculated), but also with the population to which the migrant moves, the destination population. Apart from this, in order to understand migration one is often interested in distinguishing between different types of migration (whether temporary or more permanent, whether circulatory or unidirectional, etc.). For these reasons there is a much wider range of measures and terminology associated with migration than there is with either fertility or mortality. It is not the purpose of this chapter to cover these issues, and the interested reader is referred to the standard texts on the subject, such as the UN Manual VI (UN Population Division 1970), Shryock and Siegel (1976), and Siegel and Swanson (2004).

Data requirements and assumptions

Tabulations of data required

To estimate net immigration of foreigners:
- the number of foreign-born females (males), in five-year age groups, and for an open age interval A+, at two points in time, typically two censuses
- For the deaths: either a suitable model life table or the numbers of native-born females (males), in five-year age groups, and for an open age interval A+, at two points in time, typically two censuses. Failing these, the central crude death rate for the population
To estimate sub-national regional net in-migration from place of birth data:
- the number of females (males) by sub-national region and by sub-national region of birth, in five-year age groups, and for an open age interval A+, at two points in time, typically two censuses
- For the deaths: either a suitable model life table, the numbers of native-born females (males), in five-year age groups, and for an open age interval A+, at two points in time, typically two censuses or numbers of deaths by region from the vital registration. Failing these, the central crude death rate for the population
To estimate internal migration between sub-national regions from place of residence at previous census data:
- The numbers of females (males) by sub-national region and by sub-national region at some prior date, typically that of the preceding census, in five-year age groups, and for an open age interval A+.

If age-specific numbers are not available, aggregated data is still useful for estimating all-age migration.

Important assumptions

Estimating net immigration of foreigners:
- Censuses identify all foreign-born people accurately
- One is able to estimate the mortality of the foreign-born population accurately (either that the life table used is appropriate, or that the mortality is the same as that implied by the censuses for the native-born (locally-born) national population)
- No return migration of locally born emigrants

Estimating sub-national regional net in-migration from place of birth data:
- Censuses count the population by sub-national region accurately and identify the region of birth accurately
- One is able to estimate the mortality of people moving between two regions accurately (either that the life table used is appropriate, or that the mortality is the same as that implied by the censuses for the native-born national population).

Estimating internal migration between sub-national regions from data on place of residence at previous census:
- Latest census identifies correctly all people who have moved from one region to another since the prior date (e.g. previous census)
- One is able to estimate the mortality of people moving between two regions accurately (either that the life table used is appropriate, or that the mortality is the same as that implied by the censuses for the native-born national population). Since one is estimating in- and out-migration separately (as opposed to net migration) this assumption is of less importance.

Preparatory work and preliminary investigations

Before applying this method, you should investigate the quality of the data in at least the following dimensions:

- age structure of the population (by sub-national region as appropriate); and
- relative completeness of the census counts (by sub-national region as appropriate).

Caveats and warnings

Estimating migration using place of birth data from two censuses not only requires that the censuses count the population reasonably completely, but that the place of birth be accurately recorded. Often this is not the case, particularly when estimating immigration, where immigrants wish to hide the fact that they are foreign, but also in the case of internal migration where there may have been boundary changes or the respondent is ignorant about the place of birth of the person.

Estimating migration by asking questions of migrants depends heavily on the census identifying completely all those who have migrated, as well as correctly identifying the place from which they moved. To the extent that recent migrants are not yet established as residents of the region to which they have moved at the time of the census, they could be missed in the count.

Net migration, by definition, underestimates the flows of migrants into and out of a region or country. Thus, for example, people who moved into a region and then returned within the period being considered will result in zero net in-migration and yet moved twice.

Application of the method

A: Estimating net immigration of foreigners using place of birth data

This method produces estimates of the net immigration of foreigners using place of birth data. It is important to stress that this method does not take into account or measure the immigration of returning native-born people who left the country prior to the previous census and returned before the second census. Thus this method is not recommended for the measurement of immigration where significant return migration of native-born people (for example, after exile or forced migration of refugees) is in progress.

Step 1: Decide on survival factors

If data on the number of foreign-born people in the population are available by age group for each census, one needs survival factors to apply to the numbers of foreign-born in the first census in order to estimate the numbers surviving to the time of the second census. The survival factors can be based on the years of life lived in five-year age groups ($_{5} L_{x}$) taken from the standard from the General family of United Nations model life tables, from any one of the four families of Princeton model life tables, or from a model life table of a population experiencing an AIDS epidemic (Timæus 2007). Failing this, the survival factors can be derived from the proportion of each five-year age group of the native-born population surviving from the first to the second census (assumed to be n years apart, where n is a multiple of 5). Thus $_{5} S_{x, n}$, $_{\infty} S_{A - n, n}$ and $S_{B, n}$, the n-year survival factors for people aged x to x + 5 at the previous census, aged A-n and older at the previous census, and born between the censuses, respectively, are estimated as follows:

$\begin{array}{l} _{5} S_{x, n} = \frac{_{5} L_{x + n}}{_{5} L_{x}} \text{ or } \frac{_{5} N_{x + n}^{n b} (t + n)}{_{5} N_{x}^{n b} (t)}, \\ _{\infty} S_{A - n, n} = \frac{T_{A}}{T_{A - n}} \text{ or } \frac{_{\infty} N_{A}^{n b} (t + n)}{_{\infty} N_{A - n}^{n b} (t)}, \text{ and} \\ S_{B, n} = \frac{_{n} L_{0}}{n \cdot l_{0}} \text{ or } \frac{_{n} N_{0}^{n b} (t + n)}{B^{n b}} . \end{array}$

where the superscript nb represents ‘native-born’, $_{5} N_{x}^{n b} (t)$ represents the native-born population aged between x and x + 5 in the census at time t, and $B^{n b}$ represents the number of native-born births between time t and t + n.
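As an illustration only, the survival factors defined above might be computed as in the following Python sketch. The function names are hypothetical (they are not taken from any published tool), and the sketch assumes that the life table values or native-born census counts have already been tabulated for the relevant ages.

```python
def survival_closed(L_x, L_x_plus_n):
    """n-year survival factor for people aged x to x+5: 5L(x+n) / 5L(x)."""
    return L_x_plus_n / L_x


def survival_open(T_A_minus_n, T_A):
    """Survival factor for those aged A-n and older: T(A) / T(A-n)."""
    return T_A / T_A_minus_n


def survival_births(nL0, n, l0=1.0):
    """Survival factor for those born between the censuses: nL0 / (n * l0)."""
    return nL0 / (n * l0)


def survival_from_censuses(pop_first, pop_second):
    """Census survival ratio: a native-born cohort counted at time t+n
    divided by the same cohort counted at time t (hypothetical helper)."""
    return pop_second / pop_first


# Life table values quoted in the worked example of this chapter.
print(round(survival_closed(4.4975, 4.3382), 5))   # 5S20,5
print(round(survival_births(4.707549, 5), 5))      # S_B,5
```

The census-ratio variant is only as good as the relative completeness of the two counts of the native-born, which is why the preliminary checks of census coverage matter.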

If the data are not available in five-year age groups, the net number of immigrants can still be estimated in total provided we have an estimate of the crude death rate for the population (which might, in the absence of any evidence to the contrary, be assumed to be that of the native-born population).

Step 2: Estimate the number of deaths of the immigrants

If data on the number of foreign-born people in the population are available by age group for two censuses (n years apart) then one needs to estimate the number of deaths of foreign-born people (denoted by the superscript F) aged between x and x+5 at the first census (at time t), $_{5} D_{x}^{F} $ , aged A-n and older at the first census, $_{\infty} D_{A - n}^{F} $ , and those born between the censuses, $ D_{B}^{F} $ , as follows:

$\begin{array}{l} _{5} D_{x}^{F} = \frac{1}{2} (_{5} N_{x}^{F} (t) \cdot _{5} S_{x, n} + _{5} N_{x + n}^{F} (t + n)) (\frac{1}{_{5} S_{x, n}} - 1), \\ _{\infty} D_{A - n}^{F} = \frac{1}{2} (_{\infty} N_{A - n}^{F} (t) \cdot _{\infty} S_{A - n, n} + _{\infty} N_{A}^{F} (t + n)) (\frac{1}{_{\infty} S_{A - n, n}} - 1), \text{ and} \\ D_{B}^{F} = \frac{1}{2} (_{n} N_{0}^{F} (t + n)) (\frac{1}{S_{B, n}} - 1) . \end{array}$

where $_{5} N_{x}^{F} (t)$ represents the number of foreign-born people according to the census at time t who were aged between x and x+5.
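The three expressions for deaths share one form: half the sum of the expected survivors of the first count and the observed second count, multiplied by (1/S - 1). A minimal Python sketch of this, using a hypothetical function name and the figures from Table 1 of the worked example, is:

```python
def cohort_deaths(pop_first, pop_second, survival):
    """Estimated deaths of foreign-born people in one cohort between censuses.

    pop_first  : number in the cohort at the first census (0 for those
                 born between the censuses)
    pop_second : number in the same cohort at the second census
    survival   : n-year survival factor for the cohort
    """
    return 0.5 * (pop_first * survival + pop_second) * (1.0 / survival - 1.0)


# Cohort aged 20-24 in 2001 (25-29 at the second count).
print(round(cohort_deaths(69787, 95763, 0.96458)))          # 2994
# Open interval: aged 80+ in 2001 (85+ at the second count).
print(round(cohort_deaths(7658 + 4455, 5305, 0.40912)))     # 7410
# Born between the censuses: no one to count at the first census.
print(round(cohort_deaths(0, 12577, 0.94151)))              # 391
```

Treating the births as a cohort with a first count of zero lets a single function cover all three cases.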

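If data and/or survival factors are not available by age group then one can estimate the total number of deaths of the foreign-born people as follows:

$_{\infty} D_{0}^{F} = \frac{n}{2} (_{\infty} N_{0}^{F} (t) + _{\infty} N_{0}^{F} (t + n)) \cdot _{\infty} m_{0} ,$

where $_{\infty} m_{0}$ is the crude death rate of the population.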

However, if the age distribution of the foreign-born population is markedly different from that of the population in the country of the census, then this can produce a poor approximation to the true number of deaths.
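When only totals are available, the approximation multiplies the person-years lived by the foreign-born between the censuses (taken as n times the average of the two counts) by the crude death rate. The sketch below uses a hypothetical function name and the totals and crude rate quoted in the worked example.

```python
def total_deaths_from_crude_rate(pop_first, pop_second, crude_death_rate, n):
    """Total deaths over n years: (n / 2) * (N(t) + N(t+n)) * crude death rate.

    crude_death_rate is expressed per person per year (e.g. 14 / 1000).
    """
    person_years = n / 2 * (pop_first + pop_second)
    return person_years * crude_death_rate


# Foreign-born totals from Table 1, crude death rate of 14 per 1,000.
print(round(total_deaths_from_crude_rate(611423, 754608, 14 / 1000, 5)))  # 47811
```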

Step 3: Estimate the net number of immigrants (of foreigners)

If data are available by age group for each census, then age-specific net immigration can be estimated as follows:

$Net _{5} M_{x}^{F} = _{5} N_{x + n}^{F} (t + n) - _{5} N_{x}^{F} (t) + _{5} D_{x}^{F}$

for x = 0, 5, … , A-5-n, where $Net _{5} M_{x}^{F}$ represents the net number of immigrants between times t and t+n who were aged between x and x + 5 at time t. For the open age interval (those aged A-n and older at time t):

$Net _{\infty} M_{A - n}^{F} = _{\infty} N_{A}^{F} (t + n) - _{\infty} N_{A - n}^{F} (t) + _{\infty} D_{A - n}^{F} .$

The net number of immigrants of those born between times t and t+n is estimated as follows:

$Net M_{B}^{F} = _{n} N_{0}^{F} (t + n) + D_{B}^{F} .$

If data and/or survival factors are not available by age group then one would estimate the total net number of immigrants as follows:

$Net _{\infty} M_{0}^{F} = _{\infty} N_{0}^{F} (t + n) - _{\infty} N_{0}^{F} (t) + _{\infty} D_{0}^{F} .$
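All four expressions in this step are the same balancing identity: the second count, less the first count, plus the deaths estimated for the cohort. A small Python sketch (the function name is hypothetical; the figures are those of the worked example) makes this explicit:

```python
def net_immigrants(pop_first, pop_second, deaths):
    """Net immigrants in a cohort: N(t+n) - N(t) + deaths between censuses.

    For those born between the censuses pop_first is zero; for the all-age
    estimate the two totals and the total deaths are supplied.
    """
    return pop_second - pop_first + deaths


print(net_immigrants(69787, 95763, 2994))        # aged 20-24 in 2001: 28970
print(net_immigrants(7658 + 4455, 5305, 7410))   # aged 80+ in 2001: 602
print(net_immigrants(0, 12577, 391))             # born between censuses: 12968
print(net_immigrants(611423, 754608, 47811))     # all ages, crude-rate deaths: 190996
```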

B: Estimating net internal migration between sub-national regions from place of birth data

Net in-migration into a particular sub-national region from other regions in the country can be estimated in exactly the same way as the international immigration, described above, by replacing the foreign-born population with the population born outside the region.

In addition, applying the same method to data on the change in the numbers of population born in (rather than outside) and living outside the region of interest allows us to estimate the net out-migration of those born in the region to other regions in the country. Subtracting this from the net in-migration of those born outside the region gives an estimate of the overall net in-migration into the region of interest.

If there is reason to suspect that there is a material difference in the mortality experienced by those born outside who moved into the region and those born in the region who moved out, and one has appropriate survival factors, then one could apply different survival factors to each when estimating the net number of migrants. However, in practice inaccuracies in the census data on place of residence at previous census are likely to outweigh any increase in accuracy achieved by using differential mortality.
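The combination of the two place-of-birth estimates can be sketched as follows. The function name is hypothetical, and the figures are taken from the first two age rows and the totals of Tables 2 and 3 of the worked example for the Western Cape.

```python
def overall_net_in_migration(net_in_born_outside, net_out_born_inside):
    """Overall net in-migration by age group: net in-migration of those born
    outside the region less net out-migration of those born in the region."""
    return [i - o for i, o in zip(net_in_born_outside, net_out_born_inside)]


# Ages 0-4 and 5-9 at the second census, followed by the all-age totals.
born_outside = [19602, 12782, 213911]
born_inside = [12112, -9180, -19017]
print(overall_net_in_migration(born_outside, born_inside))
# [7490, 21962, 232928]
```

A negative net out-migration of those born in the region (more returning than leaving) therefore adds to the overall net in-migration.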

C: Estimating internal migration between sub-national regions from place of residence at previous census

Net sub-national inter-regional migration is estimated directly from the numbers of people in each region at the time of the census who have moved since a given prior date (e.g. the time of the previous census), classified by the place (e.g. region) in which they were living at that date. If the estimates are confined to inter-regional flows, the sum of the numbers of inter-regional in-migrants should equal the sum of inter-regional out-migrants; however, if the data include immigration to the sub-national regions from outside the country, one can extend the estimates of in-migration to include international immigration into each region.

Since one of the major areas of interest is the magnitude of inter-regional flows of the population, one is as interested in the total numbers of migrants between regions as one is in the age distributions of particular flows.

The number of migrants is derived from the number of surviving in- and out-migrants as follows:

$_{5} M_{x} = \left( (_{5} I'_{x} - _{5} O'_{x}) + (_{5} I'_{x} - _{5} O'_{x}) / _{5} S_{x} \right) / 2,$

where the prime (’) denotes numbers surviving, and $_{5} I'_{x}$ and $_{5} O'_{x}$ respectively represent the number of surviving in-migrants into, and the number of surviving out-migrants from, a particular region who were aged between x and x+5 at the time of the second census.
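The adjustment averages the net number of surviving migrants with the same number divided by the survival factor, which amounts to assuming that migrants moved, on average, halfway through the period. A minimal Python sketch, with a hypothetical function name and figures from Table 5 of the worked example, is:

```python
def net_migrants_from_survivors(surviving_in, surviving_out, survival):
    """Net migrants implied by surviving in- and out-migrants.

    Averages the net survivors (as if no migrant died) with the net
    survivors divided by the survival factor (as if all were exposed to
    mortality for the whole period).
    """
    net_survivors = surviving_in - surviving_out
    return (net_survivors + net_survivors / survival) / 2


# Western Cape, ages 5-9, 80-84 and 85+ at the second count.
print(round(net_migrants_from_survivors(6586, 3554, 0.97896)))   # 3065
print(round(net_migrants_from_survivors(797, 116, 0.56388)))     # 944
print(round(net_migrants_from_survivors(264, 47, 0.40912)))      # 374
```

The same function can be applied to in-migrants or out-migrants alone by setting the other flow to zero.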

Worked example

This example uses data on the numbers of males in the population from the South African Census in 2001 and a ‘census replacement survey’, the Community Survey in 2007. (Although the survey was conducted approximately 5.35 years after the night of the census in 2001, it is assumed for the purposes of presentation here to have been exactly five years after the census in 2001.)

A: Estimating net immigration of foreigners using place of birth

Step 1: Decide on survival factors

The survival factors are shown in the fifth column of Table 1. The values are derived from the years of life lived in each age group of an alternative life table. For those aged 20 to 24 last birthday and those aged 80 and over at the time of the first census, and for those born between the two censuses, the calculations are as follows:

$\begin{array}{l} _{5} S_{20, 5} = \frac{_{5} L_{25}}{_{5} L_{20}} = \frac{4.3382}{4.4975} = 0.96458 \\ _{\infty} S_{80, 5} = \frac{T_{85}}{T_{80}} = \frac{0.75180}{1.19603} = 0.40912 \\ \text{and } S_{B, 5} = \frac{_{5} L_{0}}{5 \cdot l_{0}} = \frac{4.707549}{5} = 0.94151. \end{array}$

Table 1 Estimation of deaths of foreign-born and the net number of immigrants by age group, South Africa, 2001-2006

Age	2001	2006	x	₅S_x	Age at 2^nd census	D^F	Net M
			B	0.94151
0- 4	8,963	12,577	0	0.97896	0- 4	391	12,968
5- 9	10,390	13,724	5	0.99547	5- 9	242	5,003
10-14	13,508	13,998	10	0.99427	10-14	55	3,664
15-19	27,835	27,943	15	0.98602	15-19	119	14,555
20-24	69,787	59,493	20	0.96458	20-24	616	32,275
25-29	87,381	95,763	25	0.93161	25-29	2,994	28,970
30-34	73,338	100,450	30	0.90960	30-34	6,675	19,743
35-39	66,663	85,490	35	0.89780	35-39	7,563	19,715
40-44	59,152	75,684	40	0.89092	40-44	7,701	16,721
45-49	45,184	66,113	45	0.88633	45-49	7,274	14,234
50-54	40,398	55,913	50	0.87224	50-54	6,154	16,883
55-59	30,640	42,833	55	0.84731	55-59	5,717	8,153
60-64	24,376	34,433	60	0.80885	60-64	5,442	9,234
65-69	17,895	25,588	65	0.75468	65-69	5,353	6,564
70-74	13,561	18,989	70	0.66991	70-74	5,281	6,375
75-79	10,238	12,850	75	0.56388	75-79	5,404	4,693
80-84	7,658	7,461	80+	0.40912	80-84	5,118	2,341
85+	4,455	5,305			85+	7,410	602
Total	611,423	754,608			Total	79,509	222,693

Step 2: Estimate the number of deaths

Since we have data on the number of foreign-born people in the population by age group for each census we can estimate the number of deaths of foreign-born people which occurred in the period between the two censuses by age group using the numbers of foreigners in each census given in the second and third columns of Table 1. For those aged 20 to 24 last birthday and those aged 80 and over at the time of the first census, and those born between the two censuses, the calculations are as follows:

$\begin{array}{l} _{5} D_{20}^{F} = \frac{1}{2} (_{5} N_{20}^{F} (2001) \cdot _{5} S_{20, 5} + _{5} N_{25}^{F} (2006)) (\frac{1}{_{5} S_{20, 5}} - 1) \\ = \frac{1}{2} (69787 \cdot 0.96458 + 95763) (\frac{1}{0.96458} - 1) = 2994 \\ _{\infty} D_{80}^{F} = \frac{1}{2} (_{\infty} N_{80}^{F} (2001) \cdot _{\infty} S_{80, 5} + _{\infty} N_{85}^{F} (2006)) (\frac{1}{_{\infty} S_{80, 5}} - 1) \\ = \frac{1}{2} ((7658 + 4455) \cdot 0.40912 + 5305) (\frac{1}{0.40912} - 1) = 7410 \\ \text{and } D_{B}^{F} = \frac{1}{2} (_{5} N_{0}^{F} (2006)) (\frac{1}{S_{B, 5}} - 1) = \frac{1}{2} \cdot 12577 (\frac{1}{0.94151} - 1) = 391 . \end{array}$

If data and/or survival factors were not available by age group then one could estimate the total number of deaths of the foreign born people as follows, given an estimate of the crude mortality rate in the population of 14 per 1,000:

$_{\infty} D_{0}^{F} = \frac{5}{2} (_{\infty} N_{0}^{F} (2001) + _{\infty} N_{0}^{F} (2006))_{\infty} m_{0} = \frac{5}{2} (611423 + 754608) \frac{14}{1000} = 47811 .$

Step 3: Estimate the net number of immigrants (of foreigners)

Since data are available by age group for each census, age-specific net immigration of those born outside the country can be estimated as follows:

$\begin{array}{l} Net _{5} M_{20}^{F} = _{5} N_{25}^{F} (2006) - _{5} N_{20}^{F} (2001) + _{5} D_{20}^{F} = 95763 - 69787 + 2994 = 28970 \\ Net _{\infty} M_{80}^{F} = _{\infty} N_{85}^{F} (2006) - _{\infty} N_{80}^{F} (2001) + _{\infty} D_{80}^{F} = 5305 - (7658 + 4455) + 7410 = 602 \\ Net M_{B}^{F} = _{5} N_{0}^{F} (2006) + D_{B}^{F} = 12577 + 391 = 12968 . \end{array}$

If data and/or survival factors were not available by age group then one could estimate the total net number of immigrants as follows:

$Net _{\infty} M_{0}^{F} = _{\infty} N_{0}^{F} (2006) - _{\infty} N_{0}^{F} (2001) + _{\infty} D_{0}^{F} = 754608 - 611423 + 47811 = 190996$
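The whole of the final column of Table 1 can be approximated by applying Steps 2 and 3 cohort by cohort. The part that is easy to get wrong is the alignment: each five-year group at the first count is compared with the next group up at the second count, and the last two groups at the first count together feed the open interval. The Python sketch below is illustrative only; the function name is hypothetical, it assumes counts exactly five years apart, and because it uses the rounded figures printed in Table 1 the results can differ slightly from the published column.

```python
def net_immigration_by_age(pop_first, pop_second, surv_closed, surv_open, surv_births):
    """Net immigrants by age group at the second count (five-year interval).

    pop_first, pop_second : counts by five-year age group, last entry open-ended
    surv_closed           : factors for cohorts aged 0-4, 5-9, ... at the first count
    """
    def deaths(start, end, s):
        return 0.5 * (start * s + end) * (1.0 / s - 1.0)

    k = len(pop_first)
    # Born between the counts: aged 0-4 at the second count.
    net = [pop_second[0] + deaths(0, pop_second[0], surv_births)]
    # Closed cohorts: group i at the first count is group i + 1 at the second.
    for i in range(k - 2):
        d = deaths(pop_first[i], pop_second[i + 1], surv_closed[i])
        net.append(pop_second[i + 1] - pop_first[i] + d)
    # Open cohort: the last two groups at the first count feed the open interval.
    start = pop_first[k - 2] + pop_first[k - 1]
    d = deaths(start, pop_second[k - 1], surv_open)
    net.append(pop_second[k - 1] - start + d)
    return net


# Foreign-born males, ages 0-4, 5-9, ..., 80-84, 85+ (Table 1).
N_2001 = [8963, 10390, 13508, 27835, 69787, 87381, 73338, 66663, 59152,
          45184, 40398, 30640, 24376, 17895, 13561, 10238, 7658, 4455]
N_2006 = [12577, 13724, 13998, 27943, 59493, 95763, 100450, 85490, 75684,
          66113, 55913, 42833, 34433, 25588, 18989, 12850, 7461, 5305]
S_CLOSED = [0.97896, 0.99547, 0.99427, 0.98602, 0.96458, 0.93161, 0.90960,
            0.89780, 0.89092, 0.88633, 0.87224, 0.84731, 0.80885, 0.75468,
            0.66991, 0.56388]

net = net_immigration_by_age(N_2001, N_2006, S_CLOSED, 0.40912, 0.94151)
print(round(net[5]))    # aged 25-29 at the second count
print(round(sum(net)))  # all ages
```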

B: Estimating sub-national regional net in-migration using place of birth

The second and third columns of Table 2 show the numbers of people living in the Western Cape province of South Africa who were born outside the province, as counted by the 2001 Census and the 2007 Community Survey, respectively. Although the same survival factors (column 5) have been used as in the example of Method A, this should not be the case if it were thought that the mortality experience of the native-born and of immigrants was very different. The final column of Table 2 gives the net numbers of migrants into the Western Cape who were born in provinces other than the Western Cape for the different age groups. Thus in total 213,911 people born outside the Western Cape moved to the Western Cape (after excluding those who moved out).

Table 2 Estimation of the net number of in-migrants of those born outside by age group, Western Cape, South Africa, 2001-2006

Age	2001	2006	x	₅S_x	Age at 2^nd census	D_O	Net M (born out)
			B	0.94151
0- 4	16,443	19,012	0	0.97896	0- 4	591	19,602
5- 9	24,406	28,743	5	0.99547	5- 9	482	12,782
10-14	31,134	30,792	10	0.99427	10-14	125	6,511
15-19	44,478	53,933	15	0.98602	15-19	245	23,043
20-24	74,011	82,526	20	0.96458	20-24	896	38,944
25-29	80,187	89,522	25	0.93161	25-29	2,954	18,466
30-34	65,833	90,783	30	0.90960	30-34	6,074	16,670
35-39	56,393	76,475	35	0.89780	35-39	6,776	17,417
40-44	44,420	59,692	40	0.89092	40-44	6,268	9,567
45-49	32,862	47,612	45	0.88633	45-49	5,338	8,529
50-54	28,178	37,969	50	0.87224	50-54	4,303	9,409
55-59	19,983	30,205	55	0.84731	55-59	4,012	6,039
60-64	17,569	25,593	60	0.80885	60-64	3,832	9,442
65-69	11,216	20,802	65	0.75468	65-69	4,137	7,371
70-74	8,365	12,612	70	0.66991	70-74	3,426	4,822
75-79	5,919	8,434	75	0.56388	75-79	3,458	3,528
80-84	4,063	5,061	80+	0.40912	80-84	3,248	2,390
85+	2,152	2,183			85+	3,413	-620
Total	567,613	721,949			Total	59,576	213,911

The second and third columns of Table 3 present the numbers of people living in provinces other than the Western Cape who were born in the Western Cape, as counted by the 2001 Census and the 2007 Community Survey, respectively. The net number of out-migrants of those born in the Western Cape (i.e. the number of people born in the Western Cape who moved out, less those who have returned) is given in column 8. The negative numbers mean that there was negative net out-migration (i.e. the number of those born in the Western Cape who moved to other provinces in the period was less than the number born in the Western Cape and living outside it who returned during the period). Thus the total of -19,017 means that the number of people born in the Western Cape who returned to the province during the period, having lived in another province until 2001, exceeds by 19,017 the number who were born in the Western Cape and moved to another province in the period.

These estimates were derived using the same survival factors as were used for those born outside the Western Cape who moved into the province, but if there was reason to suppose that the mortality differed for those born in the Western Cape who moved out, then a different set of survival factors would be used to estimate the Net M (born in) numbers.

The overall net in-migration for the province is thus given in the final column of Table 3. Thus in total 232,928 more people moved into the Western Cape than left the Western Cape to live in another province.

In this example those born outside the province include those born outside the country and thus the overall net migration includes immigrants who settle in the province. Excluding the foreign-born from Table 2 would produce numbers of internal in-migrants net of internal out-migrants, and the sum of these numbers for all the provinces together would be zero.

Table 3 Estimation of the net number of out-migrants of those born inside by age group, Western Cape, South Africa, 2001-2006

Age	2001	2006	x	₅S_x	Age at 2^nd census	D_I	Net M (born in)	Overall Net M
			B	0.94151
0- 4	22,055	11,747	0	0.97896	0- 4	365	12,112	7,490
5- 9	21,895	12,509	5	0.99547	5- 9	367	-9,180	21,962
10-14	21,382	11,593	10	0.99427	10-14	76	-10,226	16,737
15-19	18,265	13,455	15	0.98602	15-19	100	-7,827	30,870
20-24	14,645	10,477	20	0.96458	20-24	202	-7,587	46,531
25-29	13,501	9,534	25	0.93161	25-29	434	-4,676	23,142
30-34	13,118	11,047	30	0.90960	30-34	867	-1,587	18,257
35-39	12,121	14,614	35	0.89780	35-39	1,319	2,815	14,602
40-44	11,725	12,195	40	0.89092	40-44	1,311	1,384	8,183
45-49	10,335	10,538	45	0.88633	45-49	1,285	98	8,431
50-54	9,211	9,881	50	0.87224	50-54	1,221	768	8,642
55-59	7,264	10,568	55	0.84731	55-59	1,362	2,720	3,319
60-64	6,691	7,723	60	0.80885	60-64	1,250	1,710	7,732
65-69	4,643	5,297	65	0.75468	65-69	1,265	-128	7,499
70-74	3,954	3,766	70	0.66991	70-74	1,182	304	4,517
75-79	2,331	2,384	75	0.56388	75-79	1,240	-330	3,858
80-84	1,402	2,140	80+	0.40912	80-84	1,336	1,145	1,244
85+	707	555			85+	1,024	-531	-89
Total	195,246	160,023			Total	16,206	-19,017	232,928

C: Estimating internal migration between sub-national regions from data on place of residence at previous census

Table 4 presents the results of the answers to the question about place (province in this example) of residence at the time of the 2001 Census given by those counted in each of the provinces in the 2007 Community Survey. (In actual fact the question asked whether the person was staying at the same place at the time of the prior census and, if not, where they were staying at the time they moved to the place at which they were counted in the Community Survey. However, work by Dorrington and Moultrie (2009), which used these data and the year of movement to back-project the population in order to estimate the numbers by province of residence at the time of the previous census, suggests that the assumption that there was only one move in the five years since the previous census is reasonably accurate.)

By far the largest numbers of migrants are those who moved within each of the provinces; however, these have been excluded from Table 4 because one is usually more interested in interprovincial migration than in migration within a province.

Table 4 Interprovincial migration, South Africa, 2001-2006

	Province where counted (destination)
Previous residence (origin)	WC	EC	NC	FS	KZ	NW	GT	MP	LM	Total
WC		12,173	4,060	1,745	3,221	2,113	16,400	1,405	874	41,992
EC	52,239		1,120	7,187	25,209	14,430	28,633	4,693	2,116	135,626
NC	4,813	1,942		3,480	908	3,728	4,956	1,062	357	21,246
FS	2,943	3,145	2,546		2,352	12,733	19,920	4,293	1,963	49,896
KZ	6,762	7,015	631	2,358		3,573	50,980	8,886	1,194	81,399
NW	1,478	907	9,811	5,555	2,329		47,633	3,090	4,337	75,140
GT	24,891	12,948	3,962	11,437	18,145	32,433		18,598	15,133	137,547
MP	2,134	1,317	280	1,724	4,546	5,767	42,941		8,628	67,338
LM	2,754	1,583	255	1,709	2,209	9,773	81,394	24,211		123,889
OSA	21,221	5,467	1,209	9,584	10,933	11,437	51,873	8,335	9,286	129,346
DNK	500	3	15	124	132	78	228	89	0	1,170
UNS	1,058	1,029	107	208	875	508	3,558	408	633	8,384
Total	120,794	47,528	23,996	45,111	70,860	96,573	348,516	75,070	44,524	872,973
WC = Western Cape, EC = Eastern Cape, NC = Northern Cape, FS = Free State, KZ = KwaZulu-Natal, NW = North West, GT = Gauteng, MP = Mpumalanga, LM = Limpopo, OSA = Outside SA, DNK = Do not know, UNS = Unspecified
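An origin-destination table of this kind yields out-migrants as row sums, in-migrants as column sums, and net in-migrants as the difference. The Python sketch below is illustrative: the names are hypothetical, only the nine-by-nine interprovincial block of Table 4 is used (the OSA, DNK and UNS rows are left out), and the figures are surviving migrants with no adjustment for mortality.

```python
PROVINCES = ["WC", "EC", "NC", "FS", "KZ", "NW", "GT", "MP", "LM"]

# Rows are provinces of origin, columns provinces of destination (Table 4).
# The diagonal (moves within a province) is set to zero.
FLOWS = [
    [0, 12173, 4060, 1745, 3221, 2113, 16400, 1405, 874],
    [52239, 0, 1120, 7187, 25209, 14430, 28633, 4693, 2116],
    [4813, 1942, 0, 3480, 908, 3728, 4956, 1062, 357],
    [2943, 3145, 2546, 0, 2352, 12733, 19920, 4293, 1963],
    [6762, 7015, 631, 2358, 0, 3573, 50980, 8886, 1194],
    [1478, 907, 9811, 5555, 2329, 0, 47633, 3090, 4337],
    [24891, 12948, 3962, 11437, 18145, 32433, 0, 18598, 15133],
    [2134, 1317, 280, 1724, 4546, 5767, 42941, 0, 8628],
    [2754, 1583, 255, 1709, 2209, 9773, 81394, 24211, 0],
]


def out_migrants(flows):
    """Surviving out-migrants from each origin: row sums."""
    return [sum(row) for row in flows]


def in_migrants(flows):
    """Surviving in-migrants to each destination: column sums."""
    return [sum(row[j] for row in flows) for j in range(len(flows))]


def net_in_migrants(flows):
    """Surviving in-migrants less surviving out-migrants, by province."""
    return [i - o for i, o in zip(in_migrants(flows), out_migrants(flows))]


net_by_province = dict(zip(PROVINCES, net_in_migrants(FLOWS)))
print(net_by_province["WC"])            # 56023
print(sum(net_by_province.values()))    # 0: internal flows cancel out
```

Because every interprovincial move is an out-migration from one province and an in-migration to another, the net figures sum to zero, which is a useful check on the transcription of such a table.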

In addition to the all-age numbers in Table 4 (in actual fact these numbers exclude, as is often the case, migration of those born between the census and survey) one can also produce numbers of in- and out-migration by age groups as shown in Table 5. For completeness these numbers include estimates of the number of migrants who were born since the previous census. However, relative to the other migrants these numbers look implausibly high, and the reason for this is discussed below.

The net number of migrants is estimated for those aged 25-29 at the time of the Community Survey (i.e. were aged 20-24 at the time of the 2001 census), for example, as follows:

$_{5} M_{x} = ( 20675 - 5649 + ( 20675 - 5649) / 0.96458) / 2 = 15301 .$

Table 5 Estimation of the net number of in-migrants by age group, Western Cape, South Africa, 2001-2006

Age	Surviving in-migrants (I’)	Surviving out-migrants (O’)	x	₅S_x	Net in-migrants
0- 4	20,846	11,747	B	0.94151	9,381
5- 9	6,586	3,554	0	0.97896	3,065
10-14	6,685	2,882	5	0.99547	3,812
15-19	10,402	3,967	10	0.99427	6,454
20-24	21,266	4,488	15	0.98602	16,897
25-29	20,675	5,649	20	0.96458	15,301
30-34	15,584	6,008	25	0.93161	9,928
35-39	10,584	5,098	30	0.90960	5,758
40-44	7,264	3,045	35	0.89780	4,458
45-49	4,648	2,714	40	0.89092	2,053
50-54	3,095	1,500	45	0.88633	1,698
55-59	3,940	935	50	0.87224	3,225
60-64	3,776	527	55	0.84731	3,541
65-69	3,127	818	60	0.80885	2,582
70-74	1,540	437	65	0.75468	1,282
75-79	561	206	70	0.66991	442
80-84	797	116	75	0.56388	944
85+	264	47	80+	0.40912	374
Total	141,640	53,739			91,194

Diagnostics, analysis and interpretation

Checks and validation

Perhaps the simplest check, on the reasonableness of the ‘shape’ (i.e. distribution of the numbers by age) of the estimates but not the level, is to see if it conforms to the standard shape (or a variation thereof). Rogers and Castro (1981a; 1981b) point out that the distribution of the number (or rate) of in- and out-migrants tends to conform to standard patterns, with a peak in the young adult ages (usually associated with seeking employment), a second, usually less pronounced peak amongst very young children falling to a trough amongst young teenagers (the size depending on the extent to which it is families rather than individuals moving in the young to middle aged adults). Sometimes there is also a ‘hump’ (or trough) around retirement age if there is a strong flow of migrants moving to (or away from) the place to retire.

These patterns (not necessarily the same pattern) apply to in- and out-migration flows separately, but not necessarily to net migration (which is the difference between the two flows) unless one flow (either the in-migration or the out-migration) is much greater than the other.

Figure 1 illustrates this using some of the estimates calculated above, expressed as proportions of the total number in each case (to allow them to be presented on a single figure). From this we can see that in broad terms (with the exception in some cases, where the proportion of migrants at the very young ages looks implausibly high) each conforms to the expected shape.

The net out-migration of those born in the Western Cape (excluded from the figure for ease of illustration) does not conform to a standard model of migration, which could indicate that these numbers are not very reliable. However, they are small relative to the in-migration of those born outside the province, and thus such a deviation may be tolerated. In addition to this there are two other features to be noted from Figure 1. The first is that the out-migration from the Western Cape, as estimated from data on place of residence at previous census, suggests that adult out-migrants peak at a somewhat older age (and possibly are likely to represent family rather than individual migration). The second is that the net immigration into the country follows the standard shape, which indicates that the flow into the country is much stronger than the return flow of those migrants.

Figure 1 Age distribution of selected migrant flows, South African males, 2001-2006
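Expressing a flow as proportions of its total, as is done for the figure, is a simple normalisation. The Python sketch below is an illustration under stated assumptions: the names are hypothetical, and the input is the column of surviving in-migrants to the Western Cape from Table 5, which is not necessarily one of the series plotted.

```python
AGE_GROUPS = ["0-4", "5-9", "10-14", "15-19", "20-24", "25-29", "30-34",
              "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65-69",
              "70-74", "75-79", "80-84", "85+"]

# Surviving in-migrants to the Western Cape by age at the second count (Table 5).
SURVIVING_IN = [20846, 6586, 6685, 10402, 21266, 20675, 15584, 10584, 7264,
                4648, 3095, 3940, 3776, 3127, 1540, 561, 797, 264]


def age_profile(counts):
    """Each age group's share of the all-age total."""
    total = sum(counts)
    return [c / total for c in counts]


profile = age_profile(SURVIVING_IN)
peak_group = AGE_GROUPS[profile.index(max(profile))]
print(peak_group)              # age group with the largest share
print(round(profile[0], 3))    # share of those born since the previous census
```

For this series the peak falls in the young adult ages, as the standard pattern leads one to expect, while the share at ages 0-4 is nearly as large, which is the feature described above as implausibly high.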

If the census asked both place of birth and place of residence at the previous census then one can compare the two estimates of net in-migration into a specific sub-national region. If they are similar this gives one some confidence in the results. In the case of South Africa, the place of birth data give a net number of in-migrants into the Western Cape of 232,928 (Table 3), while the data on place of residence at the time of the previous census give an estimate of 91,194 (Table 5), which suggests that one or both of these sets of data are suspect.

The most basic check of the estimates of migration is to project the population (of the country or the province) at the first census to the time of the second census making use of the estimates of the number of migrants and compare that with the census estimates from the second, more recent, census to see how well the two match, especially in the age range in which migration is concentrated. In the case of the net in-migration into the Western Cape, projecting the population forward from 2001 using the estimates derived from the change in the numbers by place of birth produced a much closer fit to the population in the 20-29 year age range, suggesting that the data on place of birth are probably more complete than those on the place of residence at the date of the previous census. To some extent this is supported by a comparison of the change in the number of foreign-born in the country between the two censuses, 222,693 (Table 1) with the sum of the numbers who reported that they had moved from outside South Africa to one of the provinces since the previous census, 129,346 (Table 4).

Ideally, if one had independent estimates of the number of migrants, one might compare those numbers against estimates using the above methods. Unfortunately, reliable independent estimates are rare. Although most countries try to record people entering and leaving the country, these data are often not reliable, particularly in developing countries with relatively porous borders. Unless the country is extremely well regulated and maintains a complete and accurate register of the population, the only other way to measure internal migration is through migration-specific surveys. These tend to be much more useful for understanding the type of migration (whether permanent, temporary, cyclical, etc.) than for producing reliable estimates of the number of migrants, given the often less structured living situations of migrants (particularly recent ones) and their understandable reluctance to identify themselves as migrants.

Interpretation

Considering the numbers of migrants estimated from the data on place of residence at the previous census given in Table 4 (and taking into account the suspicion that these probably underestimate the true migration), some 2 to 4 per cent of the population changed province of residence in the 5 years between the 2001 Census and the Community Survey. Had we included the number who moved within, but did not change, province, then between 7 and 15 per cent of the population moved in the 5-year period.

The main provinces of destination are Gauteng (by a big margin) and Western Cape, which are predominantly urban and the wealthiest provinces. The main provinces of origin are Gauteng (inspection of the age distribution would show that this is mainly return migration of ‘retiring’ workers), Eastern Cape and Limpopo; the latter two are poor, mainly rural provinces from which people seeking work migrate to the urban areas.

It appears that migration is predominantly of individuals (seeking work) rather than of families.

Method-specific issues with interpretation

Scanning errors

A particular feature of the data relying on province of birth is the apparently relatively high number of children born since the first census who have moved to another province. In all likelihood this is an artefact of the data capturing process. Scanning was used to capture the data from the questionnaires, on which Western Cape was coded as a “1”, written by hand in the appropriate space. It appears that in a small percentage of cases the scanner might have had trouble distinguishing a handwritten “1” from a handwritten “7” (the code for Gauteng). The result is, for example, that some children were coded as having been born outside the province in which they were counted, and thus appear to be migrants, but probably were not. Even though the percentage error in scanning is very small, the number of births can be large relative to the number of migrants, and thus the error can produce noticeable distortions. Since an increasing number of developing countries are using scanning to capture data, this sort of problem may be quite common.

Where scanning errors or other situations make it impossible to produce reliable estimates of the number of migrants among those born since the previous census, one can use the child-woman ratio (CWR) from the second census as follows:

$\text{Net}\,{}_{5}M_{0} = \frac{1}{4}\,\mathit{CWR}_{0} \cdot \text{Net}\,{}_{30}M_{15}^{f}$

for those born in the most recent five years, and $\text{Net}\,{}_{5}M_{5} = \frac{3}{4}\,\mathit{CWR}_{5} \cdot \text{Net}\,{}_{30}M_{20}^{f}$ for those born in the five years before that if the censuses are 10 years apart, where CWR_x represents the ratio of the number of children aged between x and x+5 to the number of women aged between 15+x and 45+x in the population (regional or national) at the time of the second census, and $\text{Net}\,{}_{30}M_{x}^{f}$ represents the net number of women migrants aged between x and x+30.

Applying this to the data for the Western Cape suggests that the number of migrants born since the previous census should be less than half the numbers being estimated from the data on place of birth.
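The child-woman ratio adjustment can be sketched in a few lines of Python. This is a minimal illustration for censuses 10 years apart; the function names and all input numbers are hypothetical and are not taken from the South African tables.

```python
def child_woman_ratio(children, women):
    """CWR_x: children aged x to x+5 per woman aged 15+x to 45+x (second census)."""
    return children / women


def net_child_migrants(cwr_0, cwr_5, net_women_15_44, net_women_20_49):
    """Net migrants born in the ten years before the second census.

    Applies the 1/4 and 3/4 weights given in the text and returns
    (net migrants aged 0-4, net migrants aged 5-9).
    """
    net_0_4 = 0.25 * cwr_0 * net_women_15_44
    net_5_9 = 0.75 * cwr_5 * net_women_20_49
    return net_0_4, net_5_9


# Hypothetical inputs, for illustration only.
cwr_0 = child_woman_ratio(children=40_000, women=100_000)  # children 0-4, women 15-44
cwr_5 = child_woman_ratio(children=36_000, women=90_000)   # children 5-9, women 20-49
net_0_4, net_5_9 = net_child_migrants(
    cwr_0, cwr_5, net_women_15_44=10_000, net_women_20_49=8_000
)
print(round(net_0_4), round(net_5_9))  # 1000 2400
```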

Detailed description of method

Mathematical exposition

The indirect estimation of migration derives from the balance equation for two censuses n years apart, namely:

$\begin{array}{l} _{5} N_{x + n} (t + n) = _{5} N_{x} (t) - _{5} D_{x} + _{5} {I^{'}}_{x} - _{5} {O^{'}}_{x} \\ = _{5} N_{x}^{} (t) - _{5} D_{x}^{} + _{5} {M^{'}}_{x} \end{array}$

where $_{5} {M^{'}}_{x} = _{5} {I^{'}}_{x} - _{5} {O^{'}}_{x}$ is the net (i.e. in less out) number of in-migrants, aged x to x+5 at the time of the first census, surviving to the second census, and ₅D_x, ₅I’_x and ₅O’_x represent the number of deaths, surviving in-migrants and out-migrants, aged x to x+5 at the time of the first census, who died or moved in the period between the censuses.

For those born after the first census the equation becomes:

$_{n} N_{0}^{} (t + n) = B - D_{B}^{} + {M^{'}}_{B}$

and those in the open age interval:

$_{\infty} N_{A}^{} (t + n) = _{\infty} N_{A - n}^{} (t) - _{\infty} D_{A - n}^{} + _{\infty} {M^{'}}_{A - n}$

where B represents the number of births in the population between the two censuses, D_B the number of deaths of those births in the period between the censuses and M’_B the net number of surviving migrants, born outside the country in the period between the two censuses, _∞D_A-n the number of deaths in the intercensal period aged A-n and older at the time of the first census, and _∞M’_A-n the net number of migrants aged A-n and older at the time of the first census.

Thus

$\begin{array}{l} _{5} {M^{'}}_{x} = _{5} N_{x + n}^{} (t + n) - _{5} N_{x}^{} (t) + _{5} D_{x}^{} \\ {M^{'}}_{B} = _{n} N_{0}^{} (t + n) - B + D_{B}^{} \\ _{\infty} {M^{'}}_{A - n} = _{\infty} N_{A}^{} (t + n) - _{\infty} N_{A - n}^{} (t) + _{\infty} D_{A - n}^{} \end{array}$

or alternatively

$\begin{array}{l} _{5} {M^{'}}_{x} = _{5} N_{x + n}^{} (t + n) - _{5} N_{x}^{} (t) _{5} S_{x} \\ {M^{'}}_{B} = _{n} N_{0}^{} (t + n) - B S_{B} \\ _{\infty} {M^{'}}_{A - n} = _{\infty} N_{A}^{} (t + n) - _{\infty} N_{A - n}^{} (t) _{\infty} S_{A - n} \end{array}$

where ₅S_x , S_B and _∞S_A-n represent the proportion of the populations aged x to x+5 at the time of the first census, born between the censuses, and aged A-n and older at the time of the first census, respectively, surviving to the second census.

The net number of migrants can thus be estimated from the net number surviving to the second census as follows:

$\begin{array}{l} _{5} M_{x} = (_{5} {M^{'}}_{x} + _{5} {M^{'}}_{x} / _{5} S_{x}) / 2 =_{5} {M^{'}}_{x} \frac{(_{5} S_{x} + 1)}{2 _{5} S_{x}} \\ M_{B} = {M^{'}}_{B} \frac{( S_{B} + 1)}{2 S_{B}} \\ _{\infty} M_{A - n} =_{\infty} {M^{'}}_{A - n} \frac{(_{\infty} S_{A - n} + 1)}{2 _{\infty} S_{A - n}} . \end{array}$
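For a single cohort, the two steps above (net surviving migrants from the survival-ratio form of the balance equation, then net migrants as the average of M′ and M′/S) can be sketched as follows. The function names and the cohort figures are hypothetical and serve only to show the arithmetic.

```python
def net_surviving_migrants(n_second, n_first, survival):
    """M' = N(t+n) - N(t) * S: net migrants still alive at the second census."""
    return n_second - n_first * survival


def net_migrants(m_surviving, survival):
    """M = M' * (S + 1) / (2 * S), i.e. the average of M' and M' / S."""
    return m_surviving * (survival + 1) / (2 * survival)


# Hypothetical cohort, for illustration only (not from the census tables).
n_first = 10_000    # aged x to x+5 at the first census
n_second = 11_000   # same cohort, aged x+n to x+n+5 at the second census
s = 0.8             # assumed proportion surviving between the censuses

m_prime = net_surviving_migrants(n_second, n_first, s)
m = net_migrants(m_prime, s)
print(round(m_prime), round(m))  # 3000 3375
```

The same two functions apply unchanged to the cohort born between the censuses (with B in place of N(t) and S_B in place of S) and to the open age interval.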

Unfortunately, since the net number of migrants is usually small relative to the size of the population, age misstatement or errors in either or both census counts can lead to very poor estimates being produced. Better estimates of the net number of immigrants into a country can be produced by confining one’s attention to the population of foreigners (defined as those born outside the country) and assuming that return migration of emigrants from the country of interest is insignificant. Thus one replaces each of the symbols above by equivalents specific to the foreign-born population in the country. Since it is unlikely that one has an accurate record of the number of the foreign-born deaths these need to be estimated in one of the following ways:

Option 1 (Life table survival ratios): Applying rates from a suitable model life table, then

$_{5} S_{x} = \frac{_{5} L_{x + n}}{_{5} L_{x}}, \quad S_{B} = \frac{_{n} L_{0}}{n \cdot l_{0}} \quad \text{and} \quad _{\infty} S_{A - n} = \frac{T_{A}}{T_{A - n}} .$
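As a rough sketch of Option 1, the three kinds of survival ratio can be computed from a column of life table person-years. The layout assumed here (a list of ₅L_x values in five-year age groups, with the last entry holding the open interval, and an intercensal period n that is a multiple of five) is an assumption made for the example, and the life table values are invented.

```python
def life_table_survival_ratios(L, l0, n=5):
    """Survival ratios from life table person-years.

    L  : 5L_x for ages 0, 5, 10, ...; the last entry is the open interval (age A+).
    l0 : radix of the life table.
    n  : years between the censuses (assumed here to be a multiple of 5).
    """
    if n % 5 != 0:
        raise ValueError("this sketch assumes n is a multiple of 5")
    k = n // 5                    # number of age groups a cohort moves up
    last = len(L) - 1             # index of the open age interval
    T = [sum(L[i:]) for i in range(len(L))]          # T_x: person-years above age x
    closed = [L[i + k] / L[i] for i in range(last - k)]   # 5L_{x+n} / 5L_x
    births = sum(L[:k]) / (n * l0)                        # nL_0 / (n * l_0)
    open_ended = T[last] / T[last - k]                    # T_A / T_{A-n}
    return closed, births, open_ended


# Invented life table with age groups 0-4, 5-9, 10-14 and 15+, illustration only.
L = [480_000, 470_000, 460_000, 900_000]
closed, s_births, s_open = life_table_survival_ratios(L, l0=100_000, n=5)
print([round(s, 4) for s in closed], round(s_births, 4), round(s_open, 4))
```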

Option 2 (Census survival ratios): Assuming that emigration of the native-born population is insignificant and that the proportions surviving are the same as those in the native-born population, then

$_{5} S_{x} = \frac{_{5} N_{x + n}^{n b} (t + n)}{_{5} N_{x}^{n b} (t)}, \quad S_{B} = \frac{_{n} N_{0}^{n b} (t + n)}{B^{n b}} \quad \text{and} \quad _{\infty} S_{A - n} = \frac{_{\infty} N_{A}^{n b} (t + n)}{_{\infty} N_{A - n}^{n b} (t)},$
where the superscript “nb” designates native-born.
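Option 2 can be sketched in the same way, taking the ratios from native-born counts at the two censuses. The input layout (five-year age groups with an open last group, n a multiple of five) and every number below are assumptions made for illustration.

```python
def census_survival_ratios(nb_first, nb_second, births_nb, n=5):
    """Census survival ratios from native-born counts.

    nb_first, nb_second : native-born counts by five-year age group at the first
                          and second census; the last entry is the open interval.
    births_nb           : native-born births between the two censuses.
    n                   : years between the censuses (assumed a multiple of 5).
    """
    if n % 5 != 0:
        raise ValueError("this sketch assumes n is a multiple of 5")
    k = n // 5
    last = len(nb_first) - 1
    closed = [nb_second[i + k] / nb_first[i] for i in range(last - k)]
    births = sum(nb_second[:k]) / births_nb               # aged 0 to n at t+n
    open_ended = nb_second[last] / sum(nb_first[last - k:])  # A+ over (A-n)+
    return closed, births, open_ended


# Invented counts for age groups 0-4, 5-9, 10-14 and 15+, illustration only.
nb_first = [1_000, 950, 900, 3_000]
nb_second = [1_100, 980, 940, 3_500]
closed, s_births, s_open = census_survival_ratios(
    nb_first, nb_second, births_nb=1_150, n=5
)
print([round(s, 4) for s in closed], round(s_births, 4), round(s_open, 4))
```

Because these ratios absorb any net emigration of the native-born and any difference in coverage between the two censuses, values above 1 can occur and are a signal to check the data rather than a survival probability.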

Option 3 (Vital registration): Where one has access to numbers of births and deaths from another source such as vital registration (which is only likely to be the case, if at all, with internal migration), one could work with deaths and births corresponding to the migrant population directly instead of survival ratios to estimate the net number of surviving in-migrants. Alternatively the net number of migrants can be derived as above by setting

$_{5} S_{x} = 1 - \frac{_{5} D_{x}}{_{5} N_{x} (t)}, \quad S_{B} = 1 - \frac{D_{B}}{B} \quad \text{and} \quad _{\infty} S_{A - n} = 1 - \frac{_{\infty} D_{A - n}}{_{\infty} N_{A - n} (t)}$

where the births and deaths are from the vital registration.

However, for most developing countries, particularly those in Africa, vital registration systems are too incomplete to be used in this way.

Internal migration

When it comes to internal migration one can estimate net in-migration (i.e. in-migration of those born outside the region less out-migration of those born outside the region who had previously moved into the region) into each sub-national region of those born outside the region by making use of place of birth information to identify the change in numbers of those born outside the region, in the same way as described above. However, since one also has the place of residence of those born in the region who have moved out of the region since birth (but not emigrated) one can also estimate the net out-migration of those born in the region (i.e. out-migration of those born in the region less those born in the region who have returned after having previously moved out of the region) by applying the method described above to the population born in the region (as opposed to those born outside the region).

When estimating the survival of those born in the various regions, the census survival ratios could have an advantage over the life table survival ratios in that any under- or overcount of the population by region may well be matched by a similar distortion in the national population and hence in the survival ratios, thus resulting in a more accurate estimate of the number of migrants than would be produced by using life table survival ratios.

Apart from place of birth, a census can ask of those who moved since the previous census (or some other suitable date) where they were at that census (or that date), which allows one to measure out-migration and hence (gross) in-migration separately for each sub-national region.
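One way to picture this is as an origin-destination table built from the question on previous residence. The sketch below derives gross in-migration, gross out-migration and their difference for each region; the nested-dictionary layout, the region labels and the counts are all hypothetical, and the figures are surviving migrants only.

```python
def migration_from_matrix(flows):
    """Gross in-, gross out- and net in-migration per region.

    flows[origin][destination] holds the number of people counted in
    `destination` at the second census who lived in `origin` at the
    previous census; the diagonal holds those who did not change region.
    """
    regions = list(flows)
    out_m = {r: sum(v for d, v in flows[r].items() if d != r) for r in regions}
    in_m = {r: sum(flows[o].get(r, 0) for o in regions if o != r) for r in regions}
    net = {r: in_m[r] - out_m[r] for r in regions}
    return in_m, out_m, net


# Hypothetical three-region table, for illustration only.
flows = {
    "A": {"A": 900, "B": 60, "C": 40},
    "B": {"A": 30, "B": 800, "C": 20},
    "C": {"A": 10, "B": 50, "C": 700},
}
in_m, out_m, net = migration_from_matrix(flows)
print(in_m)   # {'A': 40, 'B': 110, 'C': 60}
print(out_m)  # {'A': 100, 'B': 50, 'C': 60}
print(net)    # {'A': -60, 'B': 60, 'C': 0}
```

Net internal migration summed over all regions is zero by construction, which is a quick check on the tabulation.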

If the census asks for the year when the migrant moved (or how long the person has been living in the place where counted in the second census) one can get a sense of the timing of migration, and estimate yearly migration rates. This is a complicated process and is not covered here, but the interested reader is referred to the paper by Dorrington and Moultrie (2009).

Working with total numbers only

If age-specific numbers are not available, or the allocation to age is considered to be unreliable, one can still produce estimates by age by estimating the total number of migrants as described below, and then apportioning this total to the age groups using either an age distribution for the same population at a different time (since the age distributions of migration flows tend to be consistent over time) or (more likely) an appropriate standard model (Rogers and Castro 1981a; 1981b).

$\text{Net}\,{}_{\infty} M_{0}^{F} = {}_{\infty} N_{0}^{F} (t + n) - {}_{\infty} N_{0}^{F} (t) + {}_{\infty} D_{0}^{F}$

where $_{\infty} D_{0}^{F} = \frac{n}{2} (_{\infty} N_{0}^{F} (t) + _{\infty} N_{0}^{F} (t + n))_{\infty} m_{0}$ and _∞m₀ is an estimate of the crude mortality rate of the population in the country of the census.
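A minimal sketch of the totals-only calculation, followed by apportionment to age groups, is given below. The population totals, the crude death rate and the age shares are all hypothetical; in practice the shares would come from an earlier age distribution for the same population or from a standard model schedule.

```python
def net_migrants_total(n_first, n_second, crude_death_rate, n_years):
    """Net migrants from totals only.

    Deaths are approximated as n/2 * (N(t) + N(t+n)) * m, where m is the
    crude death rate, and added to the change in the population total.
    """
    deaths = n_years / 2 * (n_first + n_second) * crude_death_rate
    return n_second - n_first + deaths, deaths


def apportion(total, age_shares):
    """Split a total across age groups in proportion to the given shares."""
    scale = sum(age_shares)
    return [total * share / scale for share in age_shares]


# Hypothetical foreign-born totals and rates, for illustration only.
net_total, deaths = net_migrants_total(
    n_first=100_000, n_second=120_000, crude_death_rate=0.01, n_years=5
)
by_age = apportion(net_total, age_shares=[0.1, 0.5, 0.3, 0.1])
print(round(deaths), round(net_total))  # 5500 25500
print([round(v) for v in by_age])       # [2550, 12750, 7650, 2550]
```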

Limitations

The primary limitation of using censuses to estimate immigration and net in-migration is the quality of the censuses, in particular the extent of undercount, both in general and, more significantly, in one census relative to the other. However, even if the undercount is low, the census might not identify all the migrants. In general, recent migrants are often difficult to include in a census because they have yet to settle. More specifically, immigrants may not be keen to identify themselves as immigrants and may either avoid being counted or not admit to being foreign-born.

Apart from this, place of birth and/or place of residence at previous census, in the case of internal migrants, might be misreported due to boundary changes or ignorance (or even bias) on the part of the respondent.

The third drawback of census data is that it cannot be used to measure emigration from the country of the census. Emigration is particularly difficult to estimate for most countries, but one option is to apply the method for identifying net immigration of the foreigners described above to the censuses of the main countries of destination to which the emigrants move to estimate the change in the numbers of emigrants to those countries. Of course, this is only useful if the censuses of these countries identify the numbers of foreign-born by their countries of birth reasonably accurately.

Generally, statistics on immigrants and particularly emigrants that are collected at border posts provide quite poor estimates of the true numbers, unless the borders of the country are quite impenetrable and there are a few well-controlled ports of entry. Even then there may still be many ‘visitors’ who end up living in the country.

A final drawback occurs when working with data aggregated over all ages. In these cases one usually has to make use of the crude death rate for the population of the country of the census in order to estimate the number of deaths of the migrant population. However, since the distribution of the migrant population by age can differ from that of the population of the country of the census quite markedly, the estimated number of deaths can be quite inaccurate.

Extensions of the method

Some censuses ask additional questions which can be of use in interpreting the patterns of migration, if not in improving the estimate of the level of migration. The most common of these is probably a question asking when the migrant moved. These data allow one to estimate annual rates of migration; however, it is possible that respondents tend to report moves as occurring more recently than is actually the case (Dorrington and Moultrie 2009).

Where a census asks those who moved since the previous census where they moved from most recently and when they moved, but not where they were at the time of the previous census (as in the recent censuses in South Africa), it is possible to back-project the numbers of migrants by applying annual rates of migration between sub-national regions to estimate the number by place at the time of the previous census (Dorrington and Moultrie 2009). However, in the case of South Africa, at least, it appears that the assumption that most migrants moved only once in the past five years, and thus that the place of residence before the most recent move is the same as the place at the time of the previous census, is quite reasonable (Dorrington and Moultrie 2009).

Where one has data on both the sub-national region of birth and the place at the time of the previous census, one can cross-tabulate the place of residence data by the place of birth and thus be able to classify recent migrants into primary, secondary and return migrants.
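The cross-tabulation can be sketched as a simple classification rule applied to each person's region of birth, region at the previous census and region at the current census. The definitions used here are an assumption, since the text does not spell them out: a primary migrant left the region of birth, a return migrant moved back to the region of birth, and a secondary migrant moved between two regions neither of which is the region of birth. The region codes and records are hypothetical.

```python
from collections import Counter


def classify_migrant(birth, previous, current):
    """Classify one person by region of birth, previous and current residence.

    Assumed definitions (not given in the text):
      non-migrant : same region at both censuses
      return      : moved back to the region of birth
      primary     : moved out of the region of birth
      secondary   : moved between two regions other than the region of birth
    """
    if previous == current:
        return "non-migrant"
    if current == birth:
        return "return"
    if previous == birth:
        return "primary"
    return "secondary"


# Hypothetical records: (region of birth, region at previous census, current region).
records = [
    ("EC", "EC", "WC"),
    ("EC", "GP", "WC"),
    ("EC", "GP", "EC"),
    ("WC", "WC", "WC"),
    ("LP", "LP", "GP"),
]
counts = Counter(classify_migrant(*r) for r in records)
print(dict(counts))
```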

Further reading and references