Migration is the third process (alongside fertility and mortality) that governs population change. For most national populations, its contribution to population change is small relative to those of births and deaths, but as the civil division of interest becomes smaller, the salience of migration typically becomes larger. Migration differs from fertility and mortality not only in magnitude, but more fundamentally in the nature of the process. Migration involves moving across some geographically-defined boundary, with the intent or result of changing place of normal residence. Thus whereas a birth and a death are largely unambiguous, a migration depends upon geographically-defined spatial units (civil divisions) and on intent or subsequent behaviour. A person can be a migrant to the analyst looking at change in provincial population but not a migrant to another analyst focusing on national population change. The first task, therefore, in any analysis of migration is to establish the geographic focus of the study. A second task is to define what counts as a migration, as opposed to broader mobility. The issue is further confused by the existence of several different types of migration. In addition to “ordinary” change of usual residence, there are circular migration flows, daily or weekly commuter flows, seasonal flows and refugee flows, all with specific characteristics. Given these definitional issues, and the fact that migrations can effectively be reversed in terms of population stocks (unlike births and deaths), it is no surprise that measurement is also complicated.
Apart from this, capturing data on migration is also more problematic. Although developing countries often lack complete systems of birth and death registration, completeness is improving and some methods have been devised to make use of the less than complete data. However, registration data on migrants/migrations in most countries cannot be relied on to produce reliable estimates of immigrants, let alone of internal migrants/migrations. In addition, for various reasons (illegal status, temporary residence of recent migrants, fear of xenophobia, etc.) migrants (especially immigrants) are usually underrepresented in censuses and surveys.
Methods for measuring migration are broadly similar for both internal migration (in- or out-migration) and international migration (immigration or emigration), except in one very important respect. A census or survey can measure international immigration by identifying persons born abroad, but it is much harder to identify emigrants because it is not possible to carry out a census/survey in all recipient countries. Approaches to estimating emigration include: (i) systematic identification of nationals in censuses of other countries (UN Population Division 2011); (ii) including census/survey questions about usual household members living abroad (e.g. in the Swaziland Censuses of 1986 and 1996); (iii) asking about the residence abroad of close relatives, especially a woman’s children or a respondent’s siblings (Zaba 1985); and (iv) using intercensal residual methods to estimate numbers of missing residents at the time of a second census. The first approach is dependent on receiving countries having, and being willing to share, relevant data and only captures migration of the native-born population; the second approach depends on the, perhaps vague, concept of household membership, and will also fail to cover entire households that have moved away; the third also fails to capture entire missing families, does not provide estimates of recent emigration, and in small experimental surveys has not proven convincing. Only the fourth can be expected to give plausible estimates of recent outflows, provided both censuses count the population reasonably accurately, but it gives no useful information about destination.
Because of these limitations and the difficulty of collecting accurate data, the field of migration analysis has developed largely independently of mainstream demography. It has concentrated primarily on developed countries, where the data available to measure migration are typically of much better quality than in developing countries, and possibly also because migration in these countries is often a matter of greater political and public policy concern. A further consequence is that the field has developed its own terminology and techniques, which are often quite far removed from the demography discussed elsewhere in this manual.
As noted above, a migration is defined as a move across a geographically-defined (usually administrative) boundary of interest to the analyst with the effect of changing a person’s place of usual residence. Assuming that the boundary can be clearly defined, this immediately raises two questions: how does one define usual place of residence, and how does one determine whether it has changed? Unfortunately, no very precise answers can be given to these two questions, giving rise to inevitable uncertainty in measurement. The preferred definition of usual residence is in terms of length of residence: if one intends to live, or after one has lived, in a place for a period of time (e.g. one year), one becomes a usual resident. Note that usual residence is not the same thing as legal residence. The Principles and Recommendations for Population and Housing Censuses (UN Statistics Division 2008: 102, para. 1.463) defines usual residence as follows:
“It is recommended that countries apply a threshold of 12 months when considering place of usual residence according to one of the following two criteria:
(a) The place at which the person has lived continuously for most of the last 12 months (that is, for at least six months and one day), not including temporary absences for holidays or work assignments, or intends to live for at least six months;
(b) The place at which the person has lived continuously for at least the last 12 months, not including temporary absences for holidays or work assignments, or intends to live for at least 12 months.”
However, this definition does not deal with the situation of a person with two homes who regularly spends about six months in each. In general, we have to rely on people to self-define as residents or not, although some tests could be implemented (such as asking where their car is registered, where taxes are paid, where they voted, where the person sleeps at night on a regular basis, etc.). For most purposes, a person can distinguish between being a usual resident and being a visitor, and this simple distinction suffices.
Migration has been the Cinderella of demography, kept in the background as far as possible, and dedicated migration surveys are few, far between, and specialized (an excellent example is the description of the Mexican Migration Project by Massey, Alarcon, Durand et al. (1987)). Dedicated migration surveys typically include full migration histories, which, though raising complex analytical issues, tend not to be focussed on the estimation of numbers of migrants/migrations. In this section we do not cover the analysis of such full histories (there are very few general principles that would apply to a useful number), but rather deal with the sorts of data collected by population censuses and general household surveys and sometimes, in developed countries, by some form of registration.
The most widely collected data relevant to migration is place of birth. In comparison with place of residence at the time of a survey, this information describes lifetime migration. It provides limited information about the timing of migration, and is ‘net’ migration in the sense that it misses, entirely, migrations that have been reversed (back to the place of birth) and all intermediate migrations. At the time of data collection, decisions have to be taken about the granularity of the data: i.e., for those born abroad, how many countries should be explicitly recorded and, for those born in the country, what level of geography should be recorded. For the analyst, of course, these decisions were made at the questionnaire design stage, but some degree of greater aggregation may be required. The analysis of data on birthplace is described below, but it is useful to make two points here. First, if data on birthplace by age and sex are available for two points in time, it is possible to estimate net migration (by age and sex) during the interval. Second, although birthplace reflects lifetime migration, the length of “lifetime” varies by age, and (provided the census data on children are reasonably accurate, which is often not the case in developing countries) the migration of 0-4 year olds may be used as an indicator of recent migration of their parents (Raymer and Rogers 2007).
Information on place of residence at some specified time in the past is very often collected in addition to that on birthplace, with the express objective of providing data on recent migration. The time point specified is generally five years earlier, but sometimes a one-year period is used. However, the question tends to work better if the time point is associated with a memorable event, such as the previous census, on the assumption that the coverage of that previous census was largely complete (so that people remember being counted). The longer time period identifies more migrants, but misses intermediate moves, whereas the shorter time period is more susceptible to reference period error (I moved “about a year ago”).
Information on place of previous residence is almost always collected as an alternative to residence at some specified time in the past, and is generally combined with an additional question about duration of current residence (or date of last move). The objective again is to provide data on recent migration.
The question on duration of residence refers to duration of residence in the civil division (such as a town or province), not in an individual dwelling unit. This question is of limited use on its own and tends to be paired with the one above to provide a time frame for estimates.
Though not involving a direct question about migration, intercensal population change by age and sex can, provided both censuses are reasonably accurate counts of the population, provide residual estimates of net migration between the two censuses (Hill 1987; Hill and Wong 2005; UN Population Division 1967). Intercensal population change (for cohorts or age groups) by age and sex is adjusted for the effects of intercensal fertility and mortality to provide a residual estimate of intercensal net migration (i.e., treating migration as the balancing item in the fundamental demographic balance equation). Migration is generally concentrated in the age range 20 to 40, ages at which mortality rates are, at least in the absence of HIV/AIDS, relatively low and fertility irrelevant, so residual migration estimates are insensitive to assumptions about fertility and mortality (except in populations severely affected by HIV/AIDS where using these data to estimate migration is not recommended). Such estimates are extremely sensitive, however, to even small changes in census coverage; such errors may be manifest in high age-specific migration rates over age 50, where migration is usually low.
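The residual logic can be sketched in a few lines of Python. This is a minimal illustration rather than a method taken from the manual's workbooks: the function name `residual_net_migration`, the cohort counts and the survival ratios are all hypothetical, and each list position is assumed to hold the same cohort at both censuses.

```python
def residual_net_migration(pop_first, pop_second, survival):
    """Net migrants per cohort: enumerated survivors minus expected survivors.

    pop_first[i] and pop_second[i] are the same cohort counted at the first
    and second census; survival[i] is that cohort's intercensal survival ratio.
    """
    return [second - first * s
            for first, second, s in zip(pop_first, pop_second, survival)]


# Hypothetical cohorts (not real data).
first = [1000, 2000]
second = [1100, 1900]
survival = [0.98, 0.95]

estimates = residual_net_migration(first, second, survival)

# The same calculation if the second census had counted 1% too few people.
undercounted = residual_net_migration(first, [p * 0.99 for p in second], survival)
```

With these made-up figures, a 1 per cent undercount at the second census turns a net balance of zero in the second cohort into an apparent net loss of 19 people, which illustrates the sensitivity to census coverage noted above.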
It is not the purpose of this introduction to provide a comprehensive summary of all the measures and definitions – the interested reader is referred to the UN manual on internal migration (UN Population Division 1970) – but two are of particular importance for the chapters that follow.
Stocks of migrants are typically thought of as numbers of persons (by age group and sex) not born in the civil division of enumeration. The proportions born elsewhere (in the country or in other countries) give a good general sense of the magnitude of in-migration and immigration, but no sense of any dynamic changes that may have occurred recently. However, changes in stocks can be used to estimate immigration (net of any onward or return migration of the foreign-born).
Assuming that migration events can be fully and accurately identified, occurrence/exposure rates can be calculated for out-migration or emigration in exactly the same way as for mortality, dividing events in a period by exposure time; such rates can be crude (both sexes, all ages) or age-sex specific. The same is not the case (or at least not usefully) for in-migration or immigration, since the population exposed to the risk of migrating into a civil division is the entire population of the world living elsewhere. In-migration and immigration rates are always calculated by dividing events by the exposure time of the one population group not exposed to risk, the current residents; such rates can be crude (both sexes, all ages) or age-sex specific. Defining rates in this way has the advantage of satisfying the needs of the demographic balancing equation, since rates of gain and loss are measured relative to the same population. This confers a further advantage in that net migration rates can be estimated from the demographic balancing equation as population change between two time points (e.g. censuses) minus gains due to births in the interval plus losses due to deaths in the interval. However, this approach does have the disadvantage of removing the scale limits on “normal” occurrence/exposure rates; for example, at the extreme, a person moving into a previously unoccupied civil division creates an in-migration rate of infinity.
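The balancing-equation calculation described above can be written out as a short Python sketch. The function names and the figures are hypothetical, and the person-years denominator is approximated here by the mean of the two population counts multiplied by the length of the interval, which is an assumption of this sketch rather than something specified in the text.

```python
def net_migration(pop_start, pop_end, births, deaths):
    """Residual of the demographic balancing equation:
    population change minus births plus deaths."""
    return (pop_end - pop_start) - births + deaths


def crude_rate(events, pop_start, pop_end, years):
    """Events per person-year lived by the resident population.

    Person-years are approximated by the mean of the two population counts
    multiplied by the length of the interval (an assumption of this sketch).
    """
    person_years = years * (pop_start + pop_end) / 2
    return events / person_years


# Hypothetical region over a five-year interval (not real data).
net = net_migration(pop_start=100_000, pop_end=108_000, births=12_000, deaths=7_000)
rate = crude_rate(net, 100_000, 108_000, years=5)
```

Because births, deaths and migration are all divided by the same resident person-years, the resulting crude rates add up to the growth rate of the population, which is the property the text relies on.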
The chapters in this section focus on the estimation and quantitative description of immigration and internal in- and out-migration. They are not meant to provide comprehensive coverage of all measures of migration, and specifically they do not cover the important, but problematic, issue of measuring emigration (other than by mentioning that the method of estimating immigration (net of return/onward migration) of foreigners can be applied to the data of the main countries of destination of emigrants to get some sense of the age profile and magnitude of emigration).
Chapter 35 concentrates on the basic methods of using data from censuses to estimate the numbers (net of return/onward migration) of immigrants from the change in stock of foreigners, and of internal in- and out-migration from the change in stock by place of birth and from the place of residence at some date prior to the census.
Chapter 36 describes the selection and fitting, using non-linear optimisation procedures, of one of the Rogers-Castro multi-exponential models to estimates of migration probabilities (or rates) derived from estimates of the number of migrants/migrations.
Chapter 37 describes the multiplicative and log-linear models for capturing, comparing and analysing the mass of interregional migration flows from places of origin to places of destination. The chapter also provides an introduction to the method of offsets for extending the use of these models to estimate interregional flows from marginal flows (i.e. total flows out of, or into, regions). The intention is to expand the material on the method of offsets into an additional chapter at a later date, which will be placed on the Tools for Demographic Estimation website.
As mentioned above, UN Manual VI (UN Population Division 1970) provides a comprehensive, if dated, introduction to the description and measurement of internal migration. Those looking for an overview of indirect methods of estimating migration are referred to the useful, if also somewhat dated, review by Zaba (1987). More specifically, Hill (1987) attempted to apply the logic underlying the Generalized Growth Balance method of adult mortality estimation (described in Chapter 24) to estimate undocumented migration, while Hill and Queiroz (2010) sought to estimate net migration in parallel with the estimation of mortality. Unfortunately neither method has proved to be particularly successful.
Those interested in reading more about the models of migration (multi-exponential, multiplicative and log-linear) or the method of offsets are referred to work by Rogers, Willekens and colleagues (e.g. Little and Rogers (2007), Raymer and Rogers (2007), Rogers (1980, 1986) and Willekens (1999)).
Hill K. 1987. "New approaches to the estimation of migration flows from census and administrative data sources", International Migration Review 21(4):1279-1303. doi: http://dx.doi.org/10.2307/2546515
Hill K and B Queiroz. 2010. "Adjusting the general growth balance method for migration", Revista Brasileira de Estudos de População 27(1):7-20. doi: http://dx.doi.org/10.1590/S0102-30982010000100002
Hill K and R Wong. 2005. "Mexico–US migration: Views from both sides of the border", Population and Development Review 31(1):1-18. doi: http://dx.doi.org/10.1111/j.1728-4457.2005.00050.x
Little JS and A Rogers. 2007. "What can the age composition of a population tell us about the age composition of its out-migrants?", Population, Space and Place 13(1):23-39. doi: http://dx.doi.org/10.1002/psp.440
Massey DS, R Alarcon, J Durand and H Gonzalez. 1987. Return to Aztlan: The Social Process of International Migration from Western Mexico. Berkeley and Los Angeles: University of California Press.
Raymer J and A Rogers. 2007. "Using age and spatial flow structures in the indirect estimation of migration streams", Demography 44(2):199–223. doi: http://dx.doi.org/10.1353/dem.2007.0016
Rogers A. 1980. "Introduction to multistate mathematical demography", Environment and Planning A 12:489-498. doi: http://dx.doi.org/10.1068/a120489
Rogers A. 1986. "Parameterized multistate population dynamics and projections", Journal of the American Statistical Association 81(393):48-61. doi: http://dx.doi.org/10.1080/01621459.1986.10478237
UN Population Division. 1967. Manual IV: Methods for Estimating Basic Demographic Measures from Incomplete Data. New York: United Nations, Department of Economic and Social Affairs, ST/SOA/Series A/42. http://www.un.org/esa/population/techcoop/DemEst/manual4/manual4.html
UN Population Division. 1970. Manual VI: Methods of Measuring Internal Migration. New York: United Nations, Department of Economic and Social Affairs, ST/SOA/Series A/47. http://www.un.org/esa/population/techcoop/IntMig/manual6/manual6.html
UN Population Division. 2011. International Migration Report 2009: A Global Assessment. New York: United Nations, Department of Economic and Social Affairs, ST/ESA/Series A/316. http://www.un.org/esa/population/publications/migration/WorldMigrationReport2009.pdf
UN Statistics Division. 2008. Principles and Recommendations for Population and Housing Censuses v.2. New York: United Nations, Department of Economic and Social Affairs, ST/ESA/STAT/SER.M/67/Rev2. http://unstats.un.org/unsd/publication/SeriesM/Seriesm_67rev2e.pdf
Willekens FJ. 1999. "Modeling approaches to the indirect estimation of migration flows: From entropy to EM", Mathematical Population Studies 7:239-278. doi: http://dx.doi.org/10.1080/08898489909525459
Zaba B. 1985. Measurement of Emigration Using Indirect Techniques: Manual for the Collection and Analysis of Data on Residence of Relatives. Liège, Belgium: Ordina Editions.
Zaba B. 1987. "The indirect estimation of migration: A critical review", International Migration Review 21(4):1395–1445. doi: http://dx.doi.org/10.2307/2546519
Estimating migration from census data is not technically complicated. Provided that the census(es) gather the appropriate information and are reasonably accurate, it is possible to produce estimates of net immigration (i.e. immigration less emigration) of the foreign-born population (people born outside a particular country) and internal migration between (to and from) subnational regions of a country, over the period between two censuses.
To estimate net immigration of foreigners, one essentially subtracts from the number of foreign-born people enumerated in a census the number of foreigners expected to have survived since being enumerated in the previous census.
In a similar way, if the censuses record the subnational region of birth, one can estimate net in-migration (i.e. net in-migration of those born outside the region less net out-migration of those born in the region) between subnational regions of a country. However, if the census asks people where they were living at some prior point in time, say at the time of the previous census, one is able to estimate directly the number of surviving migrants (i.e. migrants still alive at the time of the latest census) into and out of each subnational region of the country since that prior point in time.
In order to estimate the number of migrants from the number of surviving migrants at the time of the second census one needs to add to these figures an estimate of the number of migrants who are expected to have died between moving and the time of the latest census.
If the latest census records other information such as year in which the migrant moved to the place at which the person was counted in the census, it is possible also to establish a trend of migration over time.
Migration differs from fertility and mortality in two respects. First, a migration is not final in the way that a birth or a death is. Second, we are concerned not only with the population of origin, from which the migrant moved (the population exposed to risk, from which rates of migration akin to those of fertility and mortality can be calculated), but also with the destination population, to which the migrant moves. In addition, in order to understand migration one is often interested in distinguishing between different types of migration (whether temporary or more permanent, whether circulatory or unidirectional, etc.). For these reasons there is a much wider range of measures and terminology associated with migration than there is with either fertility or mortality. It is not the purpose of this chapter to cover these issues and the interested reader is referred to the standard texts on the subject such as the UN Manual VI (UN Population Division 1970), Shryock and Siegel (1976), and Siegel and Swanson (2004).
Estimating net immigration of foreigners from place of birth data:
- Censuses identify all foreign-born people accurately.
- One is able to estimate the mortality of the foreign-born population accurately (either the life table used is appropriate, or the mortality is the same as that implied by the censuses for the native-born (locally-born) national population).
- There is no return migration of locally-born emigrants.
Estimating internal migration between subnational regions from place of birth data:
- Censuses count the population by subnational region accurately and identify the region of birth accurately.
- One is able to estimate the mortality of people moving between two regions accurately (either the life table used is appropriate, or the mortality is the same as that implied by the censuses for the native-born national population).
Estimating internal migration from place of residence at a prior date:
- The latest census identifies correctly all people who have moved from one region to another since the prior date (e.g. the previous census).
- One is able to estimate the mortality of people moving between two regions accurately (either the life table used is appropriate, or the mortality is the same as that implied by the censuses for the native-born national population). Since one is estimating in- and out-migration separately (as opposed to net migration), this assumption is of less importance.
Before applying this method, you should investigate the quality of the data in at least the following dimensions
Estimating migration using place of birth data from two censuses not only requires that the censuses count the population reasonably completely, but that the place of birth be accurately recorded. Often this is not the case, particularly when estimating immigration, where immigrants wish to hide the fact that they are foreign, but also in the case of internal migration where there may have been boundary changes or the respondent is ignorant about the place of birth of the person.
Estimating migration by asking questions of migrants depends on the census identifying completely all those who have migrated, as well as correctly identifying the place from which they moved. To the extent that recent migrants are not yet established as residents of the region to which they have moved at the time of the census, they could be missed in the count.
Net migration, by definition, underestimates the flows of migrants into and out of a region or country. Thus, for example, people who moved into a region and then returned within the period being considered contribute zero net in-migration even though they moved twice.
This method produces estimates of the net immigration of foreigners using place of birth data. It is important to stress that this method does not take into account or measure the immigration of returning native-born people who left the country prior to the previous census and returned before the second census. Thus this method is not recommended for the measurement of immigration where significant return migration of native-born people (for example, after exile or forced migration of refugees) is in progress.
If data on the number of foreign-born people in the population are available by age group for each census, then one needs to estimate the survival factors to be applied to the numbers of foreign-born in the first census to estimate the numbers surviving to the time of the second census. The user can choose between years of life lived in five-yearly age groups (${}_5L_x$) based on the standard from the General family of United Nations model life tables, any of the four families of Princeton model life tables, or a model life table of a population experiencing an AIDS epidemic (Timæus 2004), all of which appear in the Models spreadsheet of the associated workbook. This spreadsheet also allows the user to input years of life lived in five-yearly age groups of an alternative life table if there is reason to assume that the life table has a similar pattern of mortality to that of the population in question. Failing this, the survival factors can be derived from the proportion of each five-year age group of the native-born population surviving from the first to the second census (assumed to be n years apart, where n is a multiple of 5). Thus ${}_5S_x$, ${}_\infty S_{A-n}$ and $S_B$, the n-year survival factors for a group of people aged x to x + 5 at the previous census, A - n and older at the previous census, and born between censuses, respectively, are estimated as follows:

$$ {}_5S_x = \frac{{}_5N_{x+n}^{nb}(t+n)}{{}_5N_x^{nb}(t)}, \qquad {}_\infty S_{A-n} = \frac{{}_\infty N_A^{nb}(t+n)}{{}_\infty N_{A-n}^{nb}(t)}, \qquad S_B = \frac{{}_nN_0^{nb}(t+n)}{B^{nb}} $$

where the superscript nb represents ‘native-born’, ${}_5N_x^{nb}(t)$ represents the native-born population aged between x and x + 5 in the census at time t and $B^{nb}$ represents the number of native-born births between time t and t + n.
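The census survival ratios just described can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `census_survival_factors` is hypothetical, the counts are made up, the inputs are assumed to be lists of native-born counts by five-year age group with the last group open-ended, and the native-born population is assumed to be closed to migration and equally well counted in both censuses.

```python
def census_survival_factors(nb_t, nb_t_plus_n, births_nb, n=5):
    """Return (closed-group factors, open-interval factor, factor for births).

    nb_t, nb_t_plus_n: native-born counts by five-year age group at the first
    and second census; the last element of each list is the open group (A+).
    births_nb: native-born births between the two censuses.
    n: intercensal interval in years (a multiple of 5).
    """
    k = n // 5                     # number of age groups each cohort moves up
    last = len(nb_t) - 1           # index of the open-ended group
    # Cohorts aged x to x+5 at the first census that end in a closed group.
    closed = [nb_t_plus_n[i + k] / nb_t[i] for i in range(last - k)]
    # Everyone aged A-n and over at the first census is aged A and over later.
    open_ended = nb_t_plus_n[last] / sum(nb_t[last - k:])
    # Those born between the censuses are aged under n at the second census.
    born_between = sum(nb_t_plus_n[:k]) / births_nb
    return closed, open_ended, born_between


# Hypothetical counts for ages 0-4, 5-9, 10-14 and 15+ (not real data).
closed, open_ended, born_between = census_survival_factors(
    nb_t=[100, 90, 80, 50], nb_t_plus_n=[110, 98, 88, 120], births_nb=115)
```

Ratios computed this way can exceed one if census coverage differs between the two counts, which is one reason a model life table may be preferred when the census data are doubtful.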
If the data are not available in five-year age groups, the net number of immigrants can still be estimated in total provided we have an estimate of the crude death rate for the population (which might, in the absence of any evidence to the contrary, be assumed to be that of the native-born population).
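If data on the number of foreign-born people in the population are available by age group for two censuses (n years apart), then one needs to estimate the number of deaths of foreign-born people (denoted by the superscript F) aged between x and x + 5 at the first census (at time t), ${}_5D_x^F$, aged A - n and older at the first census, ${}_\infty D_{A-n}^F$, and those born between the censuses, $D_B^F$, as follows:

$$ {}_5D_x^F = \frac{1}{2}\left( {}_5N_x^F(t) + \frac{{}_5N_{x+n}^F(t+n)}{{}_5S_x} \right)\left(1 - {}_5S_x\right) $$

$$ {}_\infty D_{A-n}^F = \frac{1}{2}\left( {}_\infty N_{A-n}^F(t) + \frac{{}_\infty N_A^F(t+n)}{{}_\infty S_{A-n}} \right)\left(1 - {}_\infty S_{A-n}\right) $$

$$ D_B^F = \frac{1}{2} \cdot \frac{{}_nN_0^F(t+n)}{S_B}\left(1 - S_B\right) $$

where ${}_5N_x^F(t)$ represents the number of foreign-born people according to the census at time t who were aged between x and x + 5. Each estimate is the average of the deaths implied by surviving the first-census population forward and by reverse-surviving the second-census population.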
If data and/or survival factors are not available by age group, then one can estimate the total number of deaths of the foreign-born people as follows:

$$ D^F = \frac{n}{2}\left( N^F(t) + N^F(t+n) \right) \cdot {}_\infty m_0 $$

where ${}_\infty m_0$ is an estimate of the crude mortality rate of the population in the country of the census.
However, if the age distribution of the foreignborn population is markedly different from that of the population in the country of the census, then this can produce a poor approximation to the true number of deaths.
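A rough numerical sketch of the crude-rate shortcut is given below. It assumes that exposure can be approximated by the mean of the two census counts multiplied by the length of the interval; that approximation, and the function name `total_deaths`, are assumptions of this sketch. The counts are the totals of foreign-born males from Table 1, and the rate of 14 per 1,000 is the value used in the chapter's worked example.

```python
def total_deaths(n_first, n_second, crude_death_rate, years):
    """Deaths = crude death rate x approximate person-years of exposure."""
    person_years = years * (n_first + n_second) / 2
    return crude_death_rate * person_years


# Totals of foreign-born males from Table 1; interval treated as five years.
deaths = total_deaths(611_423, 754_608, crude_death_rate=0.014, years=5)

# Net immigration in total: change in stock plus estimated deaths.
net = 754_608 - 611_423 + deaths
```

Under these assumptions the shortcut gives roughly 47,811 deaths and a net inflow of roughly 190,996, against 79,509 and 222,693 from the age-specific calculation in Table 1. The gap is consistent with the warning above about populations whose age distribution differs markedly from that of the country as a whole.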
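If data are available by age group for each census, then age-specific net immigration can be estimated as follows:

$$ {}_5M_x^F = {}_5N_{x+n}^F(t+n) - {}_5N_x^F(t) + {}_5D_x^F $$

for x = 0, 5, … , A - 5 - n, where ${}_5M_x^F$ represents the net number of immigrants between times t and t + n who were aged between x and x + 5 at time t. For x > A - 5 - n

$$ {}_\infty M_{A-n}^F = {}_\infty N_A^F(t+n) - {}_\infty N_{A-n}^F(t) + {}_\infty D_{A-n}^F $$

The net number of immigrants of those born between times t and t + n is estimated as follows:

$$ M_B^F = {}_nN_0^F(t+n) + D_B^F $$

If data and/or survival factors are not available by age group, then one would estimate the total net number of immigrants as follows:

$$ M^F = N^F(t+n) - N^F(t) + D^F $$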
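The age-specific calculation can be sketched in Python as follows. The averaging of forward and reverse survival used for the deaths is inferred from the figures in Table 1 rather than quoted from the text, and the function names are hypothetical. The inputs are the counts of foreign-born males and the survival factors shown in Table 1.

```python
def deaths_between_censuses(n_first, n_second, survival):
    """Average of forward- and reverse-survival estimates of cohort deaths."""
    # Deaths expected among those counted at the first census.
    forward = n_first * (1 - survival)
    # Deaths implied by reverse-surviving those counted at the second census.
    backward = n_second * (1 - survival) / survival
    return (forward + backward) / 2


def net_immigrants(n_first, n_second, survival):
    """Change in the cohort's size plus its estimated intercensal deaths."""
    return n_second - n_first + deaths_between_censuses(n_first, n_second, survival)


# Foreign-born males from Table 1 (2001 count, 2006 count, survival factor).
aged_20_24 = net_immigrants(69_787, 95_763, 0.96458)
aged_80_plus = net_immigrants(7_658 + 4_455, 5_305, 0.40912)
born_between = net_immigrants(0, 12_577, 0.94151)
```

Rounded to whole numbers, these give 28,970, 602 and 12,968, which agree with the corresponding entries in the Net M column of Table 1.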
Net in-migration into a particular subnational region from other regions in the country can be estimated in exactly the same way as the international immigration described above, by replacing the foreign-born population with the population born outside the region.
In addition, applying the same method to data on the change in the numbers of the population born in (rather than outside) and living outside the region of interest allows us to estimate the net out-migration of those born in the region to other regions in the country. Subtracting this from the net in-migration of those born outside the region gives an estimate of the overall net in-migration into the region of interest.
If there is reason to suspect that there is a material difference in the mortality experienced by those born outside who moved into the region and those born in the region who moved out, and one has appropriate survival factors, then one could apply different survival factors to each when estimating the net number of migrants. However, in practice, inaccuracies in the census data on place of residence at previous census are likely to outweigh any increase in accuracy achieved by using differential mortality.
Net subnational interregional migration is estimated directly from the numbers of people in each region at the time of the census who moved since the previous census by place (e.g. region) they were in at a given prior date (e.g. at the time of the previous census). Confining the estimates to interregional flows, the sum of the numbers of interregional in-migrants should be equal to the sum of interregional out-migrants; however, if the data include immigration to the subnational regions from outside the country, one can extend the estimates of in-migration to include international immigration into each region.
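The consistency check mentioned above (total interregional in-migrants equal to total interregional out-migrants) can be illustrated with a small origin-destination tabulation. The region labels, counts and function name below are hypothetical and serve only to show the bookkeeping.

```python
def region_totals(flows):
    """Sum in- and out-migrants per region from origin-destination counts."""
    inflow, outflow = {}, {}
    for (origin, destination), count in flows.items():
        if origin == destination:
            continue  # people living in the same region at both dates are not migrants
        outflow[origin] = outflow.get(origin, 0) + count
        inflow[destination] = inflow.get(destination, 0) + count
    return inflow, outflow


# Hypothetical counts keyed by (region at prior date, region at census).
flows = {
    ("A", "A"): 900, ("A", "B"): 120, ("A", "C"): 30,
    ("B", "A"): 80, ("B", "C"): 45,
    ("C", "A"): 20, ("C", "B"): 60,
}

inflow, outflow = region_totals(flows)
net = {region: inflow.get(region, 0) - outflow.get(region, 0)
       for region in sorted(set(inflow) | set(outflow))}
```

In this made-up example the in- and out-migrant totals both come to 355, so the net figures for the three regions sum to zero, even though region A loses 50 people and region B gains 55.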
Since one of the major areas of interest is the magnitude of interregional flows of the population, one is as interested in the total numbers of migrants between regions as one is in the age distributions of particular flows.
The number of migrants is derived from the number of surviving in- and out-migrants as follows:
where the superscript (’) represents numbers surviving and ${}_5I'_x$ and ${}_5O'_x$ respectively represent the number of surviving in-migrants into, and the number of surviving out-migrants from, a particular region at the time of the second census who were aged between x and x + 5 at the second census.
This example uses data on the numbers of males in the population from the South African Census in 2001 and a ‘census replacement survey’, the Community Survey in 2007. (Although the survey was conducted approximately 5.35 years after the night of the census in 2001, it is assumed for the purposes of presentation here to have been exactly five years after the census in 2001.) The examples appear in the Migration_South Africa_males.xlsx workbook.
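The survival factors are shown in the fifth column of Table 1. The values are derived from (the years of life lived in each age group of) the alternative life table entered in the Models spreadsheet, for those aged 20 to 24 last birthday and those aged 80 and over at the time of the first census, and those born between the two censuses, as follows:

$$ {}_5S_{20} = \frac{{}_5L_{25}}{{}_5L_{20}} = 0.96458, \qquad {}_\infty S_{80} = \frac{{}_\infty L_{85}}{{}_\infty L_{80}} = 0.40912, \qquad S_B = \frac{{}_5L_0}{5\,l_0} = 0.94151 $$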
Table 1 Estimation of deaths of foreign-born and the net number of immigrants by age group, South Africa, 2001-2006

| Age | 2001 | 2006 | x | ${}_5S_x$ | Age at 2nd census | $D^F$ | Net M |
|---|---|---|---|---|---|---|---|
| | | | B | 0.94151 | | | |
| 0-4 | 8,963 | 12,577 | 0 | 0.97896 | 0-4 | 391 | 12,968 |
| 5-9 | 10,390 | 13,724 | 5 | 0.99547 | 5-9 | 242 | 5,003 |
| 10-14 | 13,508 | 13,998 | 10 | 0.99427 | 10-14 | 55 | 3,664 |
| 15-19 | 27,835 | 27,943 | 15 | 0.98602 | 15-19 | 119 | 14,555 |
| 20-24 | 69,787 | 59,493 | 20 | 0.96458 | 20-24 | 616 | 32,275 |
| 25-29 | 87,381 | 95,763 | 25 | 0.93161 | 25-29 | 2,994 | 28,970 |
| 30-34 | 73,338 | 100,450 | 30 | 0.90960 | 30-34 | 6,675 | 19,743 |
| 35-39 | 66,663 | 85,490 | 35 | 0.89780 | 35-39 | 7,563 | 19,715 |
| 40-44 | 59,152 | 75,684 | 40 | 0.89092 | 40-44 | 7,701 | 16,721 |
| 45-49 | 45,184 | 66,113 | 45 | 0.88633 | 45-49 | 7,274 | 14,234 |
| 50-54 | 40,398 | 55,913 | 50 | 0.87224 | 50-54 | 6,154 | 16,883 |
| 55-59 | 30,640 | 42,833 | 55 | 0.84731 | 55-59 | 5,717 | 8,153 |
| 60-64 | 24,376 | 34,433 | 60 | 0.80885 | 60-64 | 5,442 | 9,234 |
| 65-69 | 17,895 | 25,588 | 65 | 0.75468 | 65-69 | 5,353 | 6,564 |
| 70-74 | 13,561 | 18,989 | 70 | 0.66991 | 70-74 | 5,281 | 6,375 |
| 75-79 | 10,238 | 12,850 | 75 | 0.56388 | 75-79 | 5,404 | 4,693 |
| 80-84 | 7,658 | 7,461 | 80+ | 0.40912 | 80-84 | 5,118 | 2,341 |
| 85+ | 4,455 | 5,305 | | | 85+ | 7,410 | 602 |
| Total | 611,423 | 754,608 | | | Total | 79,509 | 222,693 |
Since we have data on the number of foreign-born people in the population by age group for each census, we can estimate, by age group, the number of deaths of foreign-born people that occurred in the period between the two censuses, using the numbers of foreigners in each census given in the second and third columns of Table 1. For those aged 20 to 24 last birthday and those aged 80 and over at the time of the first census, and those born between the two censuses, the calculations are as follows:
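The expressions below are a reconstruction that reproduces the D^{F} column of Table 1, not a quotation of the workbook's formulas. Deaths are taken as the average of a forward estimate (the first count multiplied by 1 - S) and a reverse estimate (the second count multiplied by 1/S - 1). For the open interval the first count is the sum of those aged 80-84 and 85+ in 2001 (7,658 + 4,455 = 12,113), and for those born between the censuses there is no first count, so only the reverse term appears.

$${}_{5}D^{F}_{20}=\tfrac{1}{2}\left[69{,}787\,(1-0.96458)+95{,}763\left(\tfrac{1}{0.96458}-1\right)\right]\approx 2{,}994$$

$${}_{\infty}D^{F}_{80}=\tfrac{1}{2}\left[12{,}113\,(1-0.40912)+5{,}305\left(\tfrac{1}{0.40912}-1\right)\right]\approx 7{,}410$$

$$D^{F}_{B}=\tfrac{1}{2}\cdot 12{,}577\left(\tfrac{1}{0.94151}-1\right)\approx 391$$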
If data and/or survival factors were not available by age group, then one could estimate the total number of deaths of the foreign-born people as follows, given an estimate of the crude mortality rate in the population of 14 per 1,000:
Since data are available by age group for each census, age-specific net immigration of those born outside the country can be estimated as follows:
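The age-specific calculation can be sketched in a few lines of Python. The function names are illustrative, and the deaths formula (the mean of a forward and a reverse survival estimate) is inferred from the values in Table 1 rather than taken from the workbook. Net migrants are the change in the cohort's size plus the estimated deaths.

```python
def intercensal_deaths(n_first, n_second, survival):
    """Deaths to a cohort between two counts, taken as the mean of a forward
    estimate (first count x (1 - S)) and a reverse estimate
    (second count x (1/S - 1))."""
    forward = n_first * (1.0 - survival)
    reverse = n_second * (1.0 / survival - 1.0)
    return 0.5 * (forward + reverse)


def net_migrants(n_first, n_second, survival):
    """Net migrants to a cohort: change in the count plus estimated deaths."""
    deaths = intercensal_deaths(n_first, n_second, survival)
    return n_second - n_first + deaths


# Foreign-born males aged 20-24 in 2001 and 25-29 at the second count (Table 1).
deaths_20 = intercensal_deaths(69_787, 95_763, 0.96458)
net_20 = net_migrants(69_787, 95_763, 0.96458)

# Those born between the counts have no first count, so n_first is 0.
deaths_b = intercensal_deaths(0, 12_577, 0.94151)
net_b = net_migrants(0, 12_577, 0.94151)

print(round(deaths_20), round(net_20), round(deaths_b), round(net_b))
```

The four rounded results correspond to the Table 1 entries for ages 25-29 and 0-4 at the second census (2,994 and 28,970; 391 and 12,968).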
If data and/or survival factors were not available by age group then one could estimate the total net number of immigrants as follows:
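A minimal sketch of the all-ages calculation follows. It assumes that person-years lived by the foreign-born can be approximated by the mean of the two counts multiplied by the length of the interval; that approximation and the function name are assumptions made for illustration, and the original calculation may define person-years differently.

```python
def crude_total_deaths(n_first, n_second, crude_rate, years):
    """Total deaths = crude rate x person-years, with person-years
    approximated by the mean of the two counts times the interval."""
    person_years = years * (n_first + n_second) / 2.0
    return crude_rate * person_years


# Totals from Table 1 and a crude mortality rate of 14 per 1,000.
total_deaths = crude_total_deaths(611_423, 754_608, 0.014, 5)
total_net = 754_608 - 611_423 + total_deaths

print(round(total_deaths), round(total_net))
```

Under this approximation the total number of deaths comes to about 47,800, well below the 79,509 obtained from the age-specific calculation in Table 1, so the crude estimate of net immigration is correspondingly lower than 222,693.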
The second and third columns of Table 2 show the numbers of people living in the Western Cape province of South Africa who were born outside the province, as counted by the 2001 Census and the 2007 Community Survey, respectively. Although the same survival factors (column 5) have been used as in the example of Method A, this should not be done if the mortality experience of the native-born and of immigrants is thought to be very different. The final column of Table 2 gives, for each age group, the net number of migrants into the Western Cape who were born outside the Western Cape. Thus in total a net 213,911 people born outside the Western Cape moved to the Western Cape (i.e. after subtracting those who moved out).
Table 2 Estimation of the net number of in-migrants of those born outside by age group, Western Cape, South Africa, 2001-2006

| Age | 2001 | 2006 | x | _{5}S_{x} | Age at 2^{nd} census | D_{O} | Net M (born out) |
|---|---|---|---|---|---|---|---|
| | | | B | 0.94151 | | | |
| 0-4 | 16,443 | 19,012 | 0 | 0.97896 | 0-4 | 591 | 19,602 |
| 5-9 | 24,406 | 28,743 | 5 | 0.99547 | 5-9 | 482 | 12,782 |
| 10-14 | 31,134 | 30,792 | 10 | 0.99427 | 10-14 | 125 | 6,511 |
| 15-19 | 44,478 | 53,933 | 15 | 0.98602 | 15-19 | 245 | 23,043 |
| 20-24 | 74,011 | 82,526 | 20 | 0.96458 | 20-24 | 896 | 38,944 |
| 25-29 | 80,187 | 89,522 | 25 | 0.93161 | 25-29 | 2,954 | 18,466 |
| 30-34 | 65,833 | 90,783 | 30 | 0.90960 | 30-34 | 6,074 | 16,670 |
| 35-39 | 56,393 | 76,475 | 35 | 0.89780 | 35-39 | 6,776 | 17,417 |
| 40-44 | 44,420 | 59,692 | 40 | 0.89092 | 40-44 | 6,268 | 9,567 |
| 45-49 | 32,862 | 47,612 | 45 | 0.88633 | 45-49 | 5,338 | 8,529 |
| 50-54 | 28,178 | 37,969 | 50 | 0.87224 | 50-54 | 4,303 | 9,409 |
| 55-59 | 19,983 | 30,205 | 55 | 0.84731 | 55-59 | 4,012 | 6,039 |
| 60-64 | 17,569 | 25,593 | 60 | 0.80885 | 60-64 | 3,832 | 9,442 |
| 65-69 | 11,216 | 20,802 | 65 | 0.75468 | 65-69 | 4,137 | 7,371 |
| 70-74 | 8,365 | 12,612 | 70 | 0.66991 | 70-74 | 3,426 | 4,822 |
| 75-79 | 5,919 | 8,434 | 75 | 0.56388 | 75-79 | 3,458 | 3,528 |
| 80-84 | 4,063 | 5,061 | 80+ | 0.40912 | 80-84 | 3,248 | 2,390 |
| 85+ | 2,152 | 2,183 | | | 85+ | 3,413 | -620 |
| Total | 567,613 | 721,949 | | | Total | 59,576 | 213,911 |
The second and third columns of Table 3 present the numbers of people living in provinces other than the Western Cape who were born in the Western Cape, as counted by the 2001 Census and the 2007 Community Survey, respectively. The net number of out-migrants of those born in the Western Cape (i.e. the number of people born in the Western Cape who moved out, less those who have returned) is given in column 8. The negative numbers mean that there was negative net out-migration (i.e. the number of those born in the Western Cape who moved to other provinces in the period was less than the number born in the Western Cape, and living outside it, who returned during the period). Thus the total of -19,017 means that the number of people born in the Western Cape who returned to the Western Cape during the period, having lived in another province in 2001, exceeded by 19,017 the number who were born in the Western Cape and moved to another province in the period.
These estimates were derived using the same survival factors as were used for those born outside the Western Cape who moved into the province, but if there was reason to suppose that the mortality differed for those born in the Western Cape who moved out, then a different set of survival factors would be used to estimate the Net M (born in) numbers.
The overall net in-migration for the province is given in the final column of Table 3. In total, 232,928 more people moved into the Western Cape than left the Western Cape to live in another province.
In this example those born outside the province include those born outside the country, and thus the overall net migration includes immigrants who settle in the province. Excluding the foreign-born from Table 2 would produce numbers of internal in-migrants net of internal out-migrants, and the sum of these numbers for all the provinces together would be zero.
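The final column of Table 3 can be reproduced from the two sets of estimates. The relationship below is read off the tabulated values, and the labels are illustrative notation rather than the source's own symbols:

$$\text{Overall net } M=\text{Net } M\,(\text{born out})-\text{Net } M\,(\text{born in})$$

For example, for those aged 5-9 at the second census, 12,782 - (-9,180) = 21,962, and for all ages combined, 213,911 - (-19,017) = 232,928.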
Table 3 Estimation of the net number of out-migrants of those born inside by age group, Western Cape, South Africa, 2001-2006

| Age | 2001 | 2006 | x | _{5}S_{x} | Age at 2^{nd} census | D_{I} | Net M (born in) | Overall Net M |
|---|---|---|---|---|---|---|---|---|
| | | | B | 0.94151 | | | | |
| 0-4 | 22,055 | 11,747 | 0 | 0.97896 | 0-4 | 365 | 12,112 | 7,490 |
| 5-9 | 21,895 | 12,509 | 5 | 0.99547 | 5-9 | 367 | -9,180 | 21,962 |
| 10-14 | 21,382 | 11,593 | 10 | 0.99427 | 10-14 | 76 | -10,226 | 16,737 |
| 15-19 | 18,265 | 13,455 | 15 | 0.98602 | 15-19 | 100 | -7,827 | 30,870 |
| 20-24 | 14,645 | 10,477 | 20 | 0.96458 | 20-24 | 202 | -7,587 | 46,531 |
| 25-29 | 13,501 | 9,534 | 25 | 0.93161 | 25-29 | 434 | -4,676 | 23,142 |
| 30-34 | 13,118 | 11,047 | 30 | 0.90960 | 30-34 | 867 | -1,587 | 18,257 |
| 35-39 | 12,121 | 14,614 | 35 | 0.89780 | 35-39 | 1,319 | 2,815 | 14,602 |
| 40-44 | 11,725 | 12,195 | 40 | 0.89092 | 40-44 | 1,311 | 1,384 | 8,183 |
| 45-49 | 10,335 | 10,538 | 45 | 0.88633 | 45-49 | 1,285 | 98 | 8,431 |
| 50-54 | 9,211 | 9,881 | 50 | 0.87224 | 50-54 | 1,221 | 768 | 8,642 |
| 55-59 | 7,264 | 10,568 | 55 | 0.84731 | 55-59 | 1,362 | 2,720 | 3,319 |
| 60-64 | 6,691 | 7,723 | 60 | 0.80885 | 60-64 | 1,250 | 1,710 | 7,732 |
| 65-69 | 4,643 | 5,297 | 65 | 0.75468 | 65-69 | 1,265 | -128 | 7,499 |
| 70-74 | 3,954 | 3,766 | 70 | 0.66991 | 70-74 | 1,182 | 304 | 4,517 |
| 75-79 | 2,331 | 2,384 | 75 | 0.56388 | 75-79 | 1,240 | -330 | 3,858 |
| 80-84 | 1,402 | 2,140 | 80+ | 0.40912 | 80-84 | 1,336 | 1,145 | 1,244 |
| 85+ | 707 | 555 | | | 85+ | 1,024 | -531 | -89 |
| Total | 195,246 | 160,023 | | | Total | 16,206 | -19,017 | 232,928 |
Table 4 presents the answers to the question about place of residence (province, in this example) at the time of the 2001 Census, as given by those counted in each of the provinces in the 2007 Community Survey. (In actual fact the question asked whether the person was staying at the same place at the time of the prior census and, if not, where they were staying at the time they moved to the place at which they were counted in the Community Survey. However, work by Dorrington and Moultrie (2009), which used these data and the year of movement to back-project the population and so estimate the numbers by province of residence at the time of the previous census, suggests that the assumption that there was only one move in the five years since the previous census is reasonably accurate.)
By far the largest numbers of migrants are those who moved within each of the provinces; however, these have been excluded from Table 4 because one is usually more interested in interprovincial migration than in migration within a province.
Table 4 Interprovincial migration, South Africa, 2001-2006

Columns give the province where counted (destination); rows give the previous residence (origin).

| Previous residence (origin) | WC | EC | NC | FS | KZ | NW | GT | MP | LM | Total |
|---|---|---|---|---|---|---|---|---|---|---|
| WC | | 12,173 | 4,060 | 1,745 | 3,221 | 2,113 | 16,400 | 1,405 | 874 | 41,992 |
| EC | 52,239 | | 1,120 | 7,187 | 25,209 | 14,430 | 28,633 | 4,693 | 2,116 | 135,626 |
| NC | 4,813 | 1,942 | | 3,480 | 908 | 3,728 | 4,956 | 1,062 | 357 | 21,246 |
| FS | 2,943 | 3,145 | 2,546 | | 2,352 | 12,733 | 19,920 | 4,293 | 1,963 | 49,896 |
| KZ | 6,762 | 7,015 | 631 | 2,358 | | 3,573 | 50,980 | 8,886 | 1,194 | 81,399 |
| NW | 1,478 | 907 | 9,811 | 5,555 | 2,329 | | 47,633 | 3,090 | 4,337 | 75,140 |
| GT | 24,891 | 12,948 | 3,962 | 11,437 | 18,145 | 32,433 | | 18,598 | 15,133 | 137,547 |
| MP | 2,134 | 1,317 | 280 | 1,724 | 4,546 | 5,767 | 42,941 | | 8,628 | 67,338 |
| LM | 2,754 | 1,583 | 255 | 1,709 | 2,209 | 9,773 | 81,394 | 24,211 | | 123,889 |
| OSA | 21,221 | 5,467 | 1,209 | 9,584 | 10,933 | 11,437 | 51,873 | 8,335 | 9,286 | 129,346 |
| DNK | 500 | 3 | 15 | 124 | 132 | 78 | 228 | 89 | 0 | 1,170 |
| UNS | 1,058 | 1,029 | 107 | 208 | 875 | 508 | 3,558 | 408 | 633 | 8,384 |
| Total | 120,794 | 47,528 | 23,996 | 45,111 | 70,860 | 96,573 | 348,516 | 75,070 | 44,524 | 872,973 |

WC = Western Cape, EC = Eastern Cape, NC = Northern Cape, FS = Free State, KZ = KwaZulu-Natal, NW = North West, GT = Gauteng, MP = Mpumalanga, LM = Limpopo, OSA = Outside SA, DNK = Do not know, UNS = Unspecified
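As an illustration of how an origin-destination table of this kind is summarised, the sketch below computes internal in-, out- and net migration for each province from the nine interprovincial rows of Table 4 (rows OSA, DNK and UNS are left out, so the in-migration figures exclude immigrants and those of unknown origin). The function and variable names are illustrative.

```python
PROVINCES = ["WC", "EC", "NC", "FS", "KZ", "NW", "GT", "MP", "LM"]

# Rows are origins, columns are destinations (Table 4); the diagonal is zero
# because moves within a province are excluded.
FLOWS = [
    [0, 12173, 4060, 1745, 3221, 2113, 16400, 1405, 874],      # WC
    [52239, 0, 1120, 7187, 25209, 14430, 28633, 4693, 2116],   # EC
    [4813, 1942, 0, 3480, 908, 3728, 4956, 1062, 357],         # NC
    [2943, 3145, 2546, 0, 2352, 12733, 19920, 4293, 1963],     # FS
    [6762, 7015, 631, 2358, 0, 3573, 50980, 8886, 1194],       # KZ
    [1478, 907, 9811, 5555, 2329, 0, 47633, 3090, 4337],       # NW
    [24891, 12948, 3962, 11437, 18145, 32433, 0, 18598, 15133],  # GT
    [2134, 1317, 280, 1724, 4546, 5767, 42941, 0, 8628],       # MP
    [2754, 1583, 255, 1709, 2209, 9773, 81394, 24211, 0],      # LM
]


def in_out_net(flows, labels):
    """Return dictionaries of in-, out- and net migration by region."""
    out_mig = {lab: sum(row) for lab, row in zip(labels, flows)}
    in_mig = {lab: sum(row[j] for row in flows) for j, lab in enumerate(labels)}
    net_mig = {lab: in_mig[lab] - out_mig[lab] for lab in labels}
    return in_mig, out_mig, net_mig


in_mig, out_mig, net_mig = in_out_net(FLOWS, PROVINCES)
print(in_mig["WC"], out_mig["WC"], net_mig["WC"])
```

For the Western Cape this gives 98,014 internal in-migrants and 41,991 out-migrants (the printed total of 41,992 differs by one because the tabulated counts are rounded), a net internal gain of 56,023. Because every internal move is an out-migration from one province and an in-migration to another, the net figures sum to zero across provinces.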
In addition to the all-age numbers in Table 4 (in actual fact these numbers exclude, as is often the case, migration of those born between the census and survey), one can also produce numbers of in- and out-migrants by age group, as shown in Table 5. For completeness these numbers include estimates of the number of migrants who were born since the previous census. However, relative to the other migrants these numbers look implausibly high, and the reason for this is discussed below.
The net number of migrants is estimated for those aged 25-29 at the time of the Community Survey (i.e. those who were aged 20-24 at the time of the 2001 Census), for example, as follows:
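A calculation that reproduces the Table 5 entry for this cohort is shown below. It is a reconstruction from the tabulated values, scaling the surviving net migrants by (1 + S)/(2S) with S = 0.96458:

$$\left(20{,}675-5{,}649\right)\cdot\frac{1+0.96458}{2\times 0.96458}=15{,}026\times 1.01836\approx 15{,}302$$

Table 5 shows 15,301; the difference of one presumably reflects rounding of the weighted counts.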
Table 5 Estimation of the net number of in-migrants by age group, Western Cape, South Africa, 2001-2006

| Age | Surviving in-migrants (I’) | Surviving out-migrants (O’) | x | _{5}S_{x} | Net in-migrants |
|---|---|---|---|---|---|
| 0-4 | 20,846 | 11,747 | B | 0.94151 | 9,381 |
| 5-9 | 6,586 | 3,554 | 0 | 0.97896 | 3,065 |
| 10-14 | 6,685 | 2,882 | 5 | 0.99547 | 3,812 |
| 15-19 | 10,402 | 3,967 | 10 | 0.99427 | 6,454 |
| 20-24 | 21,266 | 4,488 | 15 | 0.98602 | 16,897 |
| 25-29 | 20,675 | 5,649 | 20 | 0.96458 | 15,301 |
| 30-34 | 15,584 | 6,008 | 25 | 0.93161 | 9,928 |
| 35-39 | 10,584 | 5,098 | 30 | 0.90960 | 5,758 |
| 40-44 | 7,264 | 3,045 | 35 | 0.89780 | 4,458 |
| 45-49 | 4,648 | 2,714 | 40 | 0.89092 | 2,053 |
| 50-54 | 3,095 | 1,500 | 45 | 0.88633 | 1,698 |
| 55-59 | 3,940 | 935 | 50 | 0.87224 | 3,225 |
| 60-64 | 3,776 | 527 | 55 | 0.84731 | 3,541 |
| 65-69 | 3,127 | 818 | 60 | 0.80885 | 2,582 |
| 70-74 | 1,540 | 437 | 65 | 0.75468 | 1,282 |
| 75-79 | 561 | 206 | 70 | 0.66991 | 442 |
| 80-84 | 797 | 116 | 75 | 0.56388 | 944 |
| 85+ | 264 | 47 | 80+ | 0.40912 | 374 |
| Total | 141,640 | 53,739 | | | 91,194 |
Perhaps the simplest check on the reasonableness of the ‘shape’ of the estimates (i.e. the distribution of the numbers by age), though not of their level, is to see whether it conforms to the standard shape (or a variation thereof). Rogers and Castro (1981a; 1981b) point out that the distribution of the number (or rate) of in- and out-migrants tends to conform to standard patterns, with a peak in the young adult ages (usually associated with seeking employment) and a second, usually less pronounced, peak amongst very young children, falling to a trough amongst young teenagers (the size of the childhood peak depending on the extent to which it is families rather than individuals that are moving among young to middle-aged adults). Sometimes there is also a ‘hump’ (or trough) around retirement age if there is a strong flow of migrants moving to (or away from) the place to retire.
These patterns (not necessarily the same pattern) apply to in- and out-migration flows separately, but not necessarily to net migration (which is the difference between the two flows) unless one flow (either the in-migration or the out-migration) is much greater than the other.
Figure 1 illustrates this using some of the estimates calculated above, expressed as proportions of the total number in each case (to allow them to be presented on a single figure). From this we can see that in broad terms (with the exception in some cases, where the proportion of migrants at the very young ages looks implausibly high) each conforms to the expected shape.
The net out-migration of those born in the Western Cape (excluded from the figure for ease of illustration) does not conform to a standard model of migration, which could indicate that these numbers are not very reliable. However, they are small relative to the in-migration of those born outside the province, and thus such a deviation may be tolerated. There are two other features to be noted from Figure 1. The first is that the out-migration from the Western Cape, as estimated from data on place of residence at the previous census, suggests that adult out-migrants peak at a somewhat older age (and possibly are more likely to represent family rather than individual migration). The second is that the net immigration into the country follows the standard shape, which indicates that the flow into the country is much stronger than the return flow of those migrants.
If the census asked both place of birth and place of residence at the previous census, one can compare the two estimates of net in-migration into a specific subnational region. If they are similar, this gives one some confidence in the results. In the case of South Africa, the place of birth data give a net number of in-migrants into the Western Cape of 232,928 (Table 3), while the data on place of residence at the time of the previous census produce an estimate of 91,194 (Table 5), which suggests that one or both of these sets of data are suspect.
The most basic check of the estimates of migration is to project the population (of the country or the province) at the first census to the time of the second census, making use of the estimates of the number of migrants, and to compare the result with the population counted in the second, more recent, census to see how well the two match, especially in the age range in which migration is concentrated. In the case of the net in-migration into the Western Cape, projecting the population forward from 2001 using the estimates derived from the change in the numbers by place of birth produced a much closer fit to the population in the 20-29 year age range, suggesting that the data on place of birth are probably more complete than those on the place of residence at the date of the previous census. To some extent this is supported by a comparison of the net number of foreign-born immigrants estimated from the change in the number of foreign-born in the country between the two censuses, 222,693 (Table 1), with the sum of the numbers who reported that they had moved from outside South Africa to one of the provinces since the previous census, 129,346 (Table 4).
Ideally, if one had independent estimates of the number of migrants, one might compare those numbers against estimates produced using the above methods. Unfortunately, reliable independent estimates are rare. Although most countries try to record people entering and leaving the country, these data are often not reliable, particularly in developing countries with relatively porous borders. And unless the country is extremely well regulated and maintains a complete and accurate register of the population, the only other way to measure internal migration is through migration-specific surveys. These tend to be much more useful for understanding the type of migration (whether permanent, temporary, cyclical, etc.) than for producing reliable estimates of the number of migrants, given the often less structured situation in which (particularly recent) migrants find themselves living and an understandable reluctance to identify themselves as migrants.
Considering the numbers of migrants estimated from the data on place of residence at the previous census given in Table 4 (and taking into account the suspicion that these probably underestimate the true migration), some 2-4% of the population changed province of residence in the 5 years between the 2001 Census and the Community Survey. Had we included those who moved within a province without changing province, then between 7 and 15 per cent of the population moved in the 5-year period.
The main provinces of destination are Gauteng (by a big margin) and the Western Cape, which are predominantly urban and the wealthiest provinces. The main provinces of origin are Gauteng (inspection of the age distribution would show that this is mainly return migration of ‘retiring’ workers), and the Eastern Cape and Limpopo, which are poor, mainly rural provinces from which people seeking work migrate to the urban areas.
It appears that migration is predominantly of individuals (seeking work) rather than of families.
A particular feature of the data relying on province of birth is the apparently relatively high number of children born since the first census who have moved to another province. In all likelihood this is an artefact of the data capturing process. Scanning was used to capture the data from the questionnaires, on which Western Cape was coded as a “1”, written in the appropriate space by hand. It appears that in a small percentage of cases the scanner might have had trouble distinguishing a handwritten “1” from a handwritten “7” (the code for Gauteng). The result is, for example, that some children were coded as having been born outside the province in which they were counted, and thus appear to be migrants when they probably were not. Even though the percentage error in scanning is very small, the number of births can be large relative to the number of migrants, and thus the error can produce noticeable distortions. Since an increasing number of developing countries are using scanning to capture data, this sort of problem may be quite common.
Where scanning errors or other situations make it impossible to produce reliable estimates of the number of migrants among those born since the previous census, one can instead apply the child-woman ratio (CWR) from the second census to the number of women migrants: CWR_{0} for those born in the most recent five years and, if the censuses are 10 years apart, CWR_{5} for those born in the five years before that. Here CWR_{x} represents the ratio of the number of children aged between x and x+5 to the number of women aged between 15+x and 45+x in the population (regional or national) at the time of the second census, and the number of women migrants used is that for a 30-year age range, i.e. women aged between x and x+30.
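In the conventional formulation of this adjustment for censuses ten years apart (as found in standard references such as UN Manual VI), the child migrants are obtained as shown below. The factors of one quarter and three quarters are quoted from that convention and are an assumption here, since they cannot be recovered from this text, and the symbol for women migrants is illustrative:

$${}_{5}M_{0}=\tfrac{1}{4}\,CWR_{0}\cdot{}_{30}M^{f}_{15}\qquad\text{and}\qquad{}_{5}M_{5}=\tfrac{3}{4}\,CWR_{5}\cdot{}_{30}M^{f}_{20}$$

where {}_{30}M^{f}_{x} denotes the net number of women migrants aged between x and x+30 at the second census. The fractions reflect the share of children in each age group who would, on average, have been born before their mothers moved if migration were spread evenly over the ten years.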
Applying this to the data for the Western Cape suggests that the number of migrants born since the previous census should be less than half the numbers being estimated from the data on place of birth.
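The indirect estimation of migration derives from the balance equation for two censuses n years apart, namely:
$${}_{5}N_{x+n}(t+n)={}_{5}N_{x}(t)-{}_{5}D_{x}+{}_{5}I'_{x}-{}_{5}O'_{x}={}_{5}N_{x}(t)-{}_{5}D_{x}+{}_{5}M'_{x},$$
where _{5}M’_{x} = _{5}I’_{x} - _{5}O’_{x} is the net (i.e. in less out) number of in-migrants, aged x to x+5 at the time of the first census, surviving to the second census, and _{5}D_{x}, _{5}I’_{x} and _{5}O’_{x} represent the number of deaths, surviving in-migrants and surviving out-migrants, aged x to x+5 at the time of the first census, who died or moved in the period between the censuses.
For those born after the first census the equation becomes:
$${}_{n}N_{0}(t+n)=B-D_{B}+M'_{B},$$
and for those in the open age interval:
$${}_{\infty}N_{A}(t+n)={}_{\infty}N_{A-n}(t)-{}_{\infty}D_{A-n}+{}_{\infty}M'_{A-n},$$
where B represents the number of births in the population between the two censuses, D_{B} the number of deaths of those births in the period between the censuses and M’_{B} the net number of surviving migrants, born outside the country in the period between the two censuses, _{∞}D_{A-n} the number of deaths in the intercensal period of those aged A-n and older at the time of the first census, and _{∞}M’_{A-n} the net number of migrants aged A-n and older at the time of the first census.
Thus
$${}_{5}M'_{x}={}_{5}N_{x+n}(t+n)-{}_{5}N_{x}(t)+{}_{5}D_{x},\quad M'_{B}={}_{n}N_{0}(t+n)-B+D_{B},\quad {}_{\infty}M'_{A-n}={}_{\infty}N_{A}(t+n)-{}_{\infty}N_{A-n}(t)+{}_{\infty}D_{A-n},$$
or alternatively
$${}_{5}M'_{x}={}_{5}N_{x+n}(t+n)-{}_{5}S_{x}\cdot{}_{5}N_{x}(t),\quad M'_{B}={}_{n}N_{0}(t+n)-S_{B}\cdot B,\quad {}_{\infty}M'_{A-n}={}_{\infty}N_{A}(t+n)-{}_{\infty}S_{A-n}\cdot{}_{\infty}N_{A-n}(t),$$
where _{5}S_{x}, S_{B} and _{∞}S_{A-n} represent the proportion of the populations aged x to x+5 at the time of the first census, born between the censuses, and aged A-n and older at the time of the first census, respectively, surviving to the second census.
The net number of migrants can thus be estimated from the net number surviving to the second census as follows:
$${}_{5}M_{x}={}_{5}M'_{x}\cdot\frac{1+{}_{5}S_{x}}{2\cdot{}_{5}S_{x}},$$
with S_{B} and _{∞}S_{A-n} used in place of _{5}S_{x} for those born between the censuses and for the open age interval, respectively.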
Unfortunately, since the net number of migrants is usually small relative to the size of the population, age misstatement or errors in either or both census counts can lead to very poor estimates being produced. Better estimates of the net number of immigrants into a country can be produced by confining one’s attention to the population of foreigners (defined as those born outside the country) and assuming that return migration of emigrants from the country of interest is insignificant. Thus one replaces each of the symbols above by equivalents specific to the foreign-born population in the country. Since it is unlikely that one has an accurate record of the number of foreign-born deaths, these need to be estimated in one of the following ways:
$${}_{5}S_{x}=\frac{{}_{5}N_{x+n}^{nb}(t+n)}{{}_{5}N_{x}^{nb}(t)},\quad S_{B}=\frac{{}_{n}N_{0}^{nb}(t+n)}{B^{nb}}\quad\text{and}\quad{}_{\infty}S_{A-n}=\frac{{}_{\infty}N_{A}^{nb}(t+n)}{{}_{\infty}N_{A-n}^{nb}(t)},$$
where the superscript “nb” designates native-born.
where the births and deaths are from the vital registration.
However, for most developing countries, particularly those in Africa, vital registration systems are too incomplete to be used in this way.
When it comes to internal migration, one can estimate net in-migration into each subnational region of those born outside the region (i.e. in-migration of those born outside the region less out-migration of those born outside the region who had previously moved into the region) by making use of place of birth information to identify the change in the numbers of those born outside the region, in the same way as described above. However, since one also has the place of residence of those born in the region who have moved out of the region since birth (but not emigrated), one can also estimate the net out-migration of those born in the region (i.e. out-migration of those born in the region less those born in the region who have returned after having previously moved out of the region) by applying the method described above to the population born in the region (as opposed to those born outside the region).
When estimating the survival of those born in the various regions, the census survival ratios could have an advantage over the life table survival ratios, in that any under- or over-count of the population by region may well be matched by a similar distortion in the national population and hence in the survival ratios, thus resulting in a more accurate estimate of the number of migrants than would be produced by using life table survival ratios.
Apart from place of birth, a census can ask those who moved since the previous census (or some other suitable date) where they were living at that census (or date), which allows one to measure (gross) out-migration and (gross) in-migration separately for each subnational region.
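A minimal sketch of how census survival ratios might be computed from two counts of the native-born is given below. It assumes five-year age groups, censuses exactly five years apart and an open-ended final group; the function name and the counts are hypothetical and serve only to illustrate the arithmetic.

```python
def census_survival_ratios(first, second):
    """Census survival ratios for five-year age groups, censuses five years apart.

    first, second: native-born counts by age group (0-4, 5-9, ...), with the
    last element an open-ended group. Element i of the result is the ratio for
    those in group i at the first census; the final element covers everyone in
    the last two groups at the first census, who all fall in the open-ended
    group at the second census.
    """
    ratios = [second[i + 1] / first[i] for i in range(len(first) - 2)]
    ratios.append(second[-1] / (first[-2] + first[-1]))
    return ratios


# Hypothetical counts for age groups 0-4, 5-9 and 10+.
first_census = [1000, 900, 2000]
second_census = [1100, 980, 2610]
ratios = census_survival_ratios(first_census, second_census)
print(ratios)
```

The survival ratio for those born between the censuses is not covered here because it requires the number of births rather than a first-census count.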
If the census asks for the year when the migrant moved (or how long the person has been living in the place where counted in the second census) one can get a sense of the timing of migration, and estimate yearly migration rates. This is a complicated process and is not covered here, but the interested reader is referred to the paper by Dorrington and Moultrie (2009).
If age-specific numbers are not available, or the allocation to age is considered to be unreliable, one can still produce estimates by age by estimating the total number of migrants as described below, and then apportioning this total to the age groups using either an age distribution for the same population at a different time (since the age distribution of migration flows tends to be consistent over time) or, more likely, an appropriate standard model (Rogers and Castro 1981a; 1981b).
where
and _{∞}m_{0} is an estimate of the crude mortality rate of the population in the country of the census.
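The apportioning of a total number of migrants to age groups can be sketched as follows. The age pattern used here is a set of hypothetical relative weights, not a real model schedule, and the function name is illustrative.

```python
def apportion_total(total, age_pattern):
    """Split a total across age groups in proportion to age_pattern."""
    scale = sum(age_pattern.values())
    return {age: total * weight / scale for age, weight in age_pattern.items()}


# Hypothetical relative weights by age group (not taken from any real schedule).
pattern = {"0-14": 2.0, "15-29": 5.0, "30-44": 2.0, "45+": 1.0}
by_age = apportion_total(10_000, pattern)
print(by_age)
```

In practice the weights would come from an earlier age distribution for the same population or from a fitted standard schedule.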
The primary limitation of using censuses to estimate immigration and net in-migration is the quality of the censuses, in particular the extent of undercount, both in general and, more significantly, in one census relative to the other. However, even if the census undercount is low, the census might not identify all the migrants. In general, recent migrants are often difficult to include in a census because they have yet to settle. More specifically, immigrants may not be keen to identify themselves as immigrants and may either avoid being counted or not admit to being foreign-born.
Apart from this, place of birth and/or place of residence at previous census, in the case of internal migrants, might be misreported due to boundary changes or ignorance (or even bias) on the part of the respondent.
The third drawback of census data is that they cannot be used to measure emigration from the country of the census. Emigration is particularly difficult to estimate for most countries, but one option is to apply the method for estimating net immigration of foreigners described above to the censuses of the main countries of destination, in order to estimate the change in the numbers of emigrants living in those countries. Of course, this is only useful if the censuses of these countries identify the numbers of foreign-born by their countries of birth reasonably accurately.
Generally, statistics on immigrants, and particularly emigrants, that are collected at border posts provide quite poor estimates of the true numbers, unless the borders of the country are quite impenetrable and there are only a few well-controlled ports of entry. Even then there may still be many ‘visitors’ who end up living in the country.
A final drawback occurs when working with data aggregated over all ages. In these cases one usually has to make use of the crude death rate for the population of the country of the census in order to estimate the number of deaths of the migrant population. However, since the distribution of the migrant population by age can differ from that of the population of the country of the census quite markedly, the estimated number of deaths can be quite inaccurate.
Some censuses ask additional questions which can be of use in interpreting the patterns of migration, if not in improving the estimate of the level of migration. The most common of these is probably a question asking when the migrant moved. These data allow one to estimate annual rates of migration; however, it is possible that respondents tend to report moves as occurring more recently than was actually the case (Dorrington and Moultrie 2009).
Where a census asks those who moved since the previous census where they moved from most recently and when they moved, and not where they were at the time of the previous census (as the recent censuses in South Africa have done), it is possible to back-project the numbers of migrants by applying annual rates of migration between subnational regions to estimate the number by place at the time of the previous census (Dorrington and Moultrie 2009). However, in the case of South Africa, at least, it appears that the assumption that most migrants moved only once in the past five years, and thus that the place of residence before the most recent move is the same as the place at the time of the previous census, is quite reasonable (Dorrington and Moultrie 2009).
Where one has data on both the subnational region of birth and the place of residence at the time of the previous census, one can cross-tabulate the place of residence data by place of birth and thus classify recent migrants into primary, secondary and return migrants.
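A classification of this kind might look as follows. The definitions are the conventional ones and are an assumption here, since the text does not spell them out: a primary migrant has left the region of birth, a return migrant has moved back to it, and a secondary migrant has moved between two regions, neither of which is the region of birth. The function name and the records are hypothetical.

```python
from collections import Counter


def classify_move(birth, previous, current):
    """Classify a person by region of birth, region at the previous census
    and region where counted."""
    if previous == current:
        return "non-migrant"
    if current == birth:
        return "return"
    if previous == birth:
        return "primary"
    return "secondary"


# Hypothetical records: (region of birth, region at previous census, region where counted).
records = [
    ("EC", "EC", "WC"),  # left the region of birth: primary
    ("EC", "GT", "WC"),  # moved on from a region other than that of birth: secondary
    ("WC", "GT", "WC"),  # moved back to the region of birth: return
    ("WC", "WC", "WC"),  # did not change region
]
counts = Counter(classify_move(*record) for record in records)
print(counts)
```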
For general background to the topic of migration, definition of terms and detail on the analysis and interpretation of the data on internal migration, the interested reader is referred to the excellent UN manual on the topic, Manual VI (UN Population Division 1970). The textbook by Shryock and Siegel (1976), or its modern replacement by Siegel and Swanson (2004), also provides an introduction to the topic of migration and covers, in particular, the estimation of international migration.
Those interested in the estimation of annual migration rates and the back-projection of migration to estimate the numbers by place of residence at the time of the previous census from data on place of residence before the most recent move and year of move are referred to the paper by Dorrington and Moultrie (2009).
Dorrington RE and TA Moultrie. 2009. "Making use of the consistency of patterns to estimate age-specific rates of interprovincial migration in South Africa," Paper presented at Annual conference of the Population Association of America. Detroit, US, 30 April - 2 May.
Rogers A and LJ Castro. 1981a. "Age patterns of migration: Cause-specific profiles," in Rogers, A (ed). Advances in Multiregional Demography (RR-81-006). Laxenburg, Austria: International Institute for Applied Systems Analysis, pp. 125-159. http://webarchive.iiasa.ac.at/Admin/PUB/Documents/RR81006.pdf
Rogers A and LJ Castro. 1981b. Model Migration Schedules (RR-81-030). Laxenburg, Austria: International Institute for Applied Systems Analysis. http://webarchive.iiasa.ac.at/Admin/PUB/Documents/RR81030.pdf
Shryock HS and JS Siegel. 1976. The Methods and Materials of Demography (Condensed Edition). San Diego: Academic Press.
Siegel JS and D Swanson. 2004. The Methods and Materials of Demography. Amsterdam: Elsevier.
Timæus IM. 2004. "Impact of HIV on mortality in Southern Africa: Evidence from demographic surveillance," Paper presented at Seminar of the IUSSP Committee "Emerging Health Threats" HIV, Resurgent Infections and Population Change in Africa. Ouagadougou, 12-14 February.
UN Population Division. 1970. Manual VI: Methods of Measuring Internal Migration. New York: United Nations, Department of Economic and Social Affairs, ST/SOA/Series A/47. http://www.un.org/esa/population/techcoop/IntMig/manual6/manual6.html
This section describes how to fit a multiexponential model migration schedule to observed migration data.
Over the last thirty years, these schedules, devised by Rogers and Castro (1981), have been remarkably successful in representing typical age patterns of migration. Essentially the same age patterns of migration have been observed whether national and interregional migrations are considered simultaneously, or migration from a specific region is considered in isolation. The multiexponential function was designed to reflect the dependency between migration and age, and captures the relationship through an additive sequence of exponential curves, based on 7, 9, 11 or 13 parameters, depending on the complexity of the migration patterns and the ability and robustness of the data to sustain increased parameterization.
When fitted to a schedule of singleyearofage migration rates, the RogersCastro model provides a bestfit, graduated expression of the migration schedule that finds application in smoothing an observed series of migration rates, and which can be used directly to enhance understanding of migration dynamics. The results can also find application in a number of alternative uses, for example, in setting migration schedules to be used in multiregional population projections. Ideally, the analyst will have estimates of migration by single year and single ages to which the RogersCastro model can be fitted. However, if – as is often the case in developing countries where the quality of the underlying data may not permit such finely grained calculations – the data are only available in fiveyear age groups, then singleyear age rates need to be interpolated from the data using one of the methods described in this chapter before attempting to fit a RogersCastro model.
Ideally the data should be in the form of rates by single ages. Where they are in five-year age groups, single-year observations must be interpolated from these five-year estimates before attempting to fit a multi-exponential curve. The choice of the upper age is somewhat arbitrary, but the upper bound of the data used in fitting a model schedule should, at the minimum, be greater than the modal age of retirement.
The latest census accurately counts the population by subnational region and place of birth, and identifies those who have moved from one region to another since a prior date (e.g. the previous census).
Before applying this method, you should investigate the quality of the data in at least the following dimensions:
Caution should be exercised in applying the method to net migration data, as the multi-exponential distribution of migration rates by age models gross migration flows (i.e. in- or out-migration) but not necessarily net migration, unless the flow in one direction significantly dominates the flow in the other at all ages.
The multi-exponential function was designed by Rogers and Castro (1981) to reflect the dependency between migration and age. Migration is usually high in the first year of life. It drops to a low point during the early teenage years, then increases sharply to its highest point during the young adult years. After that, it declines, except for a possible increase and subsequent decrease during the ages of retirement. In some circumstances there may be an upward slope at the oldest ages (Rogers and Castro 1981; Rogers and Watkins 1987).
Over the last thirty years, the schedule (also known as the Rogers-Castro model migration schedule) has proven to be remarkably successful in representing age patterns of migration (Little and Rogers 2007; Raymer and Rogers 2008; Rogers and Castro 1981, 1986; Rogers and Little 1994; Rogers, Little and Raymer 2010; Rogers and Raymer 1999; Rogers and Watkins 1987). These same age patterns of migration have been documented for regions of different sizes and for ethnic and gender subpopulations (Rogers and Castro 1981). They appear whether national interregional migrations are considered simultaneously, or migration from a specific region is considered separately. Directional migration (i.e. from region i to region j) exhibits the same patterns as well. For example, the Rogers-Castro model migration schedule has been fitted successfully to migration flows between local authorities in England (Bates and Bracken 1982, 1987), Canada’s metropolitan and non-metropolitan areas (Liaw and Nagnur 1985), the regions of Japan, Korea, and Thailand (Kawabe 1990), and South Africa’s and Poland’s national patterns (Hofmeyr 1988; Potrykowska 1988).
When fitted to a schedule of single-year-of-age migration rates, the Rogers-Castro model provides a best-fit, graduated expression of the migration schedule that can be summarized by 7, 9, 11 or 13 parameters depending on the complexity of the schedule and strength of the data. In addition, the erratic fluctuations, often associated with unreliability in observed age-specific rates, are smoothed.
Rogers-Castro model migration schedules have been used in population projections in Canada (George 1994), and they have been imposed on time periods, regions, and subpopulations (Rogers, Little and Raymer 2010) when migration data were inadequate or unavailable.
The full model schedule has 13 parameters, which is the complete and most complex multi-exponential form of the model. If M(x) is defined as the migration rate for a single year of age x, the full model is defined as
$$M(x) = a_{1} e^{-\alpha_{1} x} + a_{2} \exp\left\{-\alpha_{2}(x-\mu_{2}) - e^{-\lambda_{2}(x-\mu_{2})}\right\} + a_{3} \exp\left\{-\alpha_{3}(x-\mu_{3}) - e^{-\lambda_{3}(x-\mu_{3})}\right\} + a_{4} e^{\lambda_{4} x} + c$$
It comprises five additive components. The first component, $a_{1} e^{-\alpha_{1} x}$, is a single negative exponential curve representing the migration pattern of the pre-labour force ages. The second component, $a_{2} \exp\{-\alpha_{2}(x-\mu_{2}) - e^{-\lambda_{2}(x-\mu_{2})}\}$, is a left-skewed unimodal curve describing the age pattern of migration of people of working age. The third component, $a_{3} \exp\{-\alpha_{3}(x-\mu_{3}) - e^{-\lambda_{3}(x-\mu_{3})}\}$, is an almost bell-shaped curve representing the age pattern of migration post-retirement, where migration increases sharply following retirement before falling off again. Associated with this component, the fourth component is a single positive exponential curve of the post-retirement ages, $a_{4} e^{\lambda_{4} x}$, reflecting the (sometimes) observed generalised increase in migration post-retirement. This can be seen, for example, in the migration of the elderly in the US from the North-East to the “sunbelt” states in the South East and South West. The final component is a constant term, c, that represents ‘background’ migration.
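The five components can be evaluated directly. The following is a minimal Python sketch, assuming the standard Rogers-Castro functional forms for each component; the function name, the dictionary keys and the numerical values are illustrative choices made here, not estimates taken from the workbook or from any published fit.

```python
import math

def rogers_castro(x, p):
    """Migration rate M(x) at age x for a Rogers-Castro model schedule.

    `p` maps parameter names to values. A component whose level parameter
    (a1, a2, a3 or a4) is absent or zero contributes nothing, which gives
    the reduced 7-, 9- and 11-parameter forms.
    """
    def get(name):
        return p.get(name, 0.0)

    def double_exponential(a, alpha, mu, lam):
        # Skewed unimodal curve used for the labour-force and retirement peaks.
        return a * math.exp(-alpha * (x - mu) - math.exp(-lam * (x - mu)))

    pre_labour = get("a1") * math.exp(-get("alpha1") * x)
    labour = double_exponential(get("a2"), get("alpha2"), get("mu2"), get("lambda2"))
    retirement = double_exponential(get("a3"), get("alpha3"), get("mu3"), get("lambda3"))
    upslope = get("a4") * math.exp(get("lambda4") * x)
    return pre_labour + labour + retirement + upslope + get("c")

# Illustrative (not estimated) values for a standard 7-parameter schedule.
standard = {"a1": 0.02, "alpha1": 0.10,
            "a2": 0.06, "alpha2": 0.10, "mu2": 20.0, "lambda2": 0.40,
            "c": 0.003}
schedule = [rogers_castro(age, standard) for age in range(86)]
peak_age = max(range(86), key=lambda age: schedule[age])
```

Leaving a level parameter out of the dictionary switches the corresponding component off, so the same function serves all four families of schedule.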
Four families of multi-exponential schedules have been identified in past studies (Rogers, Little and Raymer 2010), and only one, exhibiting both a retirement peak and a post-retirement upslope, requires all 13 parameters and all five components. This family is documented in studies of elderly migration (Rogers and Watkins 1987), and is demonstrated in the bottom right panel of Figure 1.
Source: Based on Raymer and Rogers (2008)
Note: The legend indexes, in order, (1) the pre-labour force migration schedule; (2) the working age migration schedule; (3) the post-retirement migratory increase and decrease; and (4) the generalised increase in post-retirement migration.
The other families are reduced forms of the full model, which means that at least one component is omitted. For example, the most common schedule identified by Rogers, Little and Raymer (2010) requires seven parameters and consists of the first two components and the constant term. This is also called the standard schedule, and its shape is set out in the top left panel of Figure 1.
A number of schedules have exhibited a standard profile plus a retirement peak (Rogers and Castro 1981, 1986), resulting in the 11-parameter model, including components 1, 2, 3 and 5, shown in the bottom left panel of Figure 1. In populations with significant migrant labour, particularly in the developing world, it is possible that the third component is a trough rather than a peak, as migrants return home to retire.
The 9-parameter model is used when the standard pattern is visible for the labour and pre-labour force ages, and there is an upslope to represent migration in the post-retirement years as displayed in the top right panel of Figure 1. This was found in several regions of the Netherlands in 1974 by Rogers and Castro (1981).
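For reference, the four families can be written side by side. The expressions below assume the usual Rogers-Castro component forms and the parameter labels used in this section; the subscripted names $M_{7}$, $M_{9}$, $M_{11}$ and $M_{13}$ are shorthand introduced here for convenience only.

$$
\begin{aligned}
M_{7}(x) &= a_{1} e^{-\alpha_{1} x} + a_{2} \exp\left\{-\alpha_{2}(x-\mu_{2}) - e^{-\lambda_{2}(x-\mu_{2})}\right\} + c \\
M_{9}(x) &= M_{7}(x) + a_{4} e^{\lambda_{4} x} \\
M_{11}(x) &= M_{7}(x) + a_{3} \exp\left\{-\alpha_{3}(x-\mu_{3}) - e^{-\lambda_{3}(x-\mu_{3})}\right\} \\
M_{13}(x) &= M_{11}(x) + a_{4} e^{\lambda_{4} x}
\end{aligned}
$$

Each step adds either the two upslope parameters ($a_{4}$, $\lambda_{4}$) or the four retirement-peak parameters ($a_{3}$, $\alpha_{3}$, $\mu_{3}$, $\lambda_{3}$) to the seven of the standard schedule.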
As should be evident from the discussion above, all parameters are interpretable and can be used to characterize the model schedule.
In their original 11-parameter specification of the multi-exponential migration model, Rogers and Castro (1981) illustrated the model using data on male out-migration rates from Stockholm in 1974. Figure 2 shows the original data (the jagged lines) and the smoothed 11-parameter schedule fitted to the original data.
Five of the 11 parameters (α_{1}, α_{2}, α_{3}, λ_{2} and λ_{3}) give rates of change for different pieces of the model schedule while the level parameters (a_{1}, a_{2}, a_{3} and c) correspond to the heights of the model schedule. a_{1} gives the peak in the first year of life, a_{2} is the peak of labour force migration, a_{3} is the peak of retirement migration, and c gives the background migration rate. μ_{2} and μ_{3} give the ages at the labour force peak and at the retirement peak, respectively.
Source: Rogers and Castro (1981). Permission to reproduce this figure granted by the International Institute for Applied Systems Analysis (IIASA).
Some measures can be used to describe either the observed or the model migration schedule. For example, x_{l} is the pre-labour force age when migration is at its low point. x_{h} is the age when labour force migration peaks, and x_{r} is the age of peak retirement migration. The difference between x_{l} and x_{h} is called the ‘labour force shift’, X, and the increase in migration rate between x_{l} and x_{h} is called the ‘jump’, B. A, the ‘parental shift’, is used to describe the average age difference between parent migration and the corresponding migration of children. The gross ‘migraproduction’ rate (GMR) is the sum of all rates over all ages (i.e. the area under the curve), and it is used to gauge the total level of migration out of a region or the total directional migration, i.e., from region i to region j (Rogers and Castro 1981).
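Several of these summary measures can be read off a single-year schedule with a few lines of code. The sketch below is one possible reading, assuming that the labour force peak lies within a user-supplied age window (15 to 40 here, an arbitrary choice) and that the low point is the minimum rate at any younger age; the function name and the toy schedule are hypothetical, and the parental shift A is not computed.

```python
def summary_measures(rates, labour_window=(15, 40)):
    """Summary measures for a schedule where rates[x] is the rate at age x."""
    low, high = labour_window
    last_age = min(high, len(rates) - 1)
    # x_h: age at which labour force migration peaks (searched in the window).
    x_h = max(range(low, last_age + 1), key=lambda x: rates[x])
    # x_l: pre-labour force age at which migration is at its low point.
    x_l = min(range(0, x_h), key=lambda x: rates[x])
    return {
        "x_l": x_l,
        "x_h": x_h,
        "X": x_h - x_l,                # labour force shift
        "B": rates[x_h] - rates[x_l],  # jump
        "GMR": sum(rates),             # gross migraproduction rate
    }

# Toy schedule for ages 0-59: decline to age 14, rise to age 23, then decay.
rates = ([0.030 - 0.0015 * x for x in range(15)]
         + [0.009 + 0.004 * (x - 14) for x in range(15, 24)]
         + [0.045 * 0.93 ** (x - 23) for x in range(24, 60)])
measures = summary_measures(rates)
```

With noisy observed rates the simple maximum and minimum can land on a spurious spike, so it is usually safer to compute these measures from the fitted schedule.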
The method is applied in the following steps.
The initial step in estimating a model schedule is to prepare the data. Decisions about which measure of migration to use depend upon the data sources available (registry, census, or survey) and the purpose of the research. For example, in a comparative study of migration patterns, any of the measures would be appropriate as long as they are constructed similarly across contexts. If, on the other hand, the model schedules are to be used in single-year population projections, the fitted schedule should represent single-age, single-year migration rates. However, where one does not have single-year, single-age observations that progress relatively smoothly by age, one must first convert the available data into single-year, single-age estimates. A number of commonly-encountered situations are described below.
When the numbers of migrants who survived a five-year migration interval are available from census data which also give the year of most recent move, single-year, single-age migration rates can be derived through a conceptually simple, yet algebraically complex, back-projecting procedure outlined by Dorrington and Moultrie (2009). Their method compensates for the effect of mortality by applying the mortality regime of the general population to the migrants. It compensates for the effect of interregional migration by working backwards one year at a time: the annual rates of migration for the most recent year are applied to estimate the population by region one year prior to the census, that population is used to estimate the migration rates two years before the census, those rates are used to estimate the population two years before the census, and so on. The method requires additional region-of-birth information for those aged 0-4 at the time of the census, as well as single-age, yearly estimates of regional populations. Schedules derived in this manner can then be fitted and smoothed with a Rogers-Castro model schedule, and used in single-year population projections.
Regardless of the migration time interval, whether using census data or population register data, five-year age groupings generally give more reliable estimates of migration propensities than one-year age categories (Rogers, Little and Raymer 2010). In addition, counts of migrants in one-year age categories are typically only available from sample data, since national population bureaus tend to publish counts of interregional migrants in five-year age categories.
To apply the multi-exponential model when the initial migration proportions are in five-year age categories requires some method of converting the five-year rates to one-year rates. Cubic-spline interpolation (McNeil, Trussell and Turner 1977) is one such method that produces a smooth schedule for all integer values of ages. Rogers and Castro (1981) used data from Sweden, which were available as both one-year and five-year age rates, to test the accuracy of the cubic-spline method, and found generally satisfactory results.
To arrive at smooth one-year age migration profiles, the initial migration proportions for the five-year age categories are assigned to values close to the middle age within the five-year interval, i.e., ages 2, 7, 12, 17, …, 72, 77 (or 2.5, 7.5, 12.5, …, etc., if estimating rates rather than probabilities). From this set of points, a continuous age profile of state out-migration propensities is generated with cubic-spline interpolation, which constructs third-order polynomials that pass through the set of predefined control points (called nodes). Commercial or freeware add-ins for Microsoft Excel, such as XlXtrFun [19], can also be used to implement cubic-spline interpolation.
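As an illustration of the interpolation step, the sketch below builds a cubic spline through the five-year nodes using only the Python standard library. It assumes natural end conditions (zero second derivative at the first and last node), which may differ from the end conditions used by McNeil, Trussell and Turner (1977) or by a given Excel add-in; the function name and the five-year propensities are made up for the example.

```python
import bisect

def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal system for the quadratic coefficients.
    l, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution gives the polynomial coefficients on each interval.
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def evaluate(x):
        # Pick the interval containing x (end intervals also serve outside the nodes).
        j = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        t = x - xs[j]
        return ys[j] + b[j] * t + c[j] * t ** 2 + d[j] * t ** 3

    return evaluate

# Hypothetical five-year propensities for age groups 0-4, 5-9, ..., 75-79.
mid_ages = [2.5 + 5 * k for k in range(16)]
five_year = [0.110, 0.085, 0.070, 0.095, 0.180, 0.200, 0.160, 0.125,
             0.100, 0.085, 0.072, 0.065, 0.060, 0.058, 0.052, 0.050]
spline = natural_cubic_spline(mid_ages, five_year)
single_year = {age: spline(age) for age in range(3, 78)}
```

Values requested below the first node or above the last are extrapolated from the end polynomials and should be treated with more caution than values between nodes.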
An alternative approach is to adapt Beers’ 6-parameter interpolation procedure (Beers 1945) to interpolate rates between the rates for the youngest and oldest age groups, which also extrapolates the rates to ages 0 and 1 (or 0.5 and 1.5 if working with rates). The extrapolation to the youngest ages is achieved by assuming that the difference between propensities for age 1 and 2 is the same as that between ages 2 and 3, and that between ages 0 and 1 is the same as that between ages 3 and 4.
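The extrapolation rule for the two youngest ages amounts to two lines of arithmetic. The sketch below covers only that rule, not Beers’ interpolation coefficients, and it assumes one particular reading of the sign convention (each difference is taken as the younger age minus the older age); the function name and the input values are hypothetical.

```python
def extend_to_youngest_ages(m2, m3, m4):
    """Extrapolate propensities at ages 0 and 1 from those at ages 2, 3 and 4."""
    # Assumed reading: (m1 - m2) equals (m2 - m3).
    m1 = m2 + (m2 - m3)
    # Assumed reading: (m0 - m1) equals (m3 - m4).
    m0 = m1 + (m3 - m4)
    return m0, m1

# Hypothetical interpolated propensities at ages 2, 3 and 4.
m0, m1 = extend_to_youngest_ages(0.100, 0.090, 0.082)
```

If the workbook applies the differences in the opposite direction, only the signs in the two assignments need to change.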
Thus, to apply either approach one needs a set of migration rates in five-year age intervals from 0-4 to at least 65-69.
Once the observed schedule is prepared, a decision must be made about the form of the multi-exponential model to be adopted. The overview of the multi-exponential model migration schedule presented above described the characteristics of the 7-, 9-, 11- and 13-parameter models. This decision should be informed by a visual inspection of the schedule, keeping in mind that the model is assumed to represent the true form of the population migration schedule. Sometimes, even after plotting the schedule, it is not apparent how best to model the retirement years and the oldest ages. For example, it may appear that either a standard 7-parameter model or a 9-parameter model (increasing migration in the oldest ages) would be appropriate. In this situation, the decision in favour of the 9-parameter model could be based on a theoretical expectation for increasing migration in the later years. On the other hand, the 9-parameter model form might be rejected, based on the goodness-of-fit measures, as being insufficiently parsimonious if it produces no better fit than the 7-parameter model. In deciding which form of the model to use, it is recommended that the goodness-of-fit of the simpler model be compared with that of the more complex model (e.g. comparing the fit of a 7-parameter model with that of an 11-parameter model). As a general rule, and always bearing in mind the likely robustness of the underlying data, substantial improvement in fit is required to justify a more complex specification.
For most developing countries, particularly where ‘retirement’ isn’t concentrated between the ages of 60 and 65 and there is age exaggeration at the older ages, the data are probably not strong enough to fit anything more than the 7-parameter version of the model.
Given the number of parameters (between 7 and 13) in the multi-exponential model migration schedule, determining a best fit ab initio using trial-and-error is not recommended. Instead, analytical algorithms have to be employed. The one described below uses an algorithm that is provided in Microsoft Excel, Solver. Solver may not be routinely loaded by standard installations of Microsoft Excel. To enable its use, proceed by selecting “File → Options → Add-ins → Manage Excel Add-ins → Go …” and then ensuring that the “Solver Add-in” is ticked.
The specifications of the Solver function, and the conditions and constraints that should be adhered to, have been set up in the workbook associated with the methods presented in this chapter. To run the routine on a given worksheet, select “Data → Solver → Solve”.
The model is fitted in the associated workbook, which allows the user to choose the “objective” to be minimized: either the sum of squared differences between the observed rates and the fitted rates, or the chi-squared statistic.
The default Solver is set up to fit using all parameters. If one wants to fit a curve using only some of the parameters then one must specify only these parameters in the “By Changing Variable Cells” window, and set the other parameters to appropriate constant values (which may, or may not, be zero depending on the requirements of the fitting procedure). An instance where such constrained optimisation may be required is mentioned below.
The sum of squared differences is calculated as follows:
$$SSD = \sum_{i=1}^{n} (O_{i} - F_{i})^{2}$$
where O_{i} represents the observed rate at age i, F_{i} represents the fitted value at age i and n the number of age groups.
The chi-squared statistic is calculated as follows:
$$\chi^{2} = \sum_{i=1}^{n} \frac{(O_{i} - F_{i})^{2}}{F_{i}}$$
The chi-squared statistic is more sensitive to misfitting to age ranges where rates are lower (resulting in a proportionately larger error) and thus is a better metric to assess goodness-of-fit when trying to fit the ‘retirement hump’ (the third component).
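The difference in sensitivity between the two objectives can be seen in a small numerical sketch. It assumes the usual Pearson form of the chi-squared statistic, with the fitted value in the denominator; the function names and the two-age example are hypothetical.

```python
def sum_squared_differences(observed, fitted):
    """Sum over ages of (O_i - F_i)^2."""
    return sum((o - f) ** 2 for o, f in zip(observed, fitted))

def chi_squared(observed, fitted):
    """Sum over ages of (O_i - F_i)^2 / F_i (assumed Pearson form)."""
    return sum((o - f) ** 2 / f for o, f in zip(observed, fitted))

# The same absolute error of 0.001 at a high-rate age and at a low-rate age.
high_obs, high_fit = [0.040], [0.041]
low_obs, low_fit = [0.004], [0.005]

ssd_high = sum_squared_differences(high_obs, high_fit)
ssd_low = sum_squared_differences(low_obs, low_fit)
chi_high = chi_squared(high_obs, high_fit)
chi_low = chi_squared(low_obs, low_fit)
```

The two ages contribute equally to the sum of squared differences, whereas the low-rate age contributes roughly eight times as much to the chi-squared statistic, which is why the latter gives more weight to the retirement ages.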
The choice of initial parameter values is the principal difficulty in non-linear parameter estimation. Ideally, given a set of starting values, the algorithm proceeds through an iterative process, producing a revised set of “optimum” values. However, the optimum may be merely a local optimum, and not the global optimum. A better guess of the initial parameter values may produce an improved goodness-of-fit and produce a different set of final values. A poorer choice of initial parameters may prevent convergence to even a local optimum.
Bearing this in mind, the most effective method of ensuring that the results from a fitting procedure are indeed globally “optimal” is to choose parameter values previously reported for a “similar” curve. To this end one might start with the values already in the workbook which were used to fit the curves in the examples below.
Convergence may be more difficult to achieve with the 11- and 13-parameter models. Where such heavily parameterised models are justified, one approach that can be adopted is to first fit a standard 7-parameter model to the data (thereby securing the fit at the peak of the migration schedule, and at ages up to mid-adulthood). Then, one could proceed by fixing those 7 parameters to their estimates that resulted from the initial step (i.e. treat those parameters as constant from there on), and then estimate the remaining parameters. Another effective procedure is to carry out a linear estimation method first, which does not rely on an iterative algorithm. That method was first described in Rogers and Castro (1981) and later included as one of the several alternatives set out in Rogers, Castro and Lea (2005).
Another challenge in finding the optimum solution lies in choosing an appropriate stopping criterion for the iterative algorithm. As the iteration process converges on a solution, the chi-square statistic, which measures the differences between the observed and the estimated values, decreases. An indication that an acceptable solution has been found is when the chi-square value decreases by only a negligible amount from one iteration to the next. The level of this small difference is called the “tolerance” and is set by the user. The temptation is to set it to be a very small value, i.e. very close to zero, so that a true minimum chi-square value is achieved. However, the risk in this approach is that such a low tolerance may not be achievable, even when a solution has been found. Press, Flannery, Teukolsky et al. (1986) suggest that a tolerance of 0.001 is a reasonable setting. If the estimation software fails to converge, either make the convergence criterion less stringent (i.e. increase the tolerance) or try new initial estimates.
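The roles of starting values, fixed versus free parameters, and the tolerance can be illustrated with a deliberately simple optimiser. The sketch below uses a coordinate-wise (compass) search written for this illustration; it is not the algorithm used by Excel Solver, and every name, starting value and “observed” rate in it is hypothetical (the observed schedule is generated from the model itself).

```python
import math

PARAMS = ("a1", "alpha1", "a2", "alpha2", "mu2", "lambda2", "c")

def standard_schedule(x, p):
    """7-parameter schedule: pre-labour force + labour force + constant."""
    labour = p["a2"] * math.exp(-p["alpha2"] * (x - p["mu2"])
                                - math.exp(-p["lambda2"] * (x - p["mu2"])))
    return p["a1"] * math.exp(-p["alpha1"] * x) + labour + p["c"]

def ssd(p, ages, observed):
    return sum((o - standard_schedule(x, p)) ** 2 for x, o in zip(ages, observed))

def compass_fit(start, ages, observed, free=PARAMS, tolerance=1e-3,
                min_step=1e-4, max_sweeps=300):
    """Vary each free parameter by a relative step, keeping any improvement.

    Parameters not listed in `free` stay at their starting values. Steps are
    multiplicative, so a parameter that starts at zero stays at zero.
    """
    best = dict(start)
    best_value = ssd(best, ages, observed)
    step = 0.1
    for _ in range(max_sweeps):
        value_before = best_value
        for name in free:
            for direction in (1, -1):
                trial = dict(best)
                trial[name] = best[name] * (1 + direction * step)
                trial_value = ssd(trial, ages, observed)
                if trial_value < best_value:
                    best, best_value = trial, trial_value
                    break
        # Negligible gain relative to the tolerance: refine the step or stop.
        if value_before - best_value <= tolerance * value_before:
            step /= 2
            if step < min_step:
                break
    return best, best_value

ages = list(range(86))
truth = {"a1": 0.02, "alpha1": 0.10, "a2": 0.06, "alpha2": 0.10,
         "mu2": 20.0, "lambda2": 0.40, "c": 0.003}
observed = [standard_schedule(x, truth) for x in ages]
start = {"a1": 0.03, "alpha1": 0.12, "a2": 0.05, "alpha2": 0.08,
         "mu2": 22.0, "lambda2": 0.30, "c": 0.002}
start_ssd = ssd(start, ages, observed)
fitted, fitted_ssd = compass_fit(start, ages, observed)
```

Passing a shorter tuple as `free`, for example `free=("c",)`, holds the other parameters constant, which mirrors the staged approach of fixing the 7-parameter estimates before estimating additional components. A search of this kind only finds a nearby local optimum, so different starting values can lead to different final estimates.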
One trial-and-error method of choosing initial estimates makes use of the graphs in the accompanying Excel workbook. By substituting your schedule of observed data in one of the sheets, initial “guesses” of each parameter can be chosen and placed in the cells where the final estimates of each parameter are located. Then, by visual examination of the fit, and identification of the parameter values that are most out of line, try new initial values for those parameters and then re-evaluate the fit visually. Continue this way until the fitted schedule is reasonably close to the observed schedule. At this point, you will know you have reasonably good initial estimates and may proceed to the non-linear least squares estimation procedure.
We evaluate the model fit by calculating the mean absolute percent error (MAPE) statistic:
$$\mathrm{MAPE} = \frac{100}{n} \sum_{i=1}^{n} \frac{|O_{i} - F_{i}|}{O_{i}}$$
The MAPE is prone to overstate inaccuracy, particularly when the observed schedule has many values that are very close to zero (Morrison, Bryan and Swanson 2004).
In addition to MAPE, we also calculate R^{2}, the square of the correlation between the O_{i} and the F_{i} values. A heuristic that is often employed is that a reasonable fit is achieved with a MAPE of 15 per cent or less together with an R^{2} well above 90 per cent.
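Both statistics are straightforward to compute outside the workbook. The sketch below assumes that the MAPE divides each absolute error by the observed value and that R^{2} is the squared Pearson correlation between observed and fitted values; the function names and the four-age example are hypothetical.

```python
def mape(observed, fitted):
    """Mean absolute percent error, in per cent (observed values must be non-zero)."""
    n = len(observed)
    return 100 / n * sum(abs(o - f) / o for o, f in zip(observed, fitted))

def r_squared(observed, fitted):
    """Square of the correlation between observed and fitted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_f = sum(fitted) / n
    cov = sum((o - mean_o) * (f - mean_f) for o, f in zip(observed, fitted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_f = sum((f - mean_f) ** 2 for f in fitted)
    return cov ** 2 / (var_o * var_f)

observed = [0.020, 0.010, 0.040, 0.030]
fitted = [0.022, 0.009, 0.038, 0.030]
fit_mape = mape(observed, fitted)
fit_r2 = r_squared(observed, fitted)
# Heuristic from the text: MAPE of 15 per cent or less and R^2 well above 0.9.
reasonable_fit = fit_mape <= 15 and fit_r2 > 0.9
```

Because each error is divided by the observed value, a single observed rate close to zero can dominate the MAPE, which is the behaviour noted by Morrison, Bryan and Swanson (2004).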
In addition, since the method assumes the estimated Rogers-Castro model schedule represents the true form of the migration schedule, the estimated model schedule should appear to represent the underlying pattern of the observed data.
If the goal is to describe the pattern of migration and a multi-exponential model has been successfully fitted to the data, any of the summary measures (e.g. GMR, X, B, and A) as well as the parameter estimates can be used to describe the schedule. The summary measures and the parameter interpretations are given in the Overview.
In the examples below, multi-exponential model migration schedules are applied to a variety of data, of varying quality and complexity and from a number of different sources. All worked examples are provided in the associated workbook on the Tools for Demographic Estimation website.
Because iterative methods are required to fit a multi-exponential model migration schedule to observed data, detailed worked examples are not provided in the text. The reader is directed to the description provided in the previous section on how to use Solver in Microsoft Excel to determine optimal fits. The workbook is set up to use Solver to derive the results presented.
An example of a schedule based on one-year age migration propensities measured over a one-year migration interval from census data is shown in Figure 3. The data are derived from the 2005 American Community Survey (ACS), a national survey conducted annually by the U.S. Census Bureau. Even for California, a highly populated state, the one-year age propensities over a one-year interval are quite unstable. The MAPE is 17 per cent and the R^{2} is 0.92.
Caution must be exercised when using one-year age propensities over one-year migration intervals. For each single age, the numbers at risk of migrating, as well as the numbers of migrants, may be small, resulting in propensities that are erratic and unstable. A better option may be to derive five-year age propensities, which have proven to be more reliable than one-year age propensities (Rogers, Little and Raymer 2010). These can be interpolated to yield one-year age propensities using cubic splines or Beers’ formula as discussed in the section describing the application of the method.
Figure 4 shows an example using census data for the state of New Hampshire. The US Census Bureau’s 1 per cent Public Use Microdata Sample (PUMS) is a relatively small sample taken from the census and New Hampshire is one of the least populated states. The one-year age propensities appear to be quite unstable with dramatic fluctuations, while the model schedule provides a smooth estimate of the true schedule form. The MAPE is 52 per cent and the R^{2} is 0.68.
Figure 5 shows the cubic-spline interpolation method applied to the five-year age migration propensities for New Hampshire, derived from the 2000 Census 1 per cent public use microsample data. The schedule interpolated from the five-year age rates is much smoother and provides more reliable estimates than the observed one-year age rates displayed in Figure 4, and thus is a better set of estimates against which to compare the fitted multi-exponential curve. The MAPE was reduced from 52 per cent for the one-year age propensities to 15 per cent for the rates interpolated from the five-year age proportions, and the R^{2} increased from 0.68 to 0.94.
There are several reasons why the levels of the New Hampshire schedule, in Figure 5, are substantially higher than those of the California schedule, in Figure 3. The California example gives migration over a one-year migration interval, whereas the New Hampshire schedule is over a five-year interval. In addition, New Hampshire is a much smaller areal region than California and the expectation is that the force of migration will be more powerful in a geographically smaller region.
It is important to check visually if the age-specific migration rates have a ‘shape’ that is compatible with the Rogers-Castro models. If this is not the case then it is unlikely that these models will provide a satisfactory fit. Likewise, it is worthwhile checking whether there are any extreme values, particularly at older ages, which might distort the choice of parameters or even the choice of the number of parameters to be fitted. If the observed estimates are particularly noisy, it would be better to group the data into five-year age intervals and then estimate a smoothed distribution using either the Beers 6-parameter interpolation provided or cubic-spline curve fitting.
The formulation of the multi-exponential model was presented in the Overview and is not repeated here. In this section, we discuss in greater detail aspects that should be considered carefully before applying the method in practice.
The multi-exponential model is applied to schedules of one-year age migration rates beginning at age 0 and, typically, continuing to age 65 or higher to capture the full pattern of elderly migration. The schedules of age-specific migration might measure directional migration (i.e. from region i to region j) or total out-migration (i.e. from region i to all other regions), or all interregional migration with no specific origin or destination. Usually, migration data are obtained from national censuses (or, in developed countries, population registers). The multi-exponential model can be applied to a variety of measures of single-age migration propensities derived from either of these sources.
When obtained from national registration systems, the migration rate, for persons aged x at the beginning of the interval, is the ratio of the number of migrations during a given time interval divided by the average number of person-years exposed to the risk of moving. Persons can contribute more than one migration during the interval. These are occurrence-exposure rates, although migrations by non-survivors may not be included in the numerator (Rogers and Castro 1981).
The observed migration schedule in Figure 2 was derived from Sweden’s national registry for male migration out of Stockholm over a one-year interval. In contrast, Figure 6 shows the observed and estimated model schedule for all male inter-communal migration in Sweden over a five-year interval. As expected, the levels are much higher in Figure 6 due to more migration activity when all regions are combined as compared to the Stockholm region alone. Similarly, more migrations are expected over a five-year interval than over a one-year interval. Rees (1977) found that migration rates over a five-year interval tend to be less than five times (between 3 and 5 times) those over a one-year interval. The observed schedule is also smoother and more similar to the model schedule in Figure 6, indicating that single-age migration rates are more reliable when based on a longer interval.
Censuses, on the other hand, count surviving migrants (not migrations). Migrants are persons who reported living in one region at the beginning of the time interval and resided in a different region at the time of the census. A person registering multiple migrations in a national register may be a non-migrant in the census if he returned to his initial location during the time interval. In general, counts of migrants from censuses understate the number of migrations, especially for longer time intervals when there are bound to be larger numbers of return movers and non-survivors. For these reasons, a migration schedule derived from population register data is not directly comparable to one based on census data (Rogers and Castro 1986).
Censuses typically record the location of a person’s current residence and ask where the person was living either one year ago or five years ago. Given this information and the person’s age at the time of census, the numbers of surviving migrants and the numbers of survivors who were at risk of migrating are counted. The ratio of the number of surviving migrants to the number of survivors at risk of migrating is sometimes called a ‘conditional survivorship proportion’ because migrants and persons counted as being at risk of migrating must have survived the migration time interval to be counted by the census (Rogers, Little and Raymer 2010). Since these are not occurrence-exposure rates, they are called migration propensities here.
To derive single-age migration propensities when the census question asks where a person was living one year ago, all persons are “backcast” to the region where they lived one year earlier when they were one year younger, which gives the number of persons at risk of migrating from that region. For example, a person aged 1 last birthday in a census conducted in 2010 would have been aged 0 last birthday in 2009. If the 2010 age values ranged from 1 to 85, they would range from 0 to 84 in 2009. (Note, only persons aged 1 and older would have reported place of residence 1 year ago.) Backcasting yields the number of people who survived to be counted by the census in 2010 and who were at risk for migrating from region i, in 2009. The number of migrants would be the count of persons who reported living in region i in 2009, but were counted as residing in a different region in 2010. For each 1-year age group, the ratio of the number of migrants to the number at risk for migrating gives the age-specific out-migration propensity for the 1-year interval. When the numerator contains directional migrants, i.e. from region i to region j, the ratio gives the age-specific propensity to migrate from region i to region j.
Caution must be exercised when using one-year age propensities over one-year migration intervals. For each single age, the numbers at risk of migrating, as well as the numbers of migrants, may be small, resulting in propensities that are erratic and unstable. A better option may be to derive five-year age propensities, which have proven to be more reliable than one-year age propensities (Rogers, Little and Raymer 2010). These can be interpolated to yield one-year age propensities.
When the census question asks where a person was living five years ago, it is possible to derive one-year age propensities for migrating over a five-year interval as long as single ages are reported. It is done by backcasting all persons to the region where they lived five years earlier when they were five years younger. Persons aged 5 last birthday in a census conducted in 2000, for example, would have been aged 0 last birthday in 1995. If the age values ranged from 5 to 85 in 2000, they would range from 0 to 80 in 1995. The number of migrants is simply the count of persons who reported living in region i in 1995, but were counted as residing in a different region in 2000. For each one-year age group, the ratio of the number of migrants to the number at risk for migrating gives the age-specific out-migration propensity over the five-year interval.
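The backcasting calculation for either interval can be sketched in a few lines. The example assumes that each census record has been reduced to a tuple of age at census, region of residence at the start of the interval, and region of residence at the census; the function name, the record layout and the toy records are hypothetical.

```python
from collections import Counter

def out_migration_propensities(records, region, interval=1):
    """Single-age out-migration propensities for `region`.

    `records` holds (age_at_census, region_at_start, region_at_census) tuples.
    Ages are backcast by `interval` years (1 or 5, matching the census question).
    """
    at_risk, migrants = Counter(), Counter()
    for age_now, origin, current in records:
        if origin != region or age_now < interval:
            continue  # not at risk in this region, or born during the interval
        age_then = age_now - interval
        at_risk[age_then] += 1
        if current != region:
            migrants[age_then] += 1
    return {age: migrants[age] / at_risk[age] for age in sorted(at_risk)}

# Toy records for a one-year question: (age in 2010, region in 2009, region in 2010).
records = [(1, "A", "A"), (1, "A", "B"), (1, "B", "B"),
           (30, "A", "A"), (30, "A", "A"), (30, "A", "B"), (30, "A", "A")]
propensities = out_migration_propensities(records, "A", interval=1)
```

Restricting the migrant count to records whose current region equals a particular destination would give directional propensities from region i to region j instead.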
When the numbers of migrants who survived a five-year migration interval are available from census data, single-year, single-age migration rates can be derived through a back-projecting procedure outlined by Dorrington and Moultrie (2009). Their method compensates for the effect of mortality by applying the mortality regime of the general population to the migrants. It compensates for the effect of onward migration by working backwards one year at a time: the annual rates of migration for the most recent year are applied to estimate the population by region one year prior to the census, that population is used to estimate the migration rates two years before the census, those rates are used to estimate the population two years before the census, and so on. The method requires additional region-of-birth information for those aged 0-4 at the time of the census, as well as single-age, yearly estimates of regional populations. Schedules derived in this manner can then be fitted and smoothed with a Rogers-Castro model schedule, and used in single-year population projections.
Unless the data are accurate and well behaved, the multiexponential model will not produce a very close fit and may be overparameterised, i.e., many different sets of parameters can produce virtually equally good fits to the observed values. In such a situation it might help to fix one or two parameter values and fit the rest. In general, parsimony in the number of parameters is recommended.
Application of the multiexponential model is not limited to schedules of migration rates or propensities. Several studies have established that age distributions of migrants (and migrations if using registration data) often have a multiexponential form and can be accurately represented by a Rogers-Castro model schedule (Little and Rogers 2007; Rogers, Little and Raymer 2010).
The single-age numbers of migrants/migrations can be derived using any of the data sources and methods described above, because these are simply the numerators in the migration propensity and rate calculations. The observed data fitted by the model schedules are the single-age proportions of the total migrants/migrations. Note that if the numbers of migrants are reported in five-year age categories, some form of interpolation would be necessary. If cubic spline interpolation is used, the numbers associated with each node should be the migrants/migrations for each five-year age grouping divided by five.
For example, the observed age composition of Swedish migrations, expressed as proportions, is illustrated in Figure 7. The composition appears to be very smooth and reliable except at the oldest ages. A 7-parameter model schedule fits closely, with an R^{2} of 99 per cent and a MAPE of 29 per cent. However, this is an example of how the MAPE can exaggerate the model’s lack of fit, as it becomes inflated when there is a sequence of small observed deviations.
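The contrast between the two fit statistics can be reproduced with a short Python sketch. The observed and fitted proportions below are hypothetical values chosen for illustration (they are not the Swedish data), and the sketch assumes the usual definitions: R^{2} as one minus the ratio of residual to total sum of squares, and MAPE as the mean of the absolute errors expressed as percentages of the observed values.

```python
def r_squared(observed, fitted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def mape(observed, fitted):
    """Mean absolute percentage error (observed values must be non-zero)."""
    errors = [abs(o - f) / o for o, f in zip(observed, fitted)]
    return 100 * sum(errors) / len(errors)


# Hypothetical proportions: the last two mimic very small values at old ages
observed = [0.020, 0.030, 0.040, 0.025, 0.010, 0.0004, 0.0002]
fitted = [0.021, 0.029, 0.039, 0.026, 0.010, 0.0008, 0.0006]

print(round(r_squared(observed, fitted), 4), round(mape(observed, fitted), 1))
```

In this made-up case the absolute errors at the last two ages are tiny, yet they are large relative to the observed proportions, so they dominate the MAPE while leaving R^{2} almost unchanged.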
Two alternatives to the Excel workbook for fitting the multiexponential curve are 1) Data Master 2003, a free curve-fitting program that applies the Levenberg–Marquardt algorithm; and 2) R (R Development Core Team 2012), which is also free but is a software environment for all-purpose statistical computing and graphics and as such requires a significant time investment before it can be used with confidence. The Appendix to this chapter on the Tools for Demographic Estimation website gives very basic commands for defining R functions that produce estimates for the 7-parameter and the 11-parameter models using the Gauss-Newton algorithm.
Bates J and I Bracken. 1982. "Estimation of migration profiles in England and Wales", Environment and Planning A 14(7):889–900. doi: http://dx.doi.org/10.1068/a140889
Bates J and I Bracken. 1987. "Migration age profiles for local-authority areas in England, 1971–1981", Environment and Planning A 19(4):521–535. doi: http://dx.doi.org/10.1068/a190521
Beers H. 1945. "Six-term formulas for routine actuarial interpolation", The Record of the American Institute of Actuaries 33(2):245–260.
Dorrington R and TA Moultrie. 2009. "Making use of the consistency of patterns to estimate age-specific rates of interprovincial migration in South Africa," Paper presented at Annual Meeting of the Population Association of America. Detroit, Michigan, 29 April – 2 May 2009.
George MV. 1994. Population projections for Canada, provinces and territories, 1993–2016. Ottawa: Statistics Canada, Demography Division, Population Projections Section.
Hofmeyr BE. 1988. "Application of a mathematical model to South African migration data, 1975–1980", Southern African Journal of Demography 2(1):24–28.
Kawabe H. 1990. Migration rates by age group and migration patterns: Application of Rogers' migration schedule model to Japan, The Republic of Korea, and Thailand. Tokyo: Institute of Developing Economies.
Liaw KL and DN Nagnur. 1985. "Characterization of metropolitan and nonmetropolitan outmigration schedules of the Canadian population system, 1971–1976", Canadian Studies in Population 12(1):81–102.
Little JS and A Rogers. 2007. "What can the age composition of a population tell us about the age composition of its outmigrants?", Population, Space and Place 13(1):23–39. doi: http://dx.doi.org/10.1002/psp.440
McNeil DR, TJ Trussell and JC Turner. 1977. "Spline interpolation of demographic data", Demography 14(2):245–252. doi: http://dx.doi.org/10.2307/2060581
Morrison PA, TM Bryan and DA Swanson. 2004. "Internal migration and short-distance mobility," in Siegel, JS and DA Swanson (eds). The Methods and Materials of Demography. San Diego: Elsevier pp. 493–521.
Potrykowska A. 1988. "Age patterns and model migration schedules in Poland", Geographia Polonica 54:63–80.
Press WH, BP Flannery, SA Teukolsky and WT Vetterling. 1986. Numerical Recipes: The Art of Scientific Computing. Cambridge: Cambridge University Press.
R Development Core Team. 2012. R: A language and environment for statistical computing: Reference Index. Vienna, Austria: R Foundation for Statistical Computing. http://www.mendeley.com/research/rlanguageenvironmentstatisticalcomputing13/
Raymer J and A Rogers. 2008. "Applying model migration schedules to represent age-specific migration flows," in Raymer, J and F Willekens (eds). International Migration in Europe: Data, Models and Estimates. Chichester: Wiley, pp. 175–192.
Rees PH. 1977. "The measurement of migration, from census data and other sources", Environment and Planning A 9(3):247–272. doi: http://dx.doi.org/10.1068/a090247
Rogers A and LJ Castro. 1981. Model Migration Schedules. Laxenburg, Austria: International Institute for Applied Systems Analysis. http://webarchive.iiasa.ac.at/Admin/PUB/Documents/RR81030.pdf
Rogers A and LJ Castro. 1986. "Migration," in Rogers, A and F Willekens (eds). Migration and Settlement: A Multiregional Comparative Study. Dordrecht: D. Reidel, pp. 157–208.
Rogers A, LJ Castro and M Lea. 2005. "Model migration schedules: Three alternative linear parameter estimation methods", Mathematical Population Studies 12(1):17–38. doi: http://dx.doi.org/10.1080/08898480590902145
Rogers A and JS Little. 1994. "Parameterizing age patterns of demographic rates with the multiexponential model schedule", Mathematical Population Studies 4(3):175–195. doi: http://dx.doi.org/10.1080/08898489409525372
Rogers A, JS Little and J Raymer. 2010. The Indirect Estimation of Migration: Methods for Dealing with Irregular, Inadequate, and Missing Data. Dordrecht: Springer.
Rogers A and J Raymer. 1999. "Estimating the regional migration patterns of the foreign-born population in the United States: 1950–1990", Mathematical Population Studies 7(3):181–216. doi: http://dx.doi.org/10.1080/08898489909525457
Rogers A and J Watkins. 1987. "General versus elderly interstate migration and population redistribution in the United States", Research on Aging 9(4):483–529. doi: http://dx.doi.org/10.1177/0164027587094002
The loglinear modelling framework provides several valuable techniques for studying and estimating migration flows within a network of regions. To date, these methods have been applied most often to internal migration systems where regions are defined as subnational administrative units. However, they need not be restricted to domestic migration and may be applied to international systems of migration as well (Raymer 2007).
A migration flow is defined as the number of migrations from one region to another over the course of a specified time frame. There are several different ways to count migrations and each one could yield a different result. For example, Rees and Willekens (1986) make the distinction between registration systems that count the number of interregional residential moves over a reference period and censuses that count persons who reside in a place at the time of the census that is different from the place of residence at the beginning of the reference period.
Regardless of the method used to count migration flows, it is conventional to present them in contingency tables. These are square tables that report the flow counts between origin and destination regions. The flows in the migration table can be perfectly reproduced by the multiplicative component model, which is a saturated loglinear model (i.e., one in which there are as many estimated parameters as there are data points). It has been used by Willekens (1983), Rogers, Willekens, Little et al. (2002) and Rogers, Little and Raymer (2010) to represent the matrix of flows between regions, and by Raymer and Rogers (2007), Raymer, Bonaguidi and Valentini (2006) and Rogers, Little and Raymer (2010) to capture the structure of interregional flows within age categories. The multiplicative components are interpretable and conveniently used to define the structure of migration between the regions of interest (Rogers, Willekens, Little et al. 2002). If calculated for more than one set of interregional flows, defined for different time periods, for example, or for different age, sex or race categories, multiplicative components are useful for comparing migration regimes across these populations.
Loglinear methods may be used to justify simplified representations of migration structure that are more parsimonious than the saturated model. The appropriateness of a reduced model is determined by fitting the predicted flows to the observed flows and by using statistical methods to evaluate the goodness of fit. If the reduced form has merit, i.e., fits the data well, the model may be used to estimate the flows indirectly. The independence model, for example, assumes interregional flows are distributed according to the pattern that would be predicted from the marginal distributions of flows across origin and destination regions. If the independence model is confirmed, interregional flows are predictable and can be estimated indirectly, but accurately, if the total sending and receiving flows of each region are given.
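As a minimal sketch of the independence idea, the Python function below predicts each flow from the marginal totals alone, as (row total × column total) / grand total. The 3-region flow matrix is hypothetical and the function name is an assumption made for this illustration; the goodness-of-fit evaluation mentioned above is not shown.

```python
def independence_expected(flows):
    """Expected flows under independence: n_i+ * n_+j / n_++."""
    row_totals = [sum(row) for row in flows]
    col_totals = [sum(col) for col in zip(*flows)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


# Hypothetical origin-by-destination flow counts for three regions
flows = [
    [120, 30, 50],
    [40, 200, 60],
    [20, 30, 150],
]

expected = independence_expected(flows)
for row in expected:
    print([round(x, 1) for x in row])
```

The expected table reproduces the observed row and column totals by construction, so only the total sending and receiving flows of each region are needed to compute it.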
Sometimes the structure of migration is hypothesized to be invariant with respect to factors such as time, age, sex, and race. These hypotheses can be represented and tested with loglinear models. Allowing for changes in the level of migration, studies have documented remarkable stability in migration structures, in particular the rates of migration by age, over time (Mueser 1989; Nair 1985; Snickars and Weibull 1977). Other studies have shown consistency in the age patterns of interregional migration over time (Raymer and Rogers 2007). Moreover, the migration structure of the youngest ages, which can be inferred from birthplacespecific population stocks, has, in certain contexts, proven to be a “proxy” for the level of migration and allowed the estimation of migration of the older age groups (Raymer and Rogers 2007; Rogers, Little and Raymer 2010).
These studies have set the stage for establishing the method of offsets as a successful tool for indirectly estimating migration flows. It is a special application of loglinear modelling that forces a known migration structure onto a system that may have missing or unreliable interregional flow data. Using this method, the migration structure known for one time period can be borrowed for another period. In addition, when flows are disaggregated by age, the structure of age-specific interregional flows of one time period can be applied to another period. Furthermore, Raymer and Rogers (2007) showed that the level of infant lifetime migration can be applied, using the method of offsets, to estimate indirectly the migration flows of the older ages.
Applications of loglinear models, and the related assumptions, are detailed in the sections that follow, beginning with the two-variable case, i.e., origin and destination. In this section, the loglinear model is defined in the context of two-dimensional flow tables, and multiplicative forms as well as additive forms of the saturated model are derived and interpreted. The loglinear model of independence and the “migrants only” quasi-independence model are set out, including illustrations and a brief description of the methods for evaluating goodness-of-fit.
The section concludes with an illustration of the method of offsets for indirectly estimating the interregional flow data of one period based on the migration flow patterns of another. When flow data are available for two periods, the period-invariance assumption can be tested with a loglinear model and the method of offsets. Models that disaggregate the origin and destination of flows into age categories are considered. This is followed by an illustration of how the multiplicative model with age can be applied, using the method of offsets, to estimate indirectly the age-specific interregional flows for another period.
To illustrate the two-variable loglinear model, consider the 1973 and 1976 migrations in the Netherlands between types of municipalities categorized into six different groups based on degree of urbanization. These were published by Willekens (1983) and are presented in Table 1. In this context, there are two variables, region of origin (O) and region of destination (D). Neither is identified as the dependent variable. The outcome variable may be either the interregional migration flow, denoted n_{ij}, in the multiplicative form of the model, or the natural logarithm of the flow, denoted ln(n_{ij}), in the additive form of the model.
Decompositions of the saturated model, each of which perfectly regenerates the observed data, are described in the subsections presenting the multiplicative component model and the additive linear model. Three indirect estimation techniques are then illustrated in the subsections describing the independence model, the quasi-independence model and the method of offsets.
Table 1 Migration between municipalities by degree of urbanization,* the Netherlands, 1973 and 1976

A. 1973 Migration table

                                  Destination
Origin         1         2         3         4         5         6       Total
1         50,498    23,829     8,566    21,846    16,264    18,856     139,859
2         25,005    27,536     6,953    14,326    16,212    18,282     108,314
3         15,675    10,710    13,874     6,266     9,819    19,701      76,045
4         23,457    14,169     4,431    10,209     9,386    10,973      72,625
5         29,548    25,267    11,802    13,160    15,979    20,406     116,162
6         46,815    39,123    42,399    25,012    26,830    23,304     203,483
Total    190,998   140,634    88,025    90,819    94,490   111,522     716,488

B. 1976 Migration table

                                  Destination
Origin         1         2         3         4         5         6       Total
1         14,473    14,327     6,077    11,689    10,618     9,897      67,081
2         14,833    36,258    13,289    17,391    20,899    21,869     124,539
3          8,330    17,764    25,113    10,489    18,171    29,220     109,087
4         11,315    16,498     8,935    10,537    10,762    12,519      70,566
5         11,875    24,370    19,151    12,312    16,724    22,591     107,023
6         16,582    32,336    52,415    22,264    28,182    27,810     179,589
Total     77,408   141,553   124,980    84,682   105,356   123,906     657,885

* 1: rural municipalities
  2: industrial rural municipalities
  3: specific resident municipalities of commuters
  4: rural towns and small towns
  5: medium-sized towns
  6: large towns of more than 100,000 inhabitants

Source: Central Bureau of Statistics, The Hague
The multiplicative expression of the saturated loglinear model, called the multiplicative component model, reproduces the elements of the flow table as follows:

$$n_{ij} = (T)(O_i)(D_j)(OD_{ij}) \qquad (1)$$

Like all saturated models, it is, strictly speaking, not a model but a way of representing the data. Here n_{ij} is the observed flow of migration from region i to region j, and the effect parameters are T, O_{i}, D_{j} and OD_{ij}. Therefore, any i to j flow found in the interior 6 by 6 submatrices of Table 1 can be expressed by an equation of the same form as Equation 1 with the corresponding set of parameters. T gives the overall effect, O_{i} gives the effect of origin i, D_{j} gives the effect of destination j, and OD_{ij} gives the effect of the association between O_{i} and D_{j}. Taken together, the parameters of the saturated model represent the spatial structure of migration (Rogers, Willekens, Little et al. 2002).
Two different sets of parameters that satisfy the multiplicative component model have been used in migration studies and both are presented here. Each one offers a different way of representing and interpreting the migration structure. The first is called geometric mean effect coding (Knoke and Burke 1980; Willekens 1983) and the second is called total sum reference coding (Raymer and Rogers 2007; Rogers, Little and Raymer 2010). A third multiplicative component model is derived in the subsection presenting the loglinear additive model.
Geometric mean effect coding was the first decomposition of Equation 1 used for migration analysis. It was proposed by Birch (1963) and is formally equivalent to the gravity model of migration (Willekens 1983). Table 2 shows the multiplicative components resulting from geometric mean effect coding of the Netherlands data from Table 1. Note that the overall component (T) is set out in the grand-total locations of the table, the origin components (O_{i}) are set out in the row-total locations, the destination components (D_{j}) are set out in the column-total locations, and the origin-destination interaction components (OD_{ij}) are set out in the cells of the interior submatrices.
Table 2 Multiplicative components using geometric mean effect coding

A. 1973 Migration table

                             Destination
Origin        1        2        3        4        5        6         Total
1         1.457    0.940    0.656    1.352    0.933    0.882         1.180
2         0.885    1.332    0.653    1.087    1.140    1.048         0.962
3         0.771    0.720    1.811    0.661    0.959    1.570         0.692
4         1.275    1.052    0.639    1.190    1.014    0.966         0.627
5         0.943    1.102    1.000    0.901    1.013    1.055         1.067
6         0.838    0.957    2.015    0.960    0.954    0.676         1.903
Total     1.711    1.252    0.644    0.798    0.861    1.056    17,168.003

B. 1976 Migration table

                             Destination
Origin        1        2        3        4        5        6         Total
1         1.753    0.984    0.571    1.317    0.979    0.787         0.656
2         0.986    1.366    0.686    1.075    1.057    0.954         1.195
3         0.655    0.792    1.533    0.767    1.088    1.508         1.010
4         1.277    1.055    0.783    1.106    0.925    0.927         0.704
5         0.900    1.047    1.127    0.868    0.965    1.124         1.048
6         0.769    0.850    1.888    0.960    0.995    0.847         1.712
Total     0.768    1.354    0.989    0.825    1.008    1.169    16,401.919
The overall effect, T, is described as the constant of proportionality or the size main effect (Willekens 1983). It is the geometric mean of all interregional flow values:

$$T = \left( \prod_{i=1}^{m} \prod_{j=1}^{m} n_{ij} \right)^{1/m^{2}}$$

where m is the number of origin regions (rows) = the number of destination regions (columns). T equals 17,168.003 for 1973 and 16,401.919 for 1976.
For a particular region i, the main effect of that region of origin is the ratio of the geometric mean of flows originating from i to the overall geometric mean:

$$O_i = \frac{1}{T} \left( \prod_{j=1}^{m} n_{ij} \right)^{1/m}$$

The main effect, O_{i}, shows the relative importance of region i as a source of migrations (Alonso 1986). For example, based on the 1973 data, the effect of originating in Category 4 is equal to:

$$O_4 = \frac{(23{,}457 \times 14{,}169 \times 4{,}431 \times 10{,}209 \times 9{,}386 \times 10{,}973)^{1/6}}{17{,}168.003} = 0.627$$

This is the smallest of the origin (row) effects, which suggests that Category 4 was the least important source of migrations in 1973.
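Similarly, the destination main effect, D_{j}, gives the relative importance of region j as an attractor of migrants. It is the ratio of the geometric mean of column j to the overall geometric mean, and the formula is:

$$D_j = \frac{1}{T} \left( \prod_{i=1}^{m} n_{ij} \right)^{1/m}$$

For example, for municipalities in Category 4, the destination effect in 1973 is equal to:

$$D_4 = \frac{(21{,}846 \times 14{,}326 \times 6{,}266 \times 10{,}209 \times 13{,}160 \times 25{,}012)^{1/6}}{17{,}168.003} = 0.798$$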
All other row and column effects can be derived in the same way. Each is the geometric mean of the row (or column) elements divided by the overall geometric mean, and they are equivalent to the balancing factors in the gravity model (Willekens 1983).
They can be compared across regions and across time periods. For example, Category 6 was the most important source of migrations in 1973 (1.903 is greater than the other origin effects), and in 1976 (1.712 is greater than the other origin effects). Category 1 was less important as a destination in 1976 than in 1973 (0.768 is less than 1.711), and, in 1973, it was less important as a source of migrations than as a destination for migrations (1.180 is less than 1.711).
Panels A and B in Table 2 are sometimes called the spatial interaction matrices. The elements are the OD_{ij} interaction effects in Equation 1 and each one is equal to the observed flow between i and j divided by the expected flow, which is the product of the other three parameters. The formula is:

$$OD_{ij} = \frac{n_{ij}}{(T)(O_i)(D_j)}$$
Each OD_{ij} expresses the departure of the observed flow, n_{ij}, from the expected flow based on the assumption of no association between the destination region j and the origin region i, i.e., (T)(O_{i})(D_{j}). They have been interpreted as indicators of the accessibility, the ease of interaction, or the attractiveness between two regions (Rogers, Willekens, Little et al. 2002).
Values equal to 1.0 indicate independence, i.e., no association between the origin and the destination. As implied by Equation 1, if an OD_{ij} parameter is equal to 1.0, n_{ij} is determined by the values of T, O_{i} and D_{j} alone. A departure from 1.0 in either direction is an indication of an association between the destination and the origin. Values greater than 1.0 indicate higher than expected levels of accessibility/attractiveness and values less than 1.0 indicate less than expected accessibility/attractiveness.
Since the 1973 diagonal effects are generally greater than 1.0, it appears migrants were unexpectedly attracted to destinations in the same category of municipality. Category 6 was an exception. Migrants from large towns of more than 100,000 inhabitants (i.e., Category 6) were more attracted to commuter municipalities (i.e., Category 3) than to other large towns (2.015 is greater than 0.676).
Table 2 shows all the parameters necessary for reproducing the 1973 and 1976 flows. To verify that any flow in Table 1 can be reproduced by the multiplicative components, take, for example, the 1973 flow from Category 2 to Category 3:
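$$n_{23} = 6{,}953 \approx 17{,}168.003 \times 0.962 \times 0.644 \times 0.653$$

The product of the rounded components shown in Table 2 is about 6,945; the small discrepancy disappears when unrounded components are used.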
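The geometric mean effect coding calculations can be sketched in Python as follows. This is an illustrative sketch rather than the accompanying workbook: the function names are assumptions, the data are the 1973 flows of Table 1, and the geometric means are computed on the log scale to avoid overflow when multiplying 36 large counts.

```python
import math

# 1973 flows from Table 1 (rows = origin, columns = destination)
FLOWS_1973 = [
    [50498, 23829, 8566, 21846, 16264, 18856],
    [25005, 27536, 6953, 14326, 16212, 18282],
    [15675, 10710, 13874, 6266, 9819, 19701],
    [23457, 14169, 4431, 10209, 9386, 10973],
    [29548, 25267, 11802, 13160, 15979, 20406],
    [46815, 39123, 42399, 25012, 26830, 23304],
]


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geometric_mean_effect_coding(flows):
    """Return (T, O, D, OD) with flows[i][j] = T * O[i] * D[j] * OD[i][j]."""
    m = len(flows)
    T = geometric_mean(n for row in flows for n in row)
    O = [geometric_mean(row) / T for row in flows]
    D = [geometric_mean(col) / T for col in zip(*flows)]
    OD = [[flows[i][j] / (T * O[i] * D[j]) for j in range(m)]
          for i in range(m)]
    return T, O, D, OD


T, O, D, OD = geometric_mean_effect_coding(FLOWS_1973)
# Rebuild the Category 2 to Category 3 flow (zero-based indices 1 and 2)
rebuilt_23 = T * O[1] * D[2] * OD[1][2]
print(round(T, 3), round(O[3], 3), round(D[3], 3), round(rebuilt_23))
```

Because the interaction term is defined as the observed flow divided by the product of the other three components, the reconstruction is exact by design; the components themselves can be compared with the entries of Table 2.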
The parameter values, however, are not all independent of each other: some parameter values can be derived from the others. For one year of data, for all i and j combinations, there are 36 interaction effects, 6 origin main effects, 6 destination main effects, and one overall effect, as reported in Table 2. However, the 49 parameters reported for each year in Table 2 were derived from only 36 observed flows, so there are 13 more parameters than original data points and 13 parameters must be redundant, i.e., 13 of the 49 parameters can be calculated from the other 36. The relationship between parameters is determined by the following constraints associated with geometric mean effect coding. The first set of constraints forces the product of the origin main effects, and the product of the destination main effects, to be equal to 1. This is expressed as

$$\prod_{i=1}^{m} O_i = \prod_{j=1}^{m} D_j = 1$$
The second set of constraints is imposed on the interaction elements of each row and column, making the products of the interior elements in each row (and column) equal to 1. In other words, if five of the interaction effects associated with a particular origin (or destination) are given, the sixth interaction effect would be implied. This is expressed as

$$\prod_{j=1}^{m} OD_{ij} = 1 \ \text{for each origin } i, \qquad \prod_{i=1}^{m} OD_{ij} = 1 \ \text{for each destination } j$$
In general, if there are m regions there are m^{2} linearly independent parameters and 1+m+m+(m×m) multiplicative components. For all of the geometric mean effect coding computations, see Table 2 in the Multiplicative Components sheet of the accompanying workbook.
Geometric mean effect coding, which uses the geometric mean as the reference value, was the earliest loglinear decomposition used to describe migration (Rogers, Willekens, Little et al. 2002; Willekens 1983). Recently, however, total sum reference coding has become more standard (Raymer and Rogers 2007; Rogers, Little and Raymer 2010). While both decompositions satisfy Equation 1, the effects under total sum reference coding are more transparent. For example, the overall effect, T, is now the total number of migrants, denoted n_{++}. O_{i} is now the proportion of all migrants leaving from region i (i.e., n_{i+}/n_{++}), and D_{j} is the proportion of all migrants moving to region j (i.e., n_{+j}/n_{++}). The interaction component OD_{ij} is now defined as n_{ij}/[(T)(O_{i})(D_{j})] or the ratio of the observed number of migrants, n_{ij}, to the expected number, (T)(O_{i})(D_{j}). All effects taken together provide another way to represent the spatial structure of migration.
The multiplicative components derived from total sum reference coding are set out in Table 3. Consider, for example, the 8,566 migrations from Category 1 to Category 3 in 1973 disaggregated into the four multiplicative components:

$$n_{13} = 8{,}566 = (T)(O_1)(D_3)(OD_{13}) \approx 716{,}488 \times 0.195 \times 0.123 \times 0.499$$

The interpretations of these components are relatively straightforward. The overall component is the reported total number of migrations in 1973, i.e., 716,488. The origin component represents the share of all migrations from each region: about 20 per cent (0.195) of all migrations originated in Category 1. The destination component represents the share of all migrations to each region: about 12 per cent (0.123) of all migrations had Category 3 as the destination. Finally, the interaction component represents the ratio of observed migrations to expected migrations: there were roughly 50 observed migrations from Category 1 to Category 3 for every 100 expected. The expected flow is based on the marginal total information, i.e., (T)(O_{1})(D_{3}).
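A minimal Python sketch of total sum reference coding, applied to the 1973 flows of Table 1, is given below. The function name is an assumption made for this illustration; the calculation follows the definitions above (T = n_{++}, O_{i} = n_{i+}/n_{++}, D_{j} = n_{+j}/n_{++}, and OD_{ij} as observed over expected).

```python
# 1973 flows from Table 1 (rows = origin, columns = destination)
FLOWS_1973 = [
    [50498, 23829, 8566, 21846, 16264, 18856],
    [25005, 27536, 6953, 14326, 16212, 18282],
    [15675, 10710, 13874, 6266, 9819, 19701],
    [23457, 14169, 4431, 10209, 9386, 10973],
    [29548, 25267, 11802, 13160, 15979, 20406],
    [46815, 39123, 42399, 25012, 26830, 23304],
]


def total_sum_reference_coding(flows):
    """Return (T, O, D, OD) with flows[i][j] = T * O[i] * D[j] * OD[i][j]."""
    m = len(flows)
    row_totals = [sum(row) for row in flows]
    col_totals = [sum(col) for col in zip(*flows)]
    T = sum(row_totals)                    # total migrations, n_++
    O = [r / T for r in row_totals]        # share leaving each origin
    D = [c / T for c in col_totals]        # share arriving at each destination
    OD = [[flows[i][j] / (T * O[i] * D[j]) for j in range(m)]
          for i in range(m)]               # observed / expected
    return T, O, D, OD


T, O, D, OD = total_sum_reference_coding(FLOWS_1973)
# Components of the Category 1 to Category 3 flow (zero-based indices 0 and 2)
print(T, round(O[0], 3), round(D[2], 3), round(OD[0][2], 3))
```

The origin and destination components are simple proportions, which is why this coding is easier to interpret than the geometric mean version.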
Table 3 Multiplicative components using total sum reference coding

A. 1973 Migration table

                             Destination
Origin        1        2        3        4        5        6       Total
1         1.354    0.868    0.499    1.232    0.882    0.866       0.195
2         0.866    1.295    0.523    1.043    1.135    1.084       0.151
3         0.773    0.718    1.485    0.650    0.979    1.664       0.106
4         1.212    0.994    0.497    1.109    0.980    0.971       0.101
5         0.954    1.108    0.827    0.894    1.043    1.129       0.162
6         0.863    0.980    1.696    0.970    1.000    0.736       0.284
Total     0.267    0.196    0.123    0.127    0.132    0.156     716,488

B. 1976 Migration table

                             Destination
Origin        1        2        3        4        5        6       Total
1         1.834    0.993    0.477    1.354    0.988    0.783       0.102
2         1.012    1.353    0.562    1.085    1.048    0.932       0.189
3         0.649    0.757    1.212    0.747    1.040    1.422       0.166
4         1.363    1.087    0.667    1.160    0.952    0.942       0.107
5         0.943    1.058    0.942    0.894    0.976    1.121       0.163
6         0.785    0.837    1.536    0.963    0.980    0.822       0.273
Total     0.118    0.215    0.190    0.129    0.160    0.188     657,885
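Like geometric mean effect coding, the decomposition based on total sum reference coding gives more parameters than original data points. The constraints that define the relationships between parameters, and thus allow the redundant parameters to be derived, are as follows:

$$\sum_{i=1}^{m} O_i = \sum_{j=1}^{m} D_j = 1, \qquad \sum_{j=1}^{m} D_j \, OD_{ij} = 1 \ \text{for each } i, \qquad \sum_{i=1}^{m} O_i \, OD_{ij} = 1 \ \text{for each } j$$

where m is the number of regions (Raymer, Bonaguidi and Valentini 2006).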
For all of the total sum reference coding computations, see Table 3 in the Multiplicative components sheet of the accompanying workbook.
If the same decomposition scheme is applied to two sets of flow data from a given system of regions, all but the T parameter are scale free. This means that taking the ratios of two sets of components provides a simple method for examining stability in migration structure without confounding the effects of growth or decline in overall levels of migration (Rogers, Willekens, Little et al. 2002). Table 4 displays the ratios of the 1976 to the 1973 components derived from total sum reference coding (Table 3). Several depart substantially from 1, indicating that the migration structure changed in the three years between 1973 and 1976. For example, the ratio of the components for OD_{11} is equal to 1.354, implying that migration within Category 1 was more attractive in 1976 than in 1973. In contrast, the ratio of the components for OD_{33} is equal to 0.816, suggesting migration within Category 3 was less attractive in 1976 than in 1973.
Table 4 Ratios of 1976 to 1973 multiplicative components

                             Destination
Origin        1        2        3        4        5        6       Total
1         1.354    1.144    0.957    1.099    1.121    0.904       0.522
2         1.169    1.045    1.075    1.040    0.923    0.860       1.252
3         0.839    1.055    0.816    1.149    1.062    0.854       1.562
4         1.125    1.093    1.342    1.046    0.972    0.970       1.058
5         0.988    0.955    1.139    1.000    0.936    0.993       1.003
6         0.909    0.854    0.906    0.993    0.980    1.117       0.961
Total     0.441    1.096    1.546    1.015    1.214    1.210       0.918
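The ratios in Table 4 can be reproduced with the short Python sketch below. It assumes, as the values in the table indicate, that the ratios are taken between components obtained from total sum reference coding; the function names are hypothetical, and the data are the two panels of Table 1.

```python
FLOWS_1973 = [
    [50498, 23829, 8566, 21846, 16264, 18856],
    [25005, 27536, 6953, 14326, 16212, 18282],
    [15675, 10710, 13874, 6266, 9819, 19701],
    [23457, 14169, 4431, 10209, 9386, 10973],
    [29548, 25267, 11802, 13160, 15979, 20406],
    [46815, 39123, 42399, 25012, 26830, 23304],
]
FLOWS_1976 = [
    [14473, 14327, 6077, 11689, 10618, 9897],
    [14833, 36258, 13289, 17391, 20899, 21869],
    [8330, 17764, 25113, 10489, 18171, 29220],
    [11315, 16498, 8935, 10537, 10762, 12519],
    [11875, 24370, 19151, 12312, 16724, 22591],
    [16582, 32336, 52415, 22264, 28182, 27810],
]


def total_sum_components(flows):
    """Total sum reference coding: T, origin shares, destination shares, OD."""
    rows = [sum(r) for r in flows]
    cols = [sum(c) for c in zip(*flows)]
    T = sum(rows)
    O = [r / T for r in rows]
    D = [c / T for c in cols]
    OD = [[n * T / (rows[i] * cols[j]) for j, n in enumerate(row)]
          for i, row in enumerate(flows)]
    return T, O, D, OD


def component_ratios(later, earlier):
    """Element-wise ratios of the later to the earlier period's components."""
    T1, O1, D1, OD1 = total_sum_components(later)
    T0, O0, D0, OD0 = total_sum_components(earlier)
    return (T1 / T0,
            [a / b for a, b in zip(O1, O0)],
            [a / b for a, b in zip(D1, D0)],
            [[a / b for a, b in zip(r1, r0)] for r1, r0 in zip(OD1, OD0)])


rT, rO, rD, rOD = component_ratios(FLOWS_1976, FLOWS_1973)
print(round(rT, 3), round(rOD[0][0], 3), round(rOD[2][2], 3))
```

Only the ratio of the T components reflects the change in the overall level of migration; the remaining ratios describe changes in structure.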
Another form of the saturated loglinear model, which is an alternative to the multiplicative component model, is the linear additive model. Whether using the linear additive or the multiplicative form of the saturated loglinear model, the parameters represent the spatial structure of migration (Rogers, Willekens, Little et al. 2002) and each flow value can be fully reproduced by the parameters.
Because the multiplicative formulation is formally equivalent to the gravity model (Willekens 1983), it is considered to be more appropriate than the linear additive model for representing spatial migration structures. On the other hand, the linear additive form is often found in statistics and when a standard statistical package (e.g., SPSS, Stata, R) is used to estimate a loglinear model, the parameters are always reported in the linear additive form. For that reason, the conventional calculations and interpretations of the parameters in the linear additive model are described in this subsection.
The additive formulation is a linear function of logarithms and it makes evident why the model came to be called the loglinear model (Knoke and Burke 1980). It is mathematically equivalent to the multiplicative component model and it results from taking logarithms of both sides of Equation 1 as follows:

$$\ln n_{ij} = \ln T + \ln O_i + \ln D_j + \ln OD_{ij}$$

which can be expressed more concisely as:

$$\ln n_{ij} = \lambda + \lambda_i^{O} + \lambda_j^{D} + \lambda_{ij}^{OD} \qquad (2)$$
The λ values are simply the natural logarithms of the parameters appearing in Equation 1. The O, D, and OD superscripts are parameter descriptors (not exponents) and the subscripts i and j refer to the categories of the origin and destination variables, respectively.
Applying natural logarithmic transformations to the parameters in Table 2 and Table 3 would result in sets of corresponding linear additive parameters. However, just as there are at least two decompositions of the multiplicative component model, i.e., geometric mean effect coding and total sum reference coding, there are multiple strategies for arriving at sets of parameters that satisfy the linear additive model (Powers and Xie 2008), and the approaches taken by the standard statistical packages are not simply logarithmic transformations of the multiplicative components derived earlier.
Recall that a migration system with m regions has m×m linearly independent parameters. The multiplicative component models described above give an interpretable value for 1+m+m+(m×m) parameters, though they are not linearly independent of each other. On the other hand, statistical routines in SPSS, Stata, and R calculate and report only linearly independent parameters, resulting in 1 value for λ, m−1 values for λ^{O}_{i}, m−1 values for λ^{D}_{j}, and (m−1)×(m−1) values for λ^{OD}_{ij}.
The particular set of parameter values that is calculated and reported depends on the contrast coding scheme used by the software. Contrast coding blocks out one region by fixing all linear additive parameters for that region equal to 0. SPSS, for example, fixes the parameters for the last region, i.e., the region assigned the highest numeric value, m, in this case:

$$\lambda_m^{O} = \lambda_m^{D} = \lambda_{im}^{OD} = \lambda_{mj}^{OD} = 0 \quad \text{for all } i \text{ and } j$$
The parameters of the Netherlands data reported by SPSS are displayed in Table 5. The SPSS commands that generate these results for the 1973 migration table, along with the SPSS output, are presented in Appendix 1. Table 5, with the Excel formulae for calculating the parameters, is available in the Contrast coding sheet of the accompanying workbook.
Table 5 Additive linear parameters using "last region" contrast coding

A. 1973 Migration table

                             Destination
Origin        1        2        3        4        5        6       Total
1         0.288   -0.284   -1.388    0.076   -0.289    0.000      -0.212
2        -0.384   -0.109   -1.565   -0.315   -0.261    0.000      -0.243
3        -0.926   -1.128   -0.949   -1.216   -0.837    0.000      -0.168
4         0.062   -0.262   -1.505   -0.143   -0.297    0.000      -0.753
5        -0.327   -0.304   -1.146   -0.509   -0.385    0.000      -0.133
6         0.000    0.000    0.000    0.000    0.000    0.000       0.000
Total     0.698    0.518    0.598    0.071    0.141    0.000      10.056

B. 1976 Migration table

                             Destination
Origin        1        2        3        4        5        6       Total
1         0.897    0.219   -1.122    0.389    0.057    0.000      -1.033
2         0.129    0.355   -1.132   -0.007   -0.059    0.000      -0.240
3        -0.738   -0.648   -0.785   -0.802   -0.488    0.000       0.049
4         0.416    0.125   -0.971    0.050   -0.165    0.000      -0.798
5        -0.126   -0.075   -0.799   -0.385   -0.314    0.000      -0.208
6         0.000    0.000    0.000    0.000    0.000    0.000       0.000
Total    -0.517    0.151    0.634   -0.222    0.013    0.000      10.233
Notice the parameters for the last region are equal to 0, and, therefore, make no contribution to Equation 2. Interpretation of the parameters in Table 5 is somewhat complicated since they are in logarithmic units. Conversion back to the multiplicative components by exponentiation gives yet another set of multiplicative components that satisfy Equation 1. These are presented in Table 6, and they are the multiplicative components associated with “last region” contrast coding. Generally, these are not used to describe the spatial structure of migration, but they are useful in describing migration systems because the interaction parameters, OD_{ij}, are equivalent to odds ratios.
Table 6 Multiplicative components using "last region" contrast coding

A. 1973 Migration table

| Origin \ Destination | 1 | 2 | 3 | 4 | 5 | 6 | Total |
|---|---|---|---|---|---|---|---|
| 1 | 1.333 | 0.753 | 0.250 | 1.079 | 0.749 | 1.000 | 0.809 |
| 2 | 0.681 | 0.897 | 0.209 | 0.730 | 0.770 | 1.000 | 0.785 |
| 3 | 0.396 | 0.324 | 0.387 | 0.296 | 0.433 | 1.000 | 0.845 |
| 4 | 1.064 | 0.769 | 0.222 | 0.867 | 0.743 | 1.000 | 0.471 |
| 5 | 0.721 | 0.738 | 0.318 | 0.601 | 0.680 | 1.000 | 0.876 |
| 6 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Total | 2.009 | 1.679 | 1.819 | 1.073 | 1.151 | 1.000 | 23,304 |

B. 1976 Migration table

| Origin \ Destination | 1 | 2 | 3 | 4 | 5 | 6 | Total |
|---|---|---|---|---|---|---|---|
| 1 | 2.453 | 1.245 | 0.326 | 1.475 | 1.059 | 1.000 | 0.356 |
| 2 | 1.138 | 1.426 | 0.322 | 0.993 | 0.943 | 1.000 | 0.786 |
| 3 | 0.478 | 0.523 | 0.456 | 0.448 | 0.614 | 1.000 | 1.051 |
| 4 | 1.516 | 1.133 | 0.379 | 1.051 | 0.848 | 1.000 | 0.450 |
| 5 | 0.882 | 0.928 | 0.450 | 0.681 | 0.731 | 1.000 | 0.812 |
| 6 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Total | 0.596 | 1.163 | 1.885 | 0.801 | 1.013 | 1.000 | 27,810 |

For example, the overall parameter from the 1973 migration data reported in Table 5, λ^{T}, gives the natural logarithm of the observed migrations for the reference region:
λ^{T} = ln(n_{66}) = ln(23,304) = 10.056.
Another illustration from the 1973 migration table in Table 5 shows how the origin main effects, λ^{O}_{i}, are added to the overall parameter to reproduce the migrations from Category 1 to the reference destination, Category 6, reported in Table 1. For example:
ln(n_{16}) = λ^{T} + λ^{O}_{1} = 10.056 + (−0.212) = 9.844.
Using the same approach, the observed migration flows can be reproduced by applying Equation 1 with the appropriate parameters from Table 6, or the logarithms of the flows can be reproduced by applying Equation 2 using the parameters in Table 5.
The association parameters in the linear form, λ^{OD}_{ij}, are logged odds ratios (LORs), which are the logarithm of the ratio of two odds: 1) the odds of migration to region j rather than the reference region, conditional on originating in region i; and 2) the odds of migration to region j rather than the reference region, conditional on originating in the reference region. For example, from the 1973 submatrix in Table 5, λ^{OD}_{23} = −1.565, which is calculated as:
λ^{OD}_{23} = ln[(n_{23}/n_{26}) / (n_{63}/n_{66})].
In words, the parameter is described as the logged ratio of the odds of migration to Category 3, rather than to Category 6, between a migrant originating in Category 2 and one originating in Category 6.
Odds ratios measure the relative likelihood of one outcome to another, and because they are more standard than LORs, it may be easier to exponentiate the LORs and interpret the association parameters, presented in Table 6, as odds ratios. For example, the model parameter OD_{23}, for the 1973 data, is calculated as:
OD_{23} = exp(λ^{OD}_{23}) = exp(−1.565) = 0.209.
In words, the odds that a migrant from Category 2 will choose Category 3 over Category 6 are approximately 1/5^{th} of the odds that a migrant from Category 6 will choose Category 3 over Category 6. Odds ratios are always positive and always depend on the choice of reference category. An odds ratio equal to 1 means a null relationship, i.e., statistical independence. Values higher than 1 mean a positive association and values less than 1 indicate a negative association.
Stata and R use a different contrast coding scheme from SPSS. Both of these statistical packages use “first region” contrast coding as opposed to the “last region” contrast coding used by SPSS. In these two programs, the parameters for the first region, i.e., the region assigned the lowest numeric value, are fixed to be equal to 0, i.e.,
λ^{O}_{1} = λ^{D}_{1} = 0, and λ^{OD}_{i1} = λ^{OD}_{1j} = 0 for all i and j.
The Stata and R commands for generating the linear additive parameters, as well as the corresponding output, for the 1973 migration data can be downloaded from Appendix 1 [36].
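The two contrast coding schemes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the routine used by SPSS, Stata or R: the function name `contrast_parameters`, the 0-based `ref` index and the 3-region flow table are all hypothetical, and the saturated-model parameters are computed in closed form from the observed counts rather than by maximum likelihood.

```python
import math

def contrast_parameters(flows, ref):
    """Saturated-model linear additive parameters with region `ref` as reference.

    `flows` is a square list of lists of positive counts. `ref` is a 0-based
    index: len(flows) - 1 mimics "last region" coding, 0 mimics "first region".
    """
    m = len(flows)
    n_rr = flows[ref][ref]
    lam_t = math.log(n_rr)                                   # overall parameter
    lam_o = [math.log(flows[i][ref] / n_rr) for i in range(m)]   # origin effects
    lam_d = [math.log(flows[ref][j] / n_rr) for j in range(m)]   # destination effects
    # Interaction effects are logged odds ratios against the reference region.
    lam_od = [[math.log(flows[i][j] * n_rr / (flows[i][ref] * flows[ref][j]))
               for j in range(m)] for i in range(m)]
    return lam_t, lam_o, lam_d, lam_od

# Hypothetical 3-region flow table (illustrative only, not data from the text).
flows = [[50.0, 10.0, 20.0],
         [8.0, 60.0, 12.0],
         [15.0, 9.0, 40.0]]

lam_t, lam_o, lam_d, lam_od = contrast_parameters(flows, ref=2)  # "last region"

# Linear additive form: ln n_ij = lam_t + lam_o[i] + lam_d[j] + lam_od[i][j]
rebuilt = [[math.exp(lam_t + lam_o[i] + lam_d[j] + lam_od[i][j])
            for j in range(3)] for i in range(3)]

# Exponentiating the interaction parameters gives odds ratios.
odds_ratios = [[math.exp(v) for v in row] for row in lam_od]
```

Calling the same function with `ref=0` gives the “first region” parameters; both sets reproduce the same flows, which is why the choice of reference region changes the reported values but not the fit.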
All forms of the saturated model and all statistical methods for estimating the interaction parameters are in agreement and provide substantively similar results. The formulae for the calculations of the parameters are available in the Linear Additive Parameters sheet of the accompanying workbook. Furthermore, tests that each linear additive interaction parameter is equal to 0 are done automatically by SPSS and Stata. These results are available from Appendix 1 [36] and they show that each non-redundant interaction parameter is statistically significant. See Agresti and Finlay (2009) and Powers and Xie (2008) for descriptions of the standard errors of the estimates.
All the models presented to this point have been saturated, and, therefore, perfectly represent the observed flows. Generally, the substantively interesting parameters are the interaction parameters because they indicate associations between pairs of regions. The independence model, however, hypothesizes that the interaction parameters are uninteresting and unnecessary because all multiplicative interaction parameters, OD_{ij}, are equal to 1, or, equivalently, all linear additive interaction parameters, λ^{OD}_{ij}, are equal to 0. The independence model implies that the interaction terms should fall out of the model, reducing it to the most parsimonious form of a two-variable model, i.e.,
n_{ij} = (T)(O_{i})(D_{j}),
or, equivalently,
ln(n_{ij}) = λ^{T} + λ^{O}_{i} + λ^{D}_{j}.
Visual inspection of the interaction parameters in the saturated loglinear model is one strategy for investigating the independence hypothesis. Another method is to calculate row or column conditional distributions. If the conditional distributions within rows (origins) are identical, there is independence between origins and destinations. In addition, since independence is a symmetric property, if the conditional distributions within rows (origins) are identical, the distributions within columns (destinations) also will be identical (Agresti and Finlay 2009; Powers and Xie 2008). In the Independence sheet of the accompanying workbook, the percentages of the Netherlands migrations within columns (destinations) are calculated. The column percentages are quite varied, suggesting, like the interaction parameters, that statistical independence is unfounded in this example.
The independence hypothesis implies that each particular interregional flow can be determined by the sizes of the marginal flows. Let N_{ij} be the expected flow between regions i and j if the independence hypothesis is true. N_{ij} is then equal to the total number of flows in the migration system, n_{++}, multiplied by the proportion of all migrants leaving from region i, n_{i+}/n_{++}, times the proportion of all migrants moving to region j, n_{+j}/n_{++}, i.e., N_{ij} = n_{++}(n_{i+}/n_{++})(n_{+j}/n_{++}) = n_{i+}n_{+j}/n_{++}. If independence can be assumed, a good estimate of an interregional flow is N_{ij}, and the problem of estimating interregional migration flows is greatly simplified.
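The expected flows under independence follow directly from the marginal totals. The sketch below is illustrative only: the function name `expected_under_independence` is hypothetical, and the input is the United States full migration table reported in Panel A of Table 7, chosen because the resulting values can be compared with the independence predictions in Panel A of Table 9.

```python
def expected_under_independence(flows):
    """Return N_ij = n_i+ * n_+j / n_++ for every cell of a flow table."""
    row_totals = [sum(row) for row in flows]            # n_i+
    col_totals = [sum(col) for col in zip(*flows)]      # n_+j
    grand_total = sum(row_totals)                       # n_++
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# United States native-born flows, 1985-1990 (Table 7, Panel A).
# Rows are origins, columns are destinations: Northeast, Midwest, South, West.
us_flows = [
    [40262319, 336091, 1645843, 479819],
    [351029, 50677007, 1692687, 958696],
    [778868, 1197134, 69563871, 1150649],
    [348892, 668979, 1082104, 37872893],
]

expected = expected_under_independence(us_flows)
northeast_stayers = round(expected[0][0])  # compare with Table 9, Panel A
```

By construction the expected flows keep the observed row and column totals, so only the allocation of flows across the interior cells differs from the observed table.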
The differences between the observed flows, n_{ij}, and the expected flows, N_{ij}, form the basis of the goodness-of-fit evaluation and the Pearson Chi-Squared Statistic, denoted Χ^{2}, which is widely used to summarize these discrepancies. It is calculated as:
Χ^{2} = Σ_{i} Σ_{j} (n_{ij} − N_{ij})^{2} / N_{ij},
where the summation is taken over all internal cells in the migration matrix. When there is perfect agreement between the observed and the expected flows, over all cells, the Χ^{2} equals 0, indicating the independence model fits the data perfectly. Larger differences between n_{ij} and N_{ij} produce larger Χ^{2} values and increasingly stronger evidence that the independence model is inadequate. In general, smaller values indicate a good fit and larger values a poor fit.
If the independence hypothesis is true, the Χ^{2} statistic follows the Χ^{2} probability distribution with (m−1)×(m−1) degrees of freedom. This distribution provides the basis for testing the significance of the Χ^{2} statistic (Agresti 2007; Agresti and Finlay 2009). If the Χ^{2} statistic falls in the right tail of its distribution, a value that large would be improbable, e.g., p<0.05, if the independence hypothesis were true, and the model is rejected. The Χ^{2} values associated with the independence model applied to the Netherlands data in Table 1 are calculated and reported in the Independence sheet of the accompanying workbook. See Appendix 2 [36] for the SPSS, Stata and R commands for testing the independence model with the 1973 example data.
The Χ^{2} value associated with the 1973 example data is 47,623, and the degrees of freedom (df) are 25. The associated p-value is less than 0.001, and the hypothesis of independence is rejected. (However, see the comments below about the limitations of this test when the sample size is large.) This is not surprising given the three multiplicative decompositions of the Netherlands data, presented in Table 2, Table 3 and Table 6. The evidence consistently shows strong associations between regions and many of the multiplicative association parameters are not close to 1. Furthermore, the standard errors reported in Appendix 1 [36] by SPSS and Stata indicate the linear additive interaction parameters are significantly different from 0.
One alternative to the Χ^{2} statistic is called either the likelihood ratio statistic, the deviance, or the G^{2} statistic. All are different names for the same test statistic, and which name is used is determined by the preferences of authors of textbooks and software packages. For simplicity, G^{2} will be adopted here. It is similar to the Χ^{2} in that values close to 0 indicate a well-fitting model and large values indicate a poor fit. If the hypothesized independence model holds, the G^{2} statistic also has a Χ^{2} distribution.
The G^{2} statistic has general utility that goes well beyond the independence model in loglinear analysis. It is widely used for comparing a simpler model to a more complex model. The G^{2} statistic is derived from the ratio of two likelihoods: 1) the likelihood that the constrained model (here the model of independence) fits the data; and 2) the likelihood that the unconstrained model (here the saturated model) fits the data. If the ratio is close to 1, the simpler, constrained, and more parsimonious model is preferred because it represents the data as well as the more complex model does.
The ratio of the two likelihoods does not have a Χ^{2} distribution. However, when the ratio is transformed into natural logarithm units and multiplied by −2, it becomes G^{2}, which is a Χ^{2} distributed variable with (m−1)×(m−1) degrees of freedom. If L_{c} is the likelihood associated with the constrained (i.e., independence) model, and L_{u} is the likelihood under the unconstrained (i.e., saturated) model, then G^{2} is calculated as:
G^{2} = −2 ln(L_{c}/L_{u}) = −2[ln(L_{c}) − ln(L_{u})].
Because the saturated model fits the data perfectly, i.e., L_{u} = 1, G^{2} = –2ln L_{c}. The values, based on the example and the statistical software, are reported in Appendix 2. The value is reported to be 46,477.63 and it is called “Deviance” by SPSS and Stata. It is rounded and reported to be equal to 46,480 by R, where it is called “Residual Deviance.” With 25 degrees of freedom the probability that the independence model holds is effectively 0.
The Χ^{2} and the G^{2} statistics are asymptotically equivalent (Powers and Xie 2008) and they form the bases of the Pearson Chi-square and the likelihood ratio tests, respectively. As with all inferential tests, effective use requires attention to underlying assumptions as well as limitations. Both tests rely on the assumption that each interregional flow count in the migration table follows an independent Poisson distribution (Powers and Xie 2008) and both tests have important limitations related to sample size. The Χ^{2} statistic is inflated by large samples. Therefore, the Pearson Chi-square test is not appropriate when the sample size is large. The G^{2} statistic and the likelihood ratio test are preferred in this situation (Powers and Xie 2008). The Pearson Chi-square test is preferred when the expected frequencies average between 1 and 10, but neither statistic works well if most of the expected frequencies are less than 5 (Agresti and Finlay 2009; Powers and Xie 2008).
Criticism has been made of the G^{2} statistic as well when samples are large (Raftery 1986, 1995) and there is growing consensus that information measures should be considered along with traditional significance tests in assessing model fit. The Bayesian Information Criterion (BIC) is closely related to G^{2}, and it is calculated by Stata as: BIC = G^{2} − df ln(m×m), and by SPSS as:
BIC = −2 ln(L) + p ln(m×m),
where ln(L) is the log-likelihood of the fitted independence model as reported by the software and p is the number of parameters estimated in the independence model, i.e., 2m−1. A low value suggests choosing the independence model over the saturated model (Powers and Xie 2008).
Akaike’s Information Criterion (AIC) is another alternative that takes on smaller values for better fitting models, since it judges how close the fitted values are to the expected values (Agresti 2007). In SPSS and R, it is calculated as:
AIC = −2 ln(L) + 2p,
where p is the number of parameters estimated in the independence model, i.e., 2m−1. In Stata, it is calculated as:
AIC = [−2 ln(L) + 2p] / (m×m).
As shown in Appendix 2 [36], SPSS and Stata report both the BIC and the AIC, whereas R reports only a rounded AIC. As previously stated, the packages use different formulae. The BIC reported by SPSS equals 46,934.237, and the BIC reported by Stata equals 46,388.04. The AIC reported by R, 46,920, is a rounded version of the value reported by SPSS, 46,916.818. Stata’s AIC value is substantially smaller and is equal to 1,303.245. All reported BIC and AIC values are large and add to the growing evidence that discredits the independence model for this example.
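The relationships among the reported information measures can be checked with simple arithmetic. The sketch below is a hedged back-calculation rather than a description of each package's internals: it assumes that SPSS computes AIC as −2 ln(L) + 2p and BIC as −2 ln(L) + p ln(m×m), that Stata computes BIC as G^{2} − df ln(m×m) and AIC as the SPSS-style AIC divided by the number of cells, and it recovers −2 ln(L) from the reported SPSS AIC. All variable names are illustrative.

```python
import math

m = 6                       # regions in the Netherlands example
n_cells = m * m             # cells in the migration table
df = (m - 1) * (m - 1)      # degrees of freedom of the independence model
p = 2 * m - 1               # parameters estimated in the independence model

g2 = 46477.63               # deviance reported by SPSS and Stata
aic_spss = 46916.818        # AIC reported by SPSS

# Assumption: -2 ln(L) is recovered from the SPSS AIC, it is not reported directly here.
neg2_loglik = aic_spss - 2 * p

bic_stata = g2 - df * math.log(n_cells)          # Stata-style BIC
bic_spss = neg2_loglik + p * math.log(n_cells)   # SPSS-style BIC
aic_stata = aic_spss / n_cells                   # Stata-style AIC (per cell)

print(round(bic_stata, 2), round(bic_spss, 3), round(aic_stata, 3))
```

Under these assumptions the three computed values agree with the reported Stata BIC, SPSS BIC and Stata AIC to the precision shown in the text, which suggests that the differences between packages are purely a matter of scaling and of which baseline the criterion is measured against.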
The independence model rarely provides an adequate fit to migration data. This is due, in part, to the overwhelming tendency to continue to reside in the same region. The quasi-independence model allows these “immobility” effects (Powers and Xie 2008) to be removed from the model, and this often results in improved predictions of interregional migration flows. The quasi-independence model has been applied effectively to migration data obtained from national censuses (Agresti 1990; Rogers, Little and Raymer 2010; Rogers, Willekens, Little et al. 2002), where persons who reported living in the same region at the time of the census as at the beginning of the reference period are represented in the diagonal elements of a migration table.
To illustrate, United States native-born migration data between 1985 and 1990 are reported in Panel A of Table 7. Clearly, the flows reported in the four diagonal elements of the interior submatrix are substantially larger than the off-diagonal elements, indicating that the propensity to maintain residence in the same region is much more typical than migration between regions.
The clustering along the diagonal cells contributes significantly to the poor fit of the independence model, and the dominating influence of the persons remaining in the region of origin has caused researchers to favour omitting them from the model. If migrants are defined as people changing their region of residence, this type of flow matrix is sometimes called a “migrants only” matrix. It is particularly useful for studying migration structure since it eliminates people who made no move or moved within the same region. Panel B of Table 7 displays the flow table with the diagonal elements set to 0, and the marginal totals adjusted accordingly.
Table 7 United States native-born migration flows, 1985–1990

A. Full migration table

| Origin \ Destination | Northeast | Midwest | South | West | Total |
|---|---|---|---|---|---|
| Northeast | 40,262,319 | 336,091 | 1,645,843 | 479,819 | 42,724,072 |
| Midwest | 351,029 | 50,677,007 | 1,692,687 | 958,696 | 53,679,419 |
| South | 778,868 | 1,197,134 | 69,563,871 | 1,150,649 | 72,690,522 |
| West | 348,892 | 668,979 | 1,082,104 | 37,872,893 | 39,972,868 |
| Total | 41,741,108 | 52,879,211 | 73,984,505 | 40,462,057 | 209,066,881 |

B. Migrants-only table

| Origin \ Destination | Northeast | Midwest | South | West | Total |
|---|---|---|---|---|---|
| Northeast | 0 | 336,091 | 1,645,843 | 479,819 | 2,461,753 |
| Midwest | 351,029 | 0 | 1,692,687 | 958,696 | 3,002,412 |
| South | 778,868 | 1,197,134 | 0 | 1,150,649 | 3,126,651 |
| West | 348,892 | 668,979 | 1,082,104 | 0 | 2,099,975 |
| Total | 1,478,789 | 2,202,204 | 4,420,634 | 2,589,164 | 10,690,791 |

The multiplicative components, using total sum reference coding, for the full migration table and the migrants-only table are reported in Table 8. The magnitudes of the multiplicative components for the full data clearly depart from what is expected under the hypothesis of independence: the interaction components are substantially above 1.0 on the diagonal and far below 1.0 off the diagonal. In comparison, the diagonal interaction components for the migrants-only table are constrained to be equal to 0 in order to reproduce the structural zeros on the diagonal, and, as a result, the off-diagonal components are closer to 1.0.
Table 8 Multiplicative components* of United States native-born migration flows, 1985–1990

A. Full migration table

| Origin \ Destination | Northeast | Midwest | South | West | Total |
|---|---|---|---|---|---|
| Northeast | 4.720 | 0.031 | 0.109 | 0.058 | 0.204 |
| Midwest | 0.033 | 3.733 | 0.089 | 0.092 | 0.257 |
| South | 0.054 | 0.065 | 2.704 | 0.082 | 0.348 |
| West | 0.044 | 0.066 | 0.076 | 4.896 | 0.191 |
| Total | 0.200 | 0.253 | 0.354 | 0.194 | 209,066,881 |

B. Migrants-only table

| Origin \ Destination | Northeast | Midwest | South | West | Total |
|---|---|---|---|---|---|
| Northeast | 0.000 | 0.663 | 1.617 | 0.805 | 0.230 |
| Midwest | 0.845 | 0.000 | 1.363 | 1.318 | 0.281 |
| South | 1.801 | 1.859 | 0.000 | 1.520 | 0.292 |
| West | 1.201 | 1.547 | 1.246 | 0.000 | 0.196 |
| Total | 0.138 | 0.206 | 0.413 | 0.242 | 10,690,791 |

*Total sum reference coding
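The total sum reference coding behind Table 8 can be sketched as follows. This is an illustrative reading of the table rather than the workbook's own formulae: it assumes that T is the grand total, that O_{i} and D_{j} are the row and column shares of T, and that OD_{ij} is the ratio of each observed flow to T×O_{i}×D_{j}. The function name `total_sum_components` is hypothetical, and the inputs are the two panels of Table 7.

```python
def total_sum_components(flows):
    """Multiplicative components T, O_i, D_j, OD_ij under total sum reference coding."""
    row_totals = [sum(row) for row in flows]
    col_totals = [sum(col) for col in zip(*flows)]
    total = sum(row_totals)                               # T
    origin = [r / total for r in row_totals]              # O_i
    destination = [c / total for c in col_totals]         # D_j
    interaction = [[flows[i][j] / (total * origin[i] * destination[j])
                    for j in range(len(col_totals))]
                   for i in range(len(row_totals))]       # OD_ij
    return total, origin, destination, interaction

# Table 7, Panel A (full table) and Panel B (migrants only).
full = [
    [40262319, 336091, 1645843, 479819],
    [351029, 50677007, 1692687, 958696],
    [778868, 1197134, 69563871, 1150649],
    [348892, 668979, 1082104, 37872893],
]
migrants = [[0 if i == j else full[i][j] for j in range(4)] for i in range(4)]

t_full, o_full, d_full, od_full = total_sum_components(full)
t_mig, o_mig, d_mig, od_mig = total_sum_components(migrants)

# n_ij = T * O_i * D_j * OD_ij reproduces every observed flow.
rebuilt = t_full * o_full[0] * d_full[1] * od_full[0][1]
```

The contrast described in the text is visible in the output: the diagonal interaction components of the full table are far above 1, whereas the off-diagonal components of the migrants-only table lie much closer to 1.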
The quasi-independence model requires that only migrations between different regions satisfy the independence assumption. This is estimated in two different but equivalent ways. The first method takes the full migration table data as in Panel A of Table 8, and fixes the weights on the interactive effects, OD_{ij}, to be zero when the regions of origin and destination are the same, i.e., i=j, ensuring that n_{ij}=0. These are called structural zeros. When the origin and destination regions are different, i.e., i≠j, the interaction effects are fixed at 1.0, which is the familiar independence model and gives the predicted off-diagonal flows under the quasi-independence hypothesis. Implementation of this method in SPSS, Stata and R is illustrated in Appendix 3 [36] (available on the Tools for Demographic Estimation website).
The second method does not use the full migration data, but uses the migrants-only data as in Panel B of Table 7. It is best presented with the additive form:
ln(n_{ij}) = λ^{T} + λ^{O}_{i} + λ^{D}_{j} + δ_{i}I(i=j),
where I is an indicator variable taking on values of 1 for the diagonal flows, i.e., when i=j, and values of 0 for the off-diagonal flows, i.e., when i≠j (Agresti 2002). Therefore, an extra parameter, δ_{i}, is necessary to estimate each diagonal flow, and for the other interregional flows the δ_{i}I(i=j) term falls out and the quasi-independence model reduces to the independence model. Consequently, just like the independence model, the off-diagonal interaction terms are constrained to be equal to 0 in the additive form of the model (and equal to 1 in the multiplicative form). Application of this method in Stata is illustrated in Appendix 3 [36].
In the first method, the quasi-independence model fixes m parameters, OD_{ii}, for i = 1 to m, to be equal to 0. In the second method, m additional parameters, δ_{i}, are estimated, and when exponentiated will be very close to 0. Using either method, the quasi-independence model has m more parameters than the full independence model and the degrees of freedom are reduced by m, from (m−1)×(m−1) to (m−1)×(m−1)−m.
Appendix 3 [36] (available on the Tools for Demographic Estimation website) illustrates how the quasi-independence model is estimated with the statistical software packages SPSS, Stata and R, using the United States native-born migration flow data, 1985–1990. When the independence model is estimated with the full data, as expected, all goodness-of-fit indicators are extremely large: Χ^{2} = 544,479,395 (df = 9); G^{2} = 461,411,576 (df = 9); Stata values for BIC and AIC are 461,000,000 and 28,800,000, respectively. When the quasi-independence model is estimated, all values are reduced substantially: Χ^{2} = 327,233 (df = 5); G^{2} = 330,220 (df = 5); Stata values for BIC and AIC equal 330,207 and 27,535, respectively.
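The two goodness-of-fit statistics for the independence model can be computed directly from a flow table. The sketch below is an illustration, not the Appendix 3 code: the function name `independence_fit_statistics` is hypothetical, and G^{2} is computed in its standard cell-count form, 2 Σ n_{ij} ln(n_{ij}/N_{ij}), which is assumed here to correspond to the deviance reported by the software. The input is the full United States table from Panel A of Table 7.

```python
import math

def independence_fit_statistics(flows):
    """Pearson chi-squared, G-squared and df for the independence model."""
    row_totals = [sum(row) for row in flows]
    col_totals = [sum(col) for col in zip(*flows)]
    total = sum(row_totals)
    x2 = 0.0
    g2 = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / total            # N_ij under independence
            observed = flows[i][j]              # n_ij
            x2 += (observed - expected) ** 2 / expected
            if observed > 0:                    # empty cells contribute 0 to G^2
                g2 += 2.0 * observed * math.log(observed / expected)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return x2, g2, df

# United States native-born flows, 1985-1990 (Table 7, Panel A).
us_flows = [
    [40262319, 336091, 1645843, 479819],
    [351029, 50677007, 1692687, 958696],
    [778868, 1197134, 69563871, 1150649],
    [348892, 668979, 1082104, 37872893],
]

x2, g2, df = independence_fit_statistics(us_flows)
print(f"X2 = {x2:,.0f}, G2 = {g2:,.0f}, df = {df}")
```

Both statistics are dominated by the four diagonal cells, where the observed number of stayers is several times the number expected under independence.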
The inferential tests remain significant, and the quasi-independence model must be rejected as the true migration model. The independence and the quasi-independence models should not be compared, inferentially, with the likelihood ratio test because they are not nested models. However, the information measures may be compared directly. Both the BIC and AIC are reduced substantially, favouring the quasi-independence model over the independence model.
In addition, the predicted flows from the independence model can be contrasted with those from the quasi-independence model in Table 9. Visually comparing the predicted flows in Table 9 with the observed data in Table 7 reveals how much closer the quasi-independence model comes to representing the data. Two additional summary statistics are reported: R^{2} and Mean Absolute Percent Error (MAPE). A comparison of the R^{2} values shows the independence model explains 10% of the variation in the observed data and the quasi-independence model explains 95%. Furthermore, the average percent error for the quasi-independence model (MAPE=28) is dramatically reduced in comparison to the independence model (MAPE=2,492).
Since the fit of the quasi-independence model is not close enough to the observed data, it must be rejected as the “true” model. However, without observed migration data, the quasi-independence model may still offer a reasonable, but coarse, method for estimating interregional flows.
Table 9 Predicted United States native-born migration flows, 1985–1990, under independence and quasi-independence

A. Independence

| Origin \ Destination | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | 8,530,046 | 10,806,184 | 15,119,178 | 8,268,664 |
| 2 | 10,717,328 | 13,577,116 | 18,996,052 | 10,388,923 |
| 3 | 14,512,977 | 18,385,588 | 25,723,693 | 14,068,264 |
| 4 | 7,980,756 | 10,110,323 | 14,145,583 | 7,736,206 |

R^{2} = 0.104; MAPE = 2492.322

B. Quasi-independence

| Origin \ Destination | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 1 | 0 | 535,839 | 1,349,561 | 576,353 |
| 2 | 442,768 | 0 | 1,793,640 | 766,005 |
| 3 | 720,681 | 1,159,163 | 0 | 1,246,806 |
| 4 | 315,340 | 507,201 | 1,277,434 | 0 |

R^{2} = 0.945; MAPE = 27.575
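The quasi-independence predictions in Panel B of Table 9 can be approximated without a loglinear routine. The sketch below swaps in iterative proportional fitting for the SPSS, Stata and R commands of Appendix 3: it starts from a table with 1 in every off-diagonal cell and a structural zero on the diagonal, and rescales rows and columns until they match the migrants-only margins of Table 7. The function name `fit_quasi_independence`, the stopping rule and the iteration limit are assumptions made for illustration.

```python
def fit_quasi_independence(flows, max_iter=1000, tol=1e-9):
    """Fit n_ij = a_i * b_j for i != j (structural zeros on the diagonal) by IPF."""
    m = len(flows)
    # Target margins are the migrants-only totals; diagonal cells are ignored.
    row_targets = [sum(flows[i][j] for j in range(m) if j != i) for i in range(m)]
    col_targets = [sum(flows[i][j] for i in range(m) if i != j) for j in range(m)]
    fit = [[0.0 if i == j else 1.0 for j in range(m)] for i in range(m)]
    for _ in range(max_iter):
        for i in range(m):                                  # match row totals
            scale = row_targets[i] / sum(fit[i])
            fit[i] = [v * scale for v in fit[i]]
        for j in range(m):                                  # match column totals
            scale = col_targets[j] / sum(fit[i][j] for i in range(m))
            for i in range(m):
                fit[i][j] *= scale
        worst = max(abs(sum(fit[i]) - row_targets[i]) / row_targets[i]
                    for i in range(m))
        if worst < tol:                                     # rows still match: converged
            break
    return fit

# United States migrants-only flows, 1985-1990 (Table 7, Panel B).
us_migrants = [
    [0, 336091, 1645843, 479819],
    [351029, 0, 1692687, 958696],
    [778868, 1197134, 0, 1150649],
    [348892, 668979, 1082104, 0],
]

quasi_fit = fit_quasi_independence(us_migrants)
print([[round(v) for v in row] for row in quasi_fit])
```

Because rows and columns are only ever multiplied by constants, every off-diagonal cell keeps the product form a_{i}b_{j}, which is exactly the independence structure that the quasi-independence model imposes away from the diagonal.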
The validity of the independence and quasi-independence models can be evaluated with the inferential test statistics that accompany the loglinear model output, and, even when the models are not supported by significance tests, these models may be applied, in some contexts, to produce meaningful estimates of migration flows.
The method of offsets assumes that auxiliary data are available whose implied structure of interregional associations resembles the unknown migration structure. The method borrows the structure of the auxiliary data to derive the estimates of the missing migration flow data.
In past research, the auxiliary information has typically been a table of migration flows from another period in history (Rogers, Little and Raymer 2010; Rogers, Willekens, Little et al. 2002; Rogers, Willekens and Raymer 2003; Willekens 1983), but it could be from another age group (Raymer and Rogers 2007), or from another sex or race group. It could also come from another data source altogether, such as tax return data or motor vehicle registration data.
Given the auxiliary flow data, n^{*}_{ij}, the loglinear-with-offsets model is specified as:
ln(n̂_{ij}) = λ^{T} + λ^{O}_{i} + λ^{D}_{j} + ln(n^{*}_{ij}).
This model will estimate flows, n̂_{ij}, that have a migration structure that comes as close as possible to that of the auxiliary flow data, and, at the same time, the estimated flows are adjusted to sum to the marginal totals pre-specified by the researcher. In this way, the method of offsets is similar to the independence and quasi-independence models in that it provides an expected distribution of the flows such that the marginal row and column totals are equal to the a priori estimates.
To illustrate the workings of the method of offsets, consider the Netherlands 1976 migration flow matrix in Table 1. Suppose we wish to keep the numerical values of the row and column marginal totals, but, at the same time, wish to replace the migration interaction effects observed during that year by those observed during 1973, using the method of offsets. What would be the corresponding set of loglinear parameters? Table 10 sets out the predicted flow matrix obtained by the method of offsets in Panel A, and Panel B presents the associated multiplicative components derived using the total sum reference coding. Note that the T, O_{i} and D_{j} values of the predicted matrix, i.e., Panel B of Table 10, are identical to those reported for the observed 1976 flow matrix in Panel B of Table 3. However, the other terms (i.e., the interaction effects, OD_{ij}) reflect the influence of the migration structure of the observed 1973 data, Panel A of Table 3, as well as the row and column totals taken from the 1976 data. Therefore, the method of offsets applies the structure of the auxiliary data, the 1973 data in this case, to the interior flows, and at the same time, preserves the total number of flows observed in the 1976 data.
Table 10 Interregional migration flows in the Netherlands (1976), predicted with the method of offsets from the marginal totals (1976) and the migration flow table (1973)

Panel A. Predicted using method of offsets

| Origin \ Destination | 1 | 2 | 3 | 4 | 5 | 6 | Total |
|---|---|---|---|---|---|---|---|
| 1 | 12,344 | 13,769 | 6,890 | 12,199 | 10,361 | 11,518 | 67,081 |
| 2 | 13,329 | 34,695 | 12,195 | 17,445 | 22,522 | 24,353 | 124,539 |
| 3 | 9,728 | 15,711 | 28,330 | 8,883 | 15,881 | 30,553 | 109,087 |
| 4 | 11,281 | 16,107 | 7,011 | 11,216 | 11,764 | 13,187 | 70,566 |
| 5 | 12,609 | 25,486 | 16,570 | 12,828 | 17,770 | 21,760 | 107,023 |
| 6 | 18,116 | 35,786 | 53,984 | 22,110 | 27,058 | 22,535 | 179,589 |
| Total | 77,408 | 141,553 | 124,980 | 84,682 | 105,356 | 123,906 | 657,885 |

R^{2} = 0.966; MAPE = 8.364

Panel B. Multiplicative components using total sum reference coding

| Origin \ Destination | 1 | 2 | 3 | 4 | 5 | 6 | Total |
|---|---|---|---|---|---|---|---|
| 1 | 1.564 | 0.954 | 0.541 | 1.413 | 0.964 | 0.912 | 0.102 |
| 2 | 0.910 | 1.295 | 0.515 | 1.088 | 1.129 | 1.038 | 0.189 |
| 3 | 0.758 | 0.669 | 1.367 | 0.633 | 0.909 | 1.487 | 0.166 |
| 4 | 1.359 | 1.061 | 0.523 | 1.235 | 1.041 | 0.992 | 0.107 |
| 5 | 1.001 | 1.107 | 0.815 | 0.931 | 1.037 | 1.080 | 0.163 |
| 6 | 0.857 | 0.926 | 1.582 | 0.956 | 0.941 | 0.666 | 0.273 |
| Total | 0.118 | 0.215 | 0.190 | 0.129 | 0.160 | 0.188 | 657,885 |

The predicted results in Panel A of Table 10 were taken from the output of the SPSS, Stata, and R commands for implementing the method of offsets found in Appendix 4 [36]. See the Method of offsets sheet in the accompanying Excel spreadsheet for other calculations.
Since the flows were observed directly in 1976, there are several ways to evaluate the suitability of the method of offsets for predicting the data. One simple method is to inspect visually the ratios of the association multiplicative components, as demonstrated in Table 4. Another method is to use the inferential tests and information measures reported by the loglinear procedures. These would be testing the hypothesis that the structure of the migration flows, i.e., the interaction parameters, did not change from 1973 to 1976. In the example reported in Table 10, the corresponding G^{2} statistic is equal to 5,914 (df=25), and the hypothesis that the auxiliary data represent the same migration process as the observed data must be rejected. The final method, of those suggested here, relies on the standard R^{2} and MAPE statistics to assess the fit between the observed and the predicted flows. These are reported in Panel A of Table 10 and are equal to 0.97 and 8.36, respectively. These statistics, as well as the ratios in Table 4, suggest this application of the method of offsets offers a set of estimates for the migration flows in 1976 that may be quite satisfactory.
The importance placed on the goodness-of-fit statistics depends on the quality of the observed flows used as inputs to the method of offsets. If the method is to be useful in a practical situation, it must be applicable when the interregional flows are not directly observed. In the absence of flow data, the method would still require pre-estimates of the marginal totals. Furthermore, if the method is implemented as illustrated in Appendix 4 (available on the Tools for Demographic Estimation website), initial estimates of the interregional flows are required. Therefore, the pre-estimates of the row and column totals would need to be distributed across the internal cells of the flow matrix so they add up to the respective marginal totals. Table 11, Panel A, presents a typical scenario, albeit continuing to use the marginal totals from the Netherlands 1976 data, which were observed. A simple solution is to distribute the flows according to the independence model, i.e., N_{ij} = n_{i+}n_{+j}/n_{++}, which results in the initial estimates of the flows displayed in Panel B of Table 11.
As long as the initial interregional flows add up to the marginal totals, the predicted flows are not affected by the method used to distribute the flows within the cells. This is true because the flows will be predicted, ultimately, from the auxiliary data through the method of offsets, using the iterative proportional fitting algorithm (Agresti 1990; Deming and Stephan 1940). In other words, the initial estimates of the 1976 Netherlands flows, used as input to the offsets loglinear model, could be the internal cells of Table 1, Panel B, or those in Table 11, Panel B. Either set of initial estimates would yield the predicted flows that are reported in Table 10, Panel A.
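The role of iterative proportional fitting in the method of offsets can be sketched as follows. This is a minimal illustration, not the Appendix 4 code: the function names `method_of_offsets` and `cross_product_ratio`, the stopping rule, and the small 3-region auxiliary table and target margins are all hypothetical. The auxiliary table is used as the starting point, and its rows and columns are rescaled in turn until they match the pre-specified marginal totals.

```python
def method_of_offsets(auxiliary, row_totals, col_totals, max_iter=1000, tol=1e-10):
    """Rescale an auxiliary flow table to target margins by IPF."""
    fit = [[float(v) for v in row] for row in auxiliary]
    n_rows, n_cols = len(fit), len(fit[0])
    for _ in range(max_iter):
        for i in range(n_rows):                             # match row totals
            scale = row_totals[i] / sum(fit[i])
            fit[i] = [v * scale for v in fit[i]]
        for j in range(n_cols):                             # match column totals
            scale = col_totals[j] / sum(fit[i][j] for i in range(n_rows))
            for i in range(n_rows):
                fit[i][j] *= scale
        worst = max(abs(sum(fit[i]) - row_totals[i]) / row_totals[i]
                    for i in range(n_rows))
        if worst < tol:
            break
    return fit

def cross_product_ratio(table, i, j, k, l):
    """Odds ratio formed from rows i, k and columns j, l."""
    return (table[i][j] * table[k][l]) / (table[i][l] * table[k][j])

# Hypothetical auxiliary flows and target margins (illustrative only).
auxiliary = [[50.0, 10.0, 20.0],
             [8.0, 60.0, 12.0],
             [15.0, 9.0, 40.0]]
row_totals = [100.0, 90.0, 70.0]     # pre-specified origin totals
col_totals = [80.0, 95.0, 85.0]      # pre-specified destination totals

predicted = method_of_offsets(auxiliary, row_totals, col_totals)

# The interaction structure (odds ratios) of the auxiliary table is preserved.
ratio_before = cross_product_ratio(auxiliary, 0, 0, 1, 1)
ratio_after = cross_product_ratio(predicted, 0, 0, 1, 1)
```

Note that the sketch never uses initial estimates of the interior flows: only the auxiliary table and the two sets of marginal totals enter the calculation, which is consistent with the statement above that the predicted flows do not depend on how the initial flows are distributed.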
On the other hand, it is important to note that the associated inferential test statistics and the information measures that accompany the method of offsets must be interpreted with respect to the initial flow estimates. For example, if the initial flows were taken from Panel B of Table 11, the associated X^{2} and G^{2} test statistics would be testing the hypothesis that the predicted data are distributed in a manner that is consistent with the independence model.
Table 11 The inputs to the method of offsets in the absence of observed flows

Panel A. Pre-estimation marginal totals from the Netherlands, 1976

| Origin / Destination | 1 | 2 | 3 | 4 | 5 | 6 | Total |
|---|---|---|---|---|---|---|---|
| 1 | | | | | | | 67,081 |
| 2 | | | | | | | 124,539 |
| 3 | | | | | | | 109,087 |
| 4 | | | | | | | 70,566 |
| 5 | | | | | | | 107,023 |
| 6 | | | | | | | 179,589 |
| Total | 77,408 | 141,553 | 124,980 | 84,682 | 105,356 | 123,906 | 657,885 |

Panel B. Independence model distribution scheme for initial flow estimates

| Origin / Destination | 1 | 2 | 3 | 4 | 5 | 6 | Total |
|---|---|---|---|---|---|---|---|
| 1 | 7,893 | 14,433 | 12,744 | 8,635 | 10,743 | 12,634 | 67,081 |
| 2 | 14,654 | 26,796 | 23,659 | 16,030 | 19,944 | 23,456 | 124,539 |
| 3 | 12,835 | 23,472 | 20,724 | 14,042 | 17,470 | 20,545 | 109,087 |
| 4 | 8,303 | 15,183 | 13,406 | 9,083 | 11,301 | 13,290 | 70,566 |
| 5 | 12,593 | 23,027 | 20,331 | 13,776 | 17,139 | 20,157 | 107,023 |
| 6 | 21,131 | 38,641 | 34,117 | 23,116 | 28,760 | 33,824 | 179,589 |
| Total | 77,408 | 141,553 | 124,980 | 84,682 | 105,356 | 123,906 | 657,885 |

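The Panel B cells can be reproduced from the Panel A margins alone. The sketch below assumes the standard independence formula, in which each cell is the product of its row and column totals divided by the grand total. The function name `independence_table` is hypothetical.

```python
# Marginal totals from Table 11, Panel A (Netherlands, 1976).
row_totals = [67081, 124539, 109087, 70566, 107023, 179589]   # origins
col_totals = [77408, 141553, 124980, 84682, 105356, 123906]   # destinations
grand_total = sum(row_totals)


def independence_table(rows, cols):
    """Distribute the margins over the cells under the independence model."""
    total = sum(rows)
    return [[r * c / total for c in cols] for r in rows]


initial = independence_table(row_totals, col_totals)

# Print the table rounded to whole migrants for comparison with Panel B.
for row in initial:
    print("  ".join(f"{round(v):>7,}" for v in row))
```

Because the cells are rounded for display, a printed row may differ from its stated total by one or two migrants. The unrounded cells sum exactly to the margins.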
It is a simple matter to modify the method of offsets to apply it to the problem of predicting a table of “migrants only.” The SPSS, Stata and R commands require minor modifications that are specified in comments in Appendix 4 (available on the Tools for Demographic Estimation website). A worked example is included in the “Method of offsets, migrants only” sheet of the accompanying workbook. It uses the observed U.S. flows for 1985–1990 to retrospectively estimate the 1975–80 migrant flows reported by Rogers, Willekens, Little et al. (2002).
Agresti A. 1990. Categorical Data Analysis. New York: Wiley.
Agresti A. 2002. Categorical Data Analysis. New York: Wiley-Interscience.
Agresti A. 2007. An Introduction to Categorical Data Analysis. Hoboken, NJ: Wiley-Interscience.
Agresti A and B Finlay. 2009. Statistical Methods for the Social Sciences. Upper Saddle River, NJ: Pearson Prentice Hall.
Alonso W. 1986. Systemic and log-linear models: From here to there, then to now, and this to that. Discussion paper 86-10. Cambridge, MA: Harvard University, Center for Population Studies.
Birch MW. 1963. "Maximum likelihood in three-way contingency tables", Journal of the Royal Statistical Society Series B-Statistical Methodology 25(1):220–233.
Deming WE and FF Stephan. 1940. "On a least squares adjustment of a sampled frequency table when the expected marginal totals are known", Annals of Mathematical Statistics 11(4):427–444. doi: http://dx.doi.org/10.1214/aoms/1177731829 [37]
Knoke D and PJ Burke. 1980. Log-Linear Models. Beverly Hills, CA: Sage Publications.
Mueser P. 1989. "The spatial structure of migration: An analysis of flows between states in the USA over three decades", Regional Studies 23(3):185–200. doi: http://dx.doi.org/10.1080/00343408912331345412 [38]
Nair PS. 1985. "Estimation of period-specific gross migration flows from limited data: Biproportional adjustment approach", Demography 22(1):133–142. doi: http://dx.doi.org/10.2307/2060992 [39]
Powers DA and Y Xie. 2008. Statistical Methods for Categorical Data Analysis. Bingley, UK: Emerald.
Raftery AE. 1986. "Choosing models for cross-classifications", American Sociological Review 51(1):145–146. doi: http://dx.doi.org/10.2307/2095483 [40]
Raftery AE. 1995. "Bayesian model selection in social research", Sociological Methodology 25(1):111–163. doi: http://dx.doi.org/10.2307/271063 [41]
Raymer J. 2007. "The estimation of international migration flows: A general technique focused on the origin-destination association structure", Environment and Planning A 39(4):985–995. doi: http://dx.doi.org/10.1068/a38264 [42]
Raymer J, A Bonaguidi and A Valentini. 2006. "Describing and projecting the age and spatial structures of interregional migration in Italy", Population, Space and Place 12(5):371–388. doi: http://dx.doi.org/10.1002/psp.414 [43]
Raymer J and A Rogers. 2007. "Using age and spatial flow structures in the indirect estimation of migration streams", Demography 44(2):199–223. doi: http://dx.doi.org/10.1353/dem.2007.0016 [44]
Rees P and FJ Willekens. 1986. "Data and accounts", in Rogers A and FJ Willekens (eds). Migration and Settlement: A Multiregional Comparative Study. Dordrecht: D. Reidel, pp. 19–58.
Rogers A, JS Little and J Raymer. 2010. The Indirect Estimation of Migration: Methods for Dealing with Irregular, Inadequate, and Missing Data. Dordrecht: Springer.
Rogers A, F Willekens, JS Little and J Raymer. 2002. "Describing migration spatial structure", Papers in Regional Science 81(1):29–48.
Rogers A, FJ Willekens and J Raymer. 2003. "Imposing age and spatial structures on inadequate migration-flow datasets", The Professional Geographer 55(1):56–69.
Snickars F and JW Weibull. 1977. "A minimum information principle: Theory and practice", Regional Science and Urban Economics 7(1–2):137–168. doi: http://dx.doi.org/10.1016/0166-0462(77)90021-7 [45]
Willekens F. 1983. "Log-linear modeling of spatial interaction", Papers of the Regional Science Association 52:187–205. doi: http://dx.doi.org/10.1007/BF01944102 [46]
Links:
[1] http://dx.doi.org/10.2307/2546515
[2] http://dx.doi.org/10.1590/S010230982010000100002
[3] http://dx.doi.org/10.1111/j.17284457.2005.00050.x
[4] http://dx.doi.org/10.1002/psp.440
[5] http://dx.doi.org/10.1353/dem.2007.0016
[6] http://dx.doi.org/10.1068/a120489
[7] http://dx.doi.org/10.1080/01621459.1986.10478237
[8] http://www.un.org/esa/population/techcoop/DemEst/manual4/manual4.html
[9] http://www.un.org/esa/population/techcoop/IntMig/manual6/manual6.html
[10] http://www.un.org/esa/population/publications/migration/WorldMigrationReport2009.pdf
[11] http://unstats.un.org/unsd/publication/SeriesM/Seriesm_67rev2e.pdf
[12] http://dx.doi.org/10.1080/08898489909525459
[13] http://dx.doi.org/10.2307/2546519
[14] http://demographicestimation.iussp.org/sites/demographicestimation.iussp.org/files/imagecache/wysiwyg_imageupload_lightbox_preset/wysiwyg_imageupload/3/MI_EstMig_01.png
[15] http://webarchive.iiasa.ac.at/Admin/PUB/Documents/RR81006.pdf
[16] http://webarchive.iiasa.ac.at/Admin/PUB/Documents/RR81030.pdf
[17] http://demographicestimation.iussp.org/sites/demographicestimation.iussp.org/files/imagecache/wysiwyg_imageupload_lightbox_preset/wysiwyg_imageupload/3/MI_MEMM_01a.png
[18] http://demographicestimation.iussp.org/sites/demographicestimation.iussp.org/files/imagecache/wysiwyg_imageupload_lightbox_preset/wysiwyg_imageupload/3/MI_RogersCastro.png
[19] http://www.xlxtrfun.com/XlXtrFun/XlXtrFun.htm
[20] http://demographicestimation.iussp.org/sites/demographicestimation.iussp.org/files/imagecache/wysiwyg_imageupload_lightbox_preset/wysiwyg_imageupload/3/MI_MEMM_03.png
[21] http://demographicestimation.iussp.org/sites/demographicestimation.iussp.org/files/imagecache/wysiwyg_imageupload_lightbox_preset/wysiwyg_imageupload/3/MI_MEMM_04.png
[22] http://demographicestimation.iussp.org/sites/demographicestimation.iussp.org/files/imagecache/wysiwyg_imageupload_lightbox_preset/wysiwyg_imageupload/3/MI_MEMM_05.png
[23] http://demographicestimation.iussp.org/sites/demographicestimation.iussp.org/files/imagecache/wysiwyg_imageupload_lightbox_preset/wysiwyg_imageupload/3/MI_MEMM_06.png
[24] http://demographicestimation.iussp.org/sites/demographicestimation.iussp.org/files/imagecache/wysiwyg_imageupload_lightbox_preset/wysiwyg_imageupload/3/MI_MEMM_07.png
[25] http://www.datamaster2003.com/index.html
[26] http://www.rproject.org/
[27] http://dx.doi.org/10.1068/a140889
[28] http://dx.doi.org/10.1068/a190521
[29] http://dx.doi.org/10.2307/2060581
[30] http://www.mendeley.com/research/rlanguageenvironmentstatisticalcomputing13/
[31] http://dx.doi.org/10.1068/a090247
[32] http://dx.doi.org/10.1080/08898480590902145
[33] http://dx.doi.org/10.1080/08898489409525372
[34] http://dx.doi.org/10.1080/08898489909525457
[35] http://dx.doi.org/10.1177/0164027587094002
[36] http://demographicestimation.iussp.org/sites/demographicestimation.iussp.org/files/MI_LLM_appendices.pdf
[37] http://dx.doi.org/10.1214/aoms/1177731829
[38] http://dx.doi.org/10.1080/00343408912331345412
[39] http://dx.doi.org/10.2307/2060992
[40] http://dx.doi.org/10.2307/2095483
[41] http://dx.doi.org/10.2307/271063
[42] http://dx.doi.org/10.1068/a38264
[43] http://dx.doi.org/10.1002/psp.414
[44] http://dx.doi.org/10.1353/dem.2007.0016
[45] http://dx.doi.org/10.1016/0166-0462(77)90021-7
[46] http://dx.doi.org/10.1007/BF01944102