Title: | Biodemography Functions |
---|---|
Description: | The Biodem package provides a number of functions for Biodemographic analysis. |
Authors: | Alessio Boattini and Federico C. F. Calboli; Vincente Canto Cassola together with Martin Maechler authored the function mtx.exp. |
Maintainer: | Federico Calboli <[email protected]> |
License: | GPL-2 |
Version: | 0.5 |
Built: | 2024-11-01 11:18:57 UTC |
Source: | https://github.com/fedster/biodem |
Calculates the column stochastic matrix starting from the raw migration matrix x. For each column, it divides each term by the column sum and returns the resulting column-normalized matrix, ready to be used in the Malecot migration model.
col.sto(x)
x |
the raw data migration matrix |
The Malecot model uses a transformation of the raw migration data; in the "Malecot" library the use of a column stochastic matrix follows Imaizumi 1970 and Swedlund 1984.
col.sto is used on an object of class "matrix" and returns an object of class "matrix".
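The column normalization described above can be sketched in base R without the package. This is only an illustration of the arithmetic, not the source of col.sto; the matrix `raw` and its values are made up.

```r
# Hypothetical 3 x 3 raw migration counts (made up for illustration).
raw <- matrix(c(50,  5,  2,
                 4, 60,  3,
                 1,  6, 70),
              nrow = 3, byrow = TRUE)

# Divide every entry by the total of its column; MARGIN = 2 makes
# sweep() apply the division column by column.
col_stochastic <- sweep(raw, 2, colSums(raw), "/")

col_stochastic
colSums(col_stochastic)  # each column total is 1 after normalization
```

Using sweep() keeps the intent explicit; `t(t(raw) / colSums(raw))` would give the same result.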
Federico C. F. Calboli [email protected]
Imaizumi, Y., N. E. Morton and D. E. Harris. 1970. Isolation by distance in artificial populations. Genetics 66: 569-582.
Jorde, L. B. 1982. The genetic structure of the Utah mormons: migration analysis. Human Biology 54(3): 583-597.
data(raw.mig)
new.mig.mat <- col.sto(raw.mig)
new.mig.mat
Calculates the Fst from a conditional kinship matrix.
Fst(rval, N)
rval |
is a conditional kinship matrix, normally obtained by the functions 'R' and 'rel.cond' in the Biodem library. |
N |
the vector of effective population sizes, nominally obtained by dividing the total population size by three. Starting from surname data, effective population size coincides with the number of marriages |
The use of the Fst function follows Harpending and Jenkins 1974, and Jorde 1982. It gives an estimate of Wright's Fst, which is a measure of between-subdivision genetic heterogeneity.
Fst returns one numeric value.
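As a rough illustration of how a single Fst value can come out of a conditional kinship matrix, the sketch below takes the mean of the diagonal of R weighted by relative effective population size. This is the form usually given following Harpending and Jenkins; that the Fst function uses exactly this form is an assumption. The names `r_mat` and `n_eff` and all values are hypothetical.

```r
# Hypothetical conditional kinship matrix for three subpopulations.
r_mat <- matrix(c( 0.020, -0.004, -0.006,
                  -0.004,  0.015, -0.005,
                  -0.006, -0.005,  0.010),
                nrow = 3, byrow = TRUE)

# Hypothetical effective population sizes.
n_eff <- c(120, 200, 180)

# Weights proportional to effective size (assumed weighting scheme).
w <- n_eff / sum(n_eff)

# Weighted mean of the within-population (diagonal) conditional kinships.
fst_sketch <- sum(w * diag(r_mat))
fst_sketch
```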
...
Federico C. F. Calboli [email protected]
Harpending, H. C. and T. Jenkins. 1974. !Kung population structure. In: J. F. Crow and C. F. Denniston (eds.), Genetic distance, pp 137-161. Plenum Press, NY.
Jorde, L. B. 1982. The genetic structure of the Utah mormons: migration analysis. Human Biology 54(3): 583-597.
# Swedlund data again...
data(P); data(S); data(N)
# starting with how many cycles to equilibrium
x <- mal.eq(S, P, N)
# calculation of phi
phi <- mal.phi(S, P, N, x)
# calculation of the conditional kinship matrix
cond <- mal.cond(phi, N)
# finally! we get the Fst value
fst <- Fst(cond, N)
fst
# starting from a raw marriage records dataset:
data(valley)
tot <- sur.freq(valley, valley$PAR, valley$SURM, valley$SURF)
tot # a frequency table calculated over all the surnames
iso.matrix <- uri(tot)
iso.matrix # an unbiased random isonymy matrix
reg <- rri(tot)
reg # a coefficient of unbiased Regional Random Isonymy
kin.cond <- rel.cond(iso.matrix, reg)
kin.cond # a conditional kinship matrix
N <- colSums(tot) # effective population size
fst <- Fst(kin.cond, N)
fst
“hedrick” calculates the Hedrick standardized kinship coefficient starting from surname frequencies.
hedrick(x)
x |
is a surname frequency table where the N rows correspond to the surnames present in the whole population and the M columns are the subpopulations |
Using “hedrick” can be problematic, because different people are likely to arrange isonymy data in different ways on their computers. We settled on a matrix format for the isonymy data. The function originally accepted data in a different format and converted it internally, but that would have been a problem for people whose data are arranged differently. In the end we wrote a dedicated function, "sur.freq", to generate surname frequency tables directly from raw marriage data or marriage-like data (the most commonly used sources in bio-demographic studies). For other types of surname data, see the detailed explanation in the documentation for the dataset "surnames", which should make clear to the user how “hedrick” works.
Returns a square symmetric standardized kinship matrix.
The Hedrick index was originally conceived as a measure of the probability of genotypic identity between (sub)populations and uses a standardization analogous to that employed when calculating a correlation coefficient. As a consequence, it is equal to 1 if measured on populations with identical surname distribution.
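The standardization mentioned above can be sketched in base R. The sketch assumes the form usually attributed to Hedrick (1971): the sum of products of surname frequencies in two populations, divided by the mean of the two within-population sums of squares. Whether the hedrick function computes exactly this is an assumption; `freq` and its contents are made up.

```r
# Hypothetical surname counts: rows are surnames, columns are populations.
freq <- matrix(c(10, 2, 0,
                  5, 8, 1,
                  0, 4, 9,
                  3, 3, 3),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("Rossi", "Bianchi", "Conti", "Greco"),
                               c("popA", "popB", "popC")))

# Relative surname frequencies within each population (columns sum to 1).
p <- sweep(freq, 2, colSums(freq), "/")

# shared[i, j] = sum over surnames k of p[k, i] * p[k, j]
shared <- crossprod(p)
self <- diag(shared)

# Standardize by the mean of the two within-population terms.
hedrick_sketch <- shared / (outer(self, self, "+") / 2)
hedrick_sketch
```

With this standardization every population has an index of 1 with itself, which matches the property stated above for identical surname distributions.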
Federico C. F. Calboli and Alessio Boattini [email protected]
Hedrick, P. W. 1971. A new approach to measuring genetic similarity. Evolution 25: 276-280.
Weiss, V. 1980. Inbreeding and genetic distance between hierarchically structured populations measured by surname frequencies. Mankind Quarterly 21: 135-149.
sur.freq
to generate the input surname frequency table from marriage data
surnames
for an explanation on how to generate the correct input table from other surname sources
lasker and uri
for other types of inter-population kinship matrices
# starting from a raw marriage records dataset:
data(valley)
tot <- sur.freq(valley, valley$PAR, valley$SURM, valley$SURF)
tot # a frequency table calculated over all the surnames
hed.kin <- hedrick(tot)
hed.kin # a standardized kinship matrix
# starting from a generic surname frequency table
data(surnames)
surnames # a made-up dataset
# you can see that the surnames are arranged as the _rows_ and
# the populations are the _columns_
# the use of the function "hedrick" just turns this data into a kinship matrix
hed.kin <- hedrick(surnames)
hed.kin
“lasker” calculates the Lasker kinship coefficient starting from a surname frequency table.
lasker(x)
x |
is a surname frequency table where the N rows correspond to the surnames present in the whole population and the M columns are the subpopulations |
Using “lasker” can be problematic, because different people are likely to arrange isonymy data in different ways on their computers. We settled on a matrix format for the isonymy data. The function originally accepted data in a different format and converted it internally, but that would have been a problem for people whose data are arranged differently. In the end we wrote a dedicated function, "sur.freq", to generate surname frequency tables directly from raw marriage data or marriage-like data (the most commonly used sources in bio-demographic studies). For other types of surname data, see the detailed explanation in the documentation for the dataset "surnames", which should make clear to the user how “lasker” works.
Returns a square symmetric kinship matrix.
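For orientation, a base R sketch of how such a matrix can be built from a surname frequency table follows. It assumes the form usually attributed to Lasker (1977), in which the coefficient between two populations is half the sum, over surnames, of the products of their relative frequencies. Whether the lasker function treats the diagonal or the scaling in exactly this way is an assumption; `freq` is a made-up table.

```r
# Hypothetical surname counts: rows are surnames, columns are populations.
freq <- matrix(c(10, 2, 0,
                  5, 8, 1,
                  0, 4, 9,
                  3, 3, 3),
               nrow = 4, byrow = TRUE,
               dimnames = list(c("Rossi", "Bianchi", "Conti", "Greco"),
                               c("popA", "popB", "popC")))

# Relative surname frequencies within each population.
p <- sweep(freq, 2, colSums(freq), "/")

# Half the sum of products of frequencies, for every pair of populations.
lasker_sketch <- crossprod(p) / 2
lasker_sketch
```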
...
Federico C. F. Calboli and Alessio Boattini [email protected]
Lasker, G.W. 1977. A coefficient of relationship by isonymy: A method for estimating the genetic relationship between populations. Hum. Biol. 49:489-493.
sur.freq
to generate the input surname frequency table from marriage data
surnames
for an explanation on how to generate the correct input table from other surname sources
hedrick and uri
for other types of inter-population kinship matrices
# starting from a raw marriage records dataset:
data(valley)
tot <- sur.freq(valley, valley$PAR, valley$SURM, valley$SURF)
tot # a frequency table calculated over all the surnames
lask.kin <- lasker(tot)
lask.kin # a kinship matrix
# starting from a generic surname frequency table
data(surnames)
surnames # a made-up dataset
# the surnames are arranged as the _rows_ and the populations are the _columns_
# the use of the function "lasker" just turns this data into a kinship matrix
lask.kin <- lasker(surnames)
lask.kin
The function “mal.cond” calculates a conditional kinship matrix R starting from a kinship matrix obtained by the application of the Malecot migration model to a column stochastic migration matrix.
mal.cond(PHI, N)
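PHI |
is the kinship matrix obtained from the Malecot migration model, as returned by mal.phi |
N |
is the vector of effective population sizes |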
Much more useful than the Phi matrix, the conditional kinship R matrix is the basis for further analysis by means of Mantel tests, Procrustes rotations and cluster analysis.
Returns a square symmetrical matrix.
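The source does not spell out the transformation from PHI to R. As an illustration only, the sketch below uses the weighted-mean standardization commonly associated with Harpending and Jenkins (1974) and Jorde (1982), with weights proportional to effective population size. That mal.cond uses exactly this form is an assumption; `phi`, `n_eff` and all values are made up.

```r
# Hypothetical "absolute" kinship matrix for three populations.
phi <- matrix(c(0.030, 0.010, 0.005,
                0.010, 0.025, 0.008,
                0.005, 0.008, 0.020),
              nrow = 3, byrow = TRUE)

# Hypothetical effective population sizes and the derived weights.
n_eff <- c(120, 200, 180)
w <- n_eff / sum(n_eff)

# Weighted mean kinship of each population with the whole region.
phi_row <- as.vector(phi %*% w)

# Weighted mean kinship over the whole region.
phi_bar <- sum(w * phi_row)

# Conditional kinship: kinship relative to the contemporary regional mean.
r_cond <- (phi + phi_bar - outer(phi_row, phi_row, "+")) / (1 - phi_bar)
r_cond
```

Under this form the weighted row means of R are zero, which is one quick way to check a computed matrix.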
...
Federico C. F. Calboli [email protected]
Jorde, L. B. 1982. The genetic structure of the Utah mormons: migration analysis. Human Biology 54(3): 583-597.
mal.phi
for the calculation of "absolute" kinship values
# using Swedlund data again...
data(S); data(P); data(N)
x <- mal.eq(S, P, N)
phi <- mal.phi(S, P, N, x)
cond.mat <- mal.cond(phi, N)
cond.mat
mal.eq calculates the Malecot model iteratively, stopping when one more cycle adds 0 to every value of the matrix obtained by the model. Once equilibrium is reached, mal.eq returns the number of cycles ("generations") needed to reach it.
mal.eq(S, P, N)
S |
is the systematic pressure matrix. |
P |
is the column-stochastic migration matrix. |
N |
is the vector of effective population size. |
The use of mal.eq is necessary before the calculation of the Malecot model proper because the value returned by mal.eq is one of the arguments of the Malecot model function mal.phi.
Returns one numeric value.
This function has been coerced to use "only" six significant digits. ...
Federico C. F. Calboli [email protected]
Imaizumi, Y., N. E. Morton and D. E. Harris. 1970. Isolation by distance in artificial populations. Genetics 66: 569-582.
Jorde, L. B. 1982. The genetic structure of the Utah mormons: migration analysis. Human Biology 54(3): 583-597.
Swedlund, A. C., L. B. Jorde and J. H. Mielke. 1984. Population structure in the Connecticut valley. I. Marital migration. American Journal of Physical Anthropology 65: 61-70
mal.phi
for the function using the output of 'mal.eq'
# the data is originally from a paper by Swedlund et al. 1984.
data(S); data(P); data(N)
mal.eq(S, P, N)
Calculates a kinship matrix using the Malecot Migration Model, in the form described by L. B. Jorde 1982.
mal.phi(S, P, N, n)
S |
the systematic pressure matrix, where the diagonal elements are 1-sk, with sk the systematic pressure for the k-th population, and the off-diagonal elements are 0 |
P |
the column stochastic migration matrix, possibly obtained using col.sto on the "raw" migration matrix |
N |
the vector of effective population sizes, where each element is the size of the corresponding population divided by 3 |
n |
the number of iterations needed to reach the equilibrium, calculated by the function mal.eq |
The Malecot model is simply an iterative Markov-chain-like process that gives rise to an asymptotic growth curve, so that an equilibrium is reached after a number of iterations.
Returns a square and symmetrical matrix.
...
Federico C. F. Calboli [email protected]
Imaizumi, Y., N. E. Morton and D. E. Harris. 1970. Isolation by distance in artificial populations. Genetics 66: 569-582.
Jorde, L. B. 1982. The genetic structure of the Utah mormons: migration analysis. Human Biology 54(3): 583-597.
mal.eq
for the function generating the number of cycles needed to reach the asymptotic value
# using Swedlund data again...
data(S); data(P); data(N)
x <- mal.eq(S, P, N)
phi <- mal.phi(S, P, N, x)
phi
Function “mar.iso” calculates Observed and Random Marital Isonymy starting from frequency tables of the observed pairs of surnames in each (sub)population.
mar.iso(x)
x |
is a table object containing N matrices, where N is the number of analysed (sub)populations. Each matrix is a square matrix whose dimensions are equal to the total number of different surnames observed in the analysed subpopulations. Rows correspond to male surnames entries and columns to female surnames entries |
Marital Isonymy coefficients can be obtained from marriage data or equivalent data. The tables of observed pairs of surnames needed as the argument of “mar.iso” are easily obtained from raw data using the "sur.freq" function with the "marriages" option. Observed Isonymy (Pt) is the number of isonymic marriages (i.e. marriages in which both mates have the same surname) divided by the total number of marriages. Random Isonymy (Pr) is the probability that two mates share the same surname by chance and is given by: Pr = sum(pi * qi), where pi is the frequency of the i-th surname among males and qi is the frequency of the i-th surname among females.
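The two definitions above can be worked through by hand on one small table. The sketch below uses base R only and a made-up 3 x 3 table of surname pairs for a single population (rows are male surnames, columns are female surnames); it illustrates the arithmetic and is not the code of mar.iso.

```r
# Hypothetical table of observed marriages for one population.
surnames_used <- c("Rossi", "Bianchi", "Conti")
pairs <- matrix(c(4, 1, 0,
                  1, 3, 2,
                  0, 1, 8),
                nrow = 3, byrow = TRUE,
                dimnames = list(male = surnames_used, female = surnames_used))

n_marriages <- sum(pairs)

# Observed Isonymy: isonymic marriages sit on the diagonal.
pt <- sum(diag(pairs)) / n_marriages

# Surname frequencies among males (rows) and females (columns).
p_i <- rowSums(pairs) / n_marriages
q_i <- colSums(pairs) / n_marriages

# Random Isonymy: Pr = sum(pi * qi).
pr <- sum(p_i * q_i)

c(Pt = pt, Pr = pr)
```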
Returns a data frame reporting Observed Isonymy (Pt) and Random Isonymy (Pr) for each (sub)population (pop)
The Observed Isonymy coefficient (Pt) is a measure of within-(sub)population kinship. The Random Isonymy coefficient (Pr) is an unbiased measure of the within-(sub)population kinship expected in case of random marriage unions. The output of the “mar.iso” function can be used as the argument of the "sur.inbr" function to calculate Inbreeding indexes. Pr values can also be substituted for the diagonal values of the between-population kinship matrix given by the function "uri" to obtain another unbiased random kinship matrix.
Federico C. F. Calboli and Alessio Boattini [email protected]
Crow, J. F., Mange, A. P. 1965. Measurement of inbreeding from the frequency of marriages between persons of the same surnames. Eugen. Q. 12:199-203.
Crow, J. F. 1980. The estimation of inbreeding from isonymy. Hum. Biol. 52:1-14.
sur.freq
to calculate surname frequency tables from raw marriage databases
sur.inbr
to calculate inbreeding coefficients starting from Pt and Pr
r.pairs
to calculate Repeated Pairs indexes
uri
to calculate a matrix of Unbiased Random Isonymy coefficients between (sub)populations
data(valley)
valley # a subset of a real marriage data base
mar <- sur.freq(valley, valley$PAR, valley$SURM, valley$SURF, freq.table = "marriages")
mar # frequency tables of the observed pairs of surnames in each population
iso <- mar.iso(mar)
iso # a data frame containing Pt and Pr values for each (sub)population
Calculates the n-th power of a matrix.
mtx.exp(X, n)
X |
a square matrix |
n |
the exponent, i.e. the power to which X is raised |
This function calculates (efficiently!) the n-th power of a matrix.
Takes a matrix and returns a matrix.
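The source says only that the power is computed efficiently. One common way to do that is exponentiation by repeated squaring, which needs on the order of log2(n) matrix products instead of n - 1. The sketch below shows that technique in base R; the helper name `mat_power` is hypothetical, and that mtx.exp works this way is an assumption.

```r
# Hypothetical helper: n-th power of a square matrix by repeated squaring.
mat_power <- function(X, n) {
  stopifnot(is.matrix(X), nrow(X) == ncol(X), n >= 0, n == round(n))
  result <- diag(nrow(X))  # identity matrix, the 0-th power
  base <- X
  while (n > 0) {
    # If the current binary digit of n is 1, fold this power into the result.
    if (n %% 2 == 1) result <- result %*% base
    base <- base %*% base  # square for the next binary digit
    n <- n %/% 2
  }
  result
}

test <- matrix(1:16, 4, 4)
pow <- mat_power(test, 10)
pow
```

For n = 10 the loop performs four squarings and two extra products, compared with nine products for plain repeated multiplication.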
Original code by VCC "beautyfied" by MM
Vincente Canto Cassola and Martin Maechler
...
test <- matrix(c(1:16), 4, 4)
pow.test <- mtx.exp(test, 10)
pow.test
A vector giving the effective population size for n populations. The effective population size is calculated as the total population divided by three.
data(N)
A 12-element vector.
These data come from Swedlund et al. 1984.
Swedlund, A. C., L. B. Jorde and J. H. Mielke. 1984. Population structure in the Connecticut valley. I. Marital migration. American Journal of Physical Anthropology 65: 61-70
data(N)
A column stochastic migration matrix for 12 populations.
data(P)
A 12 by 12 square matrix
These data come from Swedlund et al. 1984.
Swedlund, A. C., L. B. Jorde and J. H. Mielke. 1984. Population structure in the Connecticut valley. I. Marital migration. American Journal of Physical Anthropology 65: 61-70
data(P)
Function “r.pairs” calculates Observed and Random Repeated Pairs Coefficients starting from frequency tables of the observed pairs of surnames in each (sub)population.
r.pairs(x)
x |
is a table object containing N matrices, where N is the number of analysed (sub)populations. Each matrix is a square matrix whose dimensions are equal to the total number of different surnames observed in the analysed subpopulations. Rows correspond to male surnames entries and columns to female surnames entries. |
Repeated Pairs coefficients can be obtained from marriage data or equivalent data. The tables of observed pairs of surnames needed as the argument of “r.pairs” are easily obtained from raw data using the "sur.freq" function with the "marriages" option. The Observed Repeated Pairs coefficient (RP) estimates the level of homozygosity in a (sub)population on the basis of repeated appearances of the same pair of surnames. The Random Repeated Pairs coefficient (RPr) is the expected RP value in case of completely random marriage unions. Comparisons between RP and RPr are expressed as their percentage difference (perc.diff), given by (RP-RPr)/RPr.
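A worked sketch of these three quantities on one made-up table follows. The formulas for RP and RPr are the forms usually attributed to Lasker and Kaplan (1985) and Chakraborty (1985) and are assumptions here, since the text above does not state them; only the expression for perc.diff comes from the description above. This is not the code of r.pairs.

```r
# Hypothetical table of observed marriages for one population
# (rows: male surnames, columns: female surnames).
pairs <- matrix(c(4, 1, 0,
                  1, 3, 2,
                  0, 1, 8),
                nrow = 3, byrow = TRUE)

n <- sum(pairs)           # total number of marriages
denom <- n * (n - 1)

# Observed RP (assumed form): repeats of the same surname pair.
rp <- sum(pairs * (pairs - 1)) / denom

# Random RP (assumed form): product of the male and female repeat terms.
m <- rowSums(pairs)
f <- colSums(pairs)
rpr <- sum(m * (m - 1)) * sum(f * (f - 1)) / denom^2

# Relative difference, as defined in the text: (RP - RPr) / RPr.
perc_diff <- (rp - rpr) / rpr

c(RP = rp, RPr = rpr, perc.diff = perc_diff)
```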
Returns a data frame reporting Observed Repeated Pairs (RP), Random Repeated Pairs (RPr) and the percentage difference between RP and RPr (perc.diff) for each (sub)population (pop).
RP and RPr are standardized indexes and their values vary between 0 and 1. RP, being calculated using the whole surname matrix, is considered a more reliable source of information on the level of homozygosity in a population than Isonymy data. An excess of RP over RPr, as calculated by their percentage difference, suggests the existence of a degree of subdivision internal to the analysed (sub)population.
Federico C. F. Calboli and Alessio Boattini [email protected]
Lasker G. W., Kaplan B. A. 1985. Surnames and genetic structure: repetition of the same pairs of names of married couples, a measure of subdivision of the population. Hum. Biol. 57:431-440.
Chakraborty R. 1985. A note on the calculation of random RP and its sampling variance. Hum. Biol. 57:713-717.
Chakraborty R. 1986. Erratum. Hum. Biol. 58:991.
sur.freq
to calculate surname frequency tables from raw marriage databases
mar.iso
to calculate Observed and Random Isonymy coefficients starting from frequency tables of pairs of surnames
sur.inbr
to calculate Inbreeding indexes from Isonymy coefficients
data(valley)
valley # a subset of a real marriage data base
mar <- sur.freq(valley, valley$PAR, valley$SURM, valley$SURF, freq.table = "marriages")
mar # frequency tables of the observed pairs of surnames in each population
RP <- r.pairs(mar)
RP # a data frame containing RP, RPr and perc.diff values for each (sub)population
Made up raw dataset, created as if count data for marital migration were put into matrix form.
data(raw.mig)
A 4 by 4 square matrix.
Completely made up for pedagogical purposes.
data(raw.mig)
col.sto(raw.mig)
"rel.cond" calculates a conditional kinship matrix starting from isonymy data.
rel.cond(x,R, method="A")
x |
is a square Unbiased Random Isonymy matrix, possibly obtained using the "uri" function on the raw surname data |
R |
is an unbiased estimate of Regional Random Isonymy, calculated by the function "rri" |
method |
a character string specifying the method to be used in the calculation of the coefficients. The available options are "A" and "B". Both the methods give similar results. The "A" method is given as the default option |
The function implements Relethford's method to calculate kinship coefficients starting from surname data.
Returns a square symmetric conditional kinship matrix.
The term 'conditional kinship' refers to kinship relative to the contemporary region
Federico C. F. Calboli and Alessio Boattini [email protected]
Relethford, J. H. 1988. Estimation of kinship and genetic distance from surnames. Human Biology, 60(3): 475-492.
uri
to calculate Unbiased Random Isonymy starting from tables of surname frequencies
rri
to calculate an unbiased estimate of Regional Random Isonymy
rel.phi
to calculate an 'a priori' kinship matrix from isonymy data
# starting from a raw marriage records dataset:
data(valley)
tot <- sur.freq(valley, valley$PAR, valley$SURM, valley$SURF)
tot # a frequency table calculated over all the surnames
iso.matrix <- uri(tot)
iso.matrix # an unbiased random isonymy matrix
reg <- rri(tot)
reg # a coefficient of unbiased Regional Random Isonymy
kin.cond <- rel.cond(iso.matrix, reg)
kin.cond # a conditional kinship matrix
# starting from a generic surname frequency table
data(surnames)
surnames # a made-up dataset
iso.matrix <- uri(surnames)
iso.matrix # an unbiased random isonymy matrix
reg <- rri(surnames)
reg # a coefficient of unbiased Regional Random Isonymy
kin.cond <- rel.cond(iso.matrix, reg)
kin.cond # a conditional kinship matrix
"rel.phi" calculates an 'a priori' kinship matrix starting from isonymy data.
rel.phi(x,R, method="A")
x |
is a square Unbiased Random Isonymy matrix, possibly obtained using the "uri" function on the raw surname data |
R |
is an unbiased estimate of Regional Random Isonymy, calculated by the function "rri" |
method |
a character string specifying the method to be used in the calculation of the coefficients. The available options are "A" and "B". Both the methods give similar results. The "A" method is given as the default option |
The function implements Relethford's method to calculate kinship coefficients starting from surname data.
Returns a square symmetric 'a priori' kinship matrix.
The term 'a priori kinship' refers to kinship relative to a founding population
Federico C. F. Calboli and Alessio Boattini [email protected]
Relethford, J. H. 1988. Estimation of kinship and genetic distance from surnames. Human Biology, 60(3): 475-492.
uri
to calculate Unbiased Random Isonymy starting from tables of surname frequencies
rri
to calculate an unbiased estimate of Regional Random Isonymy
rel.cond
to calculate a conditional kinship matrix from isonymy data
# starting from a raw marriage records dataset:
data(valley)
tot <- sur.freq(valley, valley$PAR, valley$SURM, valley$SURF)
tot # a frequency table calculated over all the surnames
iso.matrix <- uri(tot)
iso.matrix # an unbiased random isonymy matrix
reg <- rri(tot)
reg # a coefficient of unbiased Regional Random Isonymy
kin <- rel.phi(iso.matrix, reg)
kin # an 'a priori' kinship matrix
# starting from a generic surname frequency table
data(surnames)
surnames # a made-up dataset
iso.matrix <- uri(surnames)
iso.matrix # an unbiased random isonymy matrix
reg <- rri(surnames)
reg # a coefficient of unbiased Regional Random Isonymy
kin <- rel.phi(iso.matrix, reg)
kin # an 'a priori' kinship matrix
"rri" calculates an unbiased estimate of Regional Random Isonymy starting from surname frequencies.
rri(x)
x |
is a surname frequency table where the N rows correspond to the surnames present in the whole population and the M columns are the (sub)populations |
The function implements Morton's isonymy method as outlined by Relethford. Unbiased estimate of Regional Random Isonymy refers to random isonymy of the contemporary region relative to the founding population. This value is an argument needed to calculate 'a priori' and conditional kinship matrices using the "rel.phi" and "rel.cond" functions.
Returns one numeric value.
Using “rri” can be problematic, because different people are likely to arrange isonymy data in different ways on their computers. We settled on a matrix format for the isonymy data. The function originally accepted data in a different format and converted it internally, but that would have been a problem for people whose data are arranged differently. In the end we wrote a dedicated function, "sur.freq", to generate surname frequency tables directly from raw marriage data or marriage-like data (the most commonly used sources in bio-demographic studies). For other types of surname data, see the detailed explanation in the documentation for the dataset "surnames", which should make clear to the user how “rri” works.
Federico C. F. Calboli and Alessio Boattini [email protected]
Morton, N. E. 1973. Kinship bioassay. In: Genetic distance, J. F Crow and C Denniston (eds.). New York, Plenum Press, 97-104.
Relethford, J. H. 1988. Estimation of kinship and genetic distance from surnames. Human Biology, 60(3): 475-492.
sur.freq
to generate the input surname frequency table from marriage data
surnames
for an explanation on how to generate the correct input table from other surname sources
uri
to calculate an Unbiased Random Isonymy matrix
rel.phi
to calculate an 'a priori' kinship matrix from isonymy data
rel.cond
to calculate a conditional kinship matrix from isonymy data
# starting from a raw marriage records dataset:
data(valley)
tot <- sur.freq(valley, valley$PAR, valley$SURM, valley$SURF)
tot # a frequency table calculated over all the surnames
reg <- rri(tot)
reg # an unbiased estimate of Regional Random Isonymy
# starting from a generic surname frequency table
data(surnames)
surnames # a made-up dataset
# you can see that the surnames are arranged as the _rows_
# and the populations are the _columns_
reg <- rri(surnames)
reg # an unbiased estimate of Regional Random Isonymy
Systematic pressure matrix obtained by creating a square matrix whose off-diagonal elements are all 0 and whose diagonal elements are calculated as 1-Sk, where Sk is the systematic pressure for the k-th population.
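A matrix of this shape can be built in one line of base R. The sketch below uses three made-up systematic pressure values in a hypothetical vector `s_k`; the real dataset S has 12 populations.

```r
# Hypothetical systematic pressures for three populations.
s_k <- c(0.05, 0.10, 0.02)

# Diagonal matrix with 1 - Sk on the diagonal and 0 elsewhere.
S_sketch <- diag(1 - s_k)
S_sketch
```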
data(S)
A 12 by 12 square matrix.
These data come from Swedlund et al. 1984.
Swedlund, A. C., L. B. Jorde and J. H. Mielke. 1984. Population structure in the Connecticut valley. I. Marital migration. American Journal of Physical Anthropology 65: 61-70
data(S)
“sur.freq” calculates surname frequency tables starting from raw marriage data or equivalent sources (e.g. birth registrations)
sur.freq(x,pop,mal.sur,fem.sur,freq.table="total")
x |
is a data frame in which every row corresponds to a different marriage record. The data frame must contain: (a) a column reporting the population in which the marriage was recorded; (b) a column containing male surnames; (c) a column containing female surnames |
pop |
is the name of the column in the data frame that reports the population in which the marriage was recorded |
mal.sur |
is the name of the column in the data frame that contains male surnames |
fem.sur |
is the name of the column in the data frame that contains female surnames |
freq.table |
character string specifying the type of surname frequency table to be calculated. The available options are: "males" (table calculated using only male surnames); "females" (table calculated using only female surnames); "total" (table calculated using all the surnames); "marriages" (tables calculated using observed pairs of surnames in each population). The default option is "total". |
“sur.freq” is specifically written to derive surname frequency tables from marriage data or, more generally, from data in which pairs of related surnames appear, such as birth records.
A single table of surname frequencies ("males", "females", "total" options) or tables of the frequencies of observed pairs of surnames for each population ("marriages" option)
Surname frequency tables produced with “sur.freq” are intended to be used as an argument for other functions to investigate the bio-demographic structure of populations. In particular, the "males", "females" and "total" options produce tables to be used in inter-population analyses (measures of kinship/distance between populations, etc.); the "marriages" option produces tables to be used in intra-population analyses (inbreeding levels etc.). Tables of surname frequencies can also be obtained from simple lists of surnames (e.g. telephone directories) using the function “table”; for further explanations see the info for the "surnames" data set.
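As a small illustration of the last point, the sketch below builds a surname-by-population table from a plain list of surnames with the base R function table. The data frame `records` and its column names are made up; the layout (surnames as rows, populations as columns) follows the input format described for the kinship functions.

```r
# Hypothetical list of surnames with the population each entry belongs to.
records <- data.frame(
  pop     = c("A", "A", "A", "B", "B", "C"),
  surname = c("Rossi", "Bianchi", "Rossi", "Conti", "Rossi", "Conti"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: surnames become the rows, populations the columns.
freq_tab <- table(records$surname, records$pop)
freq_tab
```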
Federico C. F. Calboli and Alessio Boattini [email protected]
Lasker, G. W. 1985. Surnames and genetic structure. Cambridge University Press. Cambridge, England
mar.iso
for the calculation of Marital Isonymy coefficients from frequency tables of observed pairs of surnames
r.pairs
for the calculation of Repeated Pairs coefficients from frequency tables of observed pairs of surnames
lasker and hedrick
for the calculation of similarity indexes between populations from surname frequency tables
surnames
for an explanation on how to generate a surname frequency table starting from non-marriage-like data
data(valley)
valley # a subset of a real marriage data base
# you can see that marriages correspond to rows in the data frame.
# Note that the data frame contains other columns
tot <- sur.freq(valley,valley$PAR,valley$SURM,valley$SURF)
tot # a frequency table calculated over all the surnames
mal <- sur.freq(valley,valley$PAR,valley$SURM,valley$SURF,freq.table="males")
mal # a frequency table calculated over the male surnames
fem <- sur.freq(valley,valley$PAR,valley$SURM,valley$SURF,freq.table="females")
fem # a frequency table calculated over the female surnames
mar <- sur.freq(valley,valley$PAR,valley$SURM,valley$SURF,freq.table="marriages")
mar # frequency tables for the observed pairs of surnames in each population
Function “sur.inbr” calculates Total, Random and Non-random Inbreeding Coefficients starting from Observed and Random Isonymy indexes in each (sub)population.
sur.inbr(x,method="B")
x |
is a data frame composed of 3 columns which, in order, contain: a code (or a name) identifying the analysed (sub)populations; Observed Isonymy (Pt) values; Random Isonymy (Pr) values. The number of rows is equal to the total number of analysed (sub)populations. |
method |
character string specifying the method to be used in the calculation of the indexes. The available options are "A" and "B". Both methods give similar results. The "B" method is the default because it is the one most frequently used in published studies. |
Inbreeding coefficients can be obtained from Isonymy data. The data frame containing Observed and Random Isonymy for each (sub)population, needed as the argument of “sur.inbr”, is easily obtained using the "mar.iso" function. Inbreeding coefficients give an estimate of the inbreeding level in a (sub)population on the basis of pairs of surnames.
Returns a data frame reporting Total Inbreeding (Ft), Random Inbreeding (Fr) and Non-random Inbreeding (Fn) for each (sub)population (pop)
Total Inbreeding (Ft) is an estimate of the inbreeding level in a (sub)population. Random Inbreeding (Fr) is the expected inbreeding level in a (sub)population in case of completely random marriage unions. Non-random Inbreeding (Fn) expresses the deviation of Ft from Fr: positive Fn values show a preference for unions between consanguineous mates, negative Fn values show an aversion to unions between consanguineous mates.
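To make the relation between the three coefficients concrete, the sketch below applies the classical relations of Crow and Mange (1965): Fr = Pr/4, Fn = (Pt - Pr)/(4(1 - Pr)) and Ft = Fn + (1 - Fn)Fr. Whether this corresponds to method "A" or "B" of “sur.inbr” is not stated on this page, so treat it as an illustration only; the data frame `iso_toy` and its values are invented.

```r
# Hypothetical isonymy values for two (sub)populations (invented numbers)
iso_toy <- data.frame(pop = c("A", "B"),
                      Pt  = c(0.08, 0.02),
                      Pr  = c(0.04, 0.03))

# Crow and Mange (1965) relations between isonymy and inbreeding
Fr <- iso_toy$Pr / 4                                      # random component
Fn <- (iso_toy$Pt - iso_toy$Pr) / (4 * (1 - iso_toy$Pr))  # non-random component
Ft <- Fn + (1 - Fn) * Fr                                  # total inbreeding

data.frame(pop = iso_toy$pop, Ft = Ft, Fr = Fr, Fn = Fn)
```

In this made-up example population "A" has Pt larger than Pr and therefore a positive Fn, while population "B" has Pt smaller than Pr and a negative Fn.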
Federico C. F. Calboli and Alessio Boattini [email protected]
Crow, J. F., Mange, A. P. 1965. Measurement of inbreeding from the frequency of marriages between persons of the same surnames. Eugen. Q. 12:199-203.
Crow, J. F. 1980. The estimation of inbreeding from isonymy. Hum. Biol. 52:1-14.
sur.freq to calculate surname frequency tables from raw marriage data bases
mar.iso to calculate Observed and Random Isonymy coefficients starting from tables of pairs of surnames frequencies
r.pairs to calculate Repeated Pairs indexes
data(valley)
valley # a subset of a real marriage data base
mar <- sur.freq(valley,valley$PAR,valley$SURM,valley$SURF,freq.table="marriages")
mar # frequency tables calculated over the observed pairs of surnames in each population
iso <- mar.iso(mar)
iso # a data frame containing Pt and Pr values for each (sub)population
inbreeding <- sur.inbr(iso)
inbreeding # inbreeding indexes calculated using the method "B"
inbreeding2 <- sur.inbr(iso,method="A")
inbreeding2 # inbreeding indexes calculated using the method "A"
A surname frequency table, with 3 populations and 5 surnames.
data(surnames)
A 5 rows by 3 columns dataset
Surname frequency tables are the argument needed by all the surname-based inter-population analysis functions (e.g. "lasker", "hedrick", "uri", etc.). Surname frequency tables can be generated from marriage and marriage-like data (e.g. data that contain pairs of related surnames) using the "sur.freq" function. To generate surname frequency tables from other surname sources (e.g. telephone directories, registers of voters, etc.) see the example in this help page. To import surname data bases into R correctly, see the "valley" dataset help page.
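A minimal runnable sketch of the “table” approach for a plain list of surnames is given here. The data frame `records` is hypothetical and much shorter than the list used to build "surnames"; only the column names POP and SURNAME follow the layout described in the example of this help page.

```r
# Hypothetical list of surname records, one row per individual (invented)
records <- data.frame(
  POP     = c(3, 3, 2, 2, 2, 1, 1),
  SURNAME = c("FABBRI", "VITALI", "LIPPI", "FABBRI", "NARDI", "ANGELI", "ANGELI"),
  stringsAsFactors = FALSE
)

# Surnames become the rows and populations the columns
freq <- table(records$SURNAME, records$POP)
freq
```

The result has one row per distinct surname and one column per population, which is the orientation expected by the inter-population functions.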
Alessio Boattini. Dummy dataset generated for testing and example purposes
data(surnames)
# NB. How did we produce the "surnames" dataset?
# the original data (a hypothetical list of surnames)
# were arranged as:
# YEAR POP SURNAME
# 1901 3 FABBRI
# 1901 3 VITALI
# 1901 2 LIPPI
# 1901 2 FABBRI
# 1901 2 NARDI
# 1901 2 NARDI
# 1901 1 ANGELI
# 1902 1 ANGELI
# 1902 2 VITALI
# 1902 2 LIPPI
# 1902 1 LIPPI
# 1902 1 LIPPI
# 1902 3 VITALI
# 1902 3 FABBRI
# 1902 2 FABBRI
# 1904 2 NARDI
# 1904 2 NARDI
# 1904 2 LIPPI
# 1905 1 VITALI
# 1905 1 FABBRI
# 1905 3 FABBRI
# 1905 3 ANGELI
# 1905 2 LIPPI
# 1905 2 NARDI
# 1905 3 NARDI
# 1905 3 NARDI
# ..........
# This arrangement does not necessarily reflect
# the way other people would arrange their data.
# The "surnames" dataset was generated using
# the "table" function as follows:
# table(data$SURNAME,data$POP)
Used to turn the asymmetric column stochastic matrix into a symmetric column stochastic matrix.
sym.P(x)
x |
the asymmetric column stochastic matrix |
The function calculates the symmetric matrix from the asymmetric column stochastic matrix, leaving the diagonal unchanged and averaging m[i,j] and m[j,i] as (m[i,j]+m[j,i])/2. The computed average substitutes each pair of values in the new symmetric column stochastic matrix.
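The averaging step described above can be sketched in base R as follows. This is an illustration of the stated rule, not the source of “sym.P”; the matrix `m` is a made-up example.

```r
# Hypothetical 3 x 3 column stochastic matrix (each column sums to 1)
m <- matrix(c(0.8, 0.1, 0.1,
              0.3, 0.6, 0.1,
              0.2, 0.2, 0.6), nrow = 3)

# Average every off-diagonal pair; the diagonal is unaffected
# because (m[i,i] + m[i,i]) / 2 equals m[i,i]
s <- (m + t(m)) / 2
s
```

Adding the matrix to its transpose handles all pairs at once, so no explicit loop over i and j is needed.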
Returns a matrix.
...
Federico C. F. Calboli [email protected]
Jorde, L. B. 1982. The genetic structure of the Utah mormons: migration analysis. Human Biology 54(3): 583-597.
data(P)
symmetric<-sym.P(P)
"uri" calculates the unbiased random isonymy coefficient starting from surname frequencies.
uri(x)
x |
is a surname frequency table where the N rows correspond to the surnames present in the whole population and the M columns are the (sub)populations |
The function implements Morton's isonymy method as outlined by Relethford. Unbiased estimations of intra-(sub)population isonymy were included. Unbiased Random Isonymy is an argument needed to calculate 'a priori' and conditional kinship matrices using the "rel.phi" and "rel.cond" functions.
Returns a square symmetric unbiased isonymy matrix.
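For orientation, the sketch below computes a matrix of this shape from a small table of surname counts. It assumes the formulas commonly given for this method in the cited literature: between populations i and j, the sum over surnames of the product of relative frequencies; within a population, the unbiased estimate sum of n(n - 1) divided by N(N - 1). The package's “uri” may differ in detail, and the matrix `counts` is invented.

```r
# Hypothetical surname counts: 3 surnames (rows) by 2 populations (columns)
counts <- matrix(c(2, 1, 1,
                   0, 3, 1), nrow = 3,
                 dimnames = list(c("FABBRI", "LIPPI", "NARDI"), c("1", "2")))

N <- colSums(counts)            # sample size of each population
p <- sweep(counts, 2, N, "/")   # relative surname frequencies

iso <- crossprod(p)             # between populations: sum_k p_ki * p_kj
# within populations: unbiased estimate sum_k n_k (n_k - 1) / (N (N - 1))
diag(iso) <- colSums(counts * (counts - 1)) / (N * (N - 1))
iso
```

The off-diagonal cells are symmetric by construction, and only the diagonal uses the unbiased within-population estimate.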
The use of “uri” could be problematic, because different people are likely to arrange isonymy data in different ways on their computers. We decided on a matrix format for the isonymy data. The function originally accepted data in a different format and converted it internally, but this would be a problem for people whose data are arranged differently. In the end we wrote a specific function, "sur.freq", to generate surname frequency tables directly from raw marriage data or marriage-like data (the most commonly used sources in bio-demographic studies). For other types of surname data, see the detailed explanation in the info for the dataset "surnames", which shows how to prepare the input for “uri”.
Federico C. F. Calboli and Alessio Boattini [email protected]
Morton, N. E. 1973. Kinship bioassay. In: Genetic distance, J. F. Crow and C. Denniston (eds.). New York, Plenum Press, 97-104.
Relethford, J. H. 1988. Estimation of kinship and genetic distance from surnames. Human Biology, 60(3): 475-492.
sur.freq to generate the input surname frequency table from marriage data
surnames for an explanation of how to generate the correct input table from other surname sources
lasker for a similar kinship coefficient derived from surnames
hedrick for a standardized kinship coefficient derived from surnames
rri to calculate an unbiased estimate of Regional Random Isonymy from surnames
rel.phi to calculate an 'a priori' kinship matrix from isonymy data
rel.cond to calculate a conditional kinship matrix from isonymy data
# starting from a raw marriage records dataset:
data(valley)
tot <- sur.freq(valley,valley$PAR,valley$SURM,valley$SURF)
tot # a frequency table calculated over all the surnames
iso.matrix <- uri(tot)
iso.matrix # an unbiased random isonymy matrix
# starting from a generic surname frequency table
data(surnames)
surnames # a made-up dataset
# you can see that the surnames are arranged as the _rows_
# and the populations are the _columns_
# the function "uri" turns this data into an unbiased random isonymy matrix
iso.matrix <- uri(surnames)
iso.matrix
A raw marriage data set, in which every row corresponds to a different marriage record.
data(valley)
A 702 rows by 8 columns dataset. The columns of "valley" contain the following information:
PAR: | (sub)population to which the marriage is referred. |
YEAR: | year in which the marriage was performed. |
SURM: | male surname. |
NM: | male birth (sub)population. |
RM: | male residence (sub)population. |
SURF: | female surname. |
NF: | female birth (sub)population. |
RF: | female residence (sub)population. |
For all columns the letter "X" indicates that the mate was born or resident outside of the study area.
Marriage data, depending on the sources used, may contain more or less information than the "valley" example data set. The order of the columns in the dataset is not relevant. Information on (sub)population and on male and female surnames is needed to perform surname-based analyses on marriage data.
NB. Information on the mates' birthplace (or equivalent data) can be used to produce a migration matrix (see the "raw.mig" dataset).
NB2. Given that surnames may contain spaces (e.g. "DE IORIO"), the best way to import surname data is to save the original data base as a .csv file, and then use the read.csv() or read.csv2() functions. Another option is to use GNUMERIC, because its text export feature can enclose the content of every cell in quotation marks (""), so that composite surnames are read as a single string. The resulting text file is easily imported by read.table().
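The import advice in NB2 can be tried with the short sketch below. The CSV content is invented and is passed through the `text` argument so that the example is self-contained; with a real file one would pass the file name instead.

```r
# Hypothetical CSV content with composite surnames (values are invented)
csv_text <- 'PAR,SURM,SURF
1,DE IORIO,ROSSI
2,"DI MARCO",DE IORIO'

# read.csv() splits fields on commas only, so spaces inside surnames survive
marriages <- read.csv(text = csv_text, stringsAsFactors = FALSE)
marriages$SURM
```

Quoted and unquoted composite surnames are both read as single strings, because the field separator is the comma rather than white space.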
Paola Gueresi. Subset of a real marriage dataset
data(valley)