More comments on Information Criterion
(Posted 12 April 1999)
Background: Page 363 discusses Akaike's information criterion (AIC), a measure for comparing the fit of different models that adjusts the likelihood for the number of parameters fitted. The likelihood ratio test does not penalize for the number of parameters included in the model, while most information criteria do. Additional IC approaches were briefly discussed by Cuchan Wang (Pfizer) on the Animal Breeders Discussion group:
In general, an information criterion (IC) is defined as
IC = L - P
where L is the maximized (restricted) log-likelihood and P is a penalty
term that is a function of the number of parameters in the model: the more
parameters in the model, the bigger the penalty. Because a full model can
have either a larger or a smaller IC than a sub-model or another model, the
IC provides a tool for model selection. Note that model selection based on
an IC does not require the models to be nested. Different definitions of
the penalty term P give different ICs, such as the Akaike IC, the Hannan
and Quinn IC, and the Bayesian IC, and for each IC there is further
variation in how P is defined. With IC defined as above, the bigger the IC,
the better the model fit. The justification for using an IC as a model
selection criterion rests on large-sample arguments.
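As a rough illustration of IC = L - P, the sketch below assumes the common "larger is better" forms of three penalties, with k estimated parameters and sample size n: P = k (Akaike), P = k ln(ln n) (Hannan and Quinn), and P = (k/2) ln n (Schwarz's Bayesian IC). These are the usual textbook criteria divided by -2; many programs instead report -2(L - P), where smaller is better. The function name `information_criteria` is hypothetical, and the log-likelihoods and parameter counts in the usage lines are made up for illustration, not taken from any data set.

```python
import math


def information_criteria(loglik, k, n):
    """Return IC = L - P under three penalty terms (larger IC = preferred model).

    loglik: maximized (restricted) log-likelihood L of the model
    k:      number of estimated parameters
    n:      sample size (the HQ penalty needs ln(ln n) > 0, i.e. n >= 3)
    """
    if n < 3:
        raise ValueError("n must be at least 3 for the Hannan-Quinn penalty")
    penalties = {
        "AIC": k,                         # Akaike: P = k
        "HQ": k * math.log(math.log(n)),  # Hannan and Quinn: P = k ln(ln n)
        "BIC": 0.5 * k * math.log(n),     # Schwarz (Bayesian): P = (k/2) ln n
    }
    return {name: loglik - p for name, p in penalties.items()}


# Hypothetical example: a sub-model and a full model fitted to the same n = 200 records.
models = {
    "sub-model (k=3)": information_criteria(-520.4, k=3, n=200),
    "full model (k=5)": information_criteria(-517.9, k=5, n=200),
}
for crit in ("AIC", "HQ", "BIC"):
    best = max(models, key=lambda m: models[m][crit])
    print(crit, "prefers", best)
```

With these made-up numbers the full model gains 2.5 log-likelihood units for 2 extra parameters (1.25 per parameter). The AIC penalty is 1 per parameter, so AIC prefers the full model, while the Hannan-Quinn (about 1.67 per parameter at n = 200) and Bayesian (about 2.65 per parameter) penalties are larger, so both prefer the sub-model. Different ICs need not agree.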
A few references:
Awad AM (1996) Properties of the Akaike information criterion. Microelectron. Reliab. 36: 457-464.
Bozdogan H (1987) Model selection and Akaike's information criterion (AIC): the general theory and its analytical extensions. Psychometrika 52: 345-370.
Hannan EJ and Quinn BG (1979) The determination of the order of an autoregression. J Royal Statistical Society, Series B, 41: 190-195.
Schwarz G (1978) Estimating the dimension of a model. Annals of Statistics 6: 461-464.
Programs
- Pedigree Analysis Package (PAP) is a set of FORTRAN 77 programs by Sandra Hasstedt (Department of Human Genetics, University of Utah) for computing likelihoods and simulating phenotypes of major-locus genetic models on pedigrees.
- Mendel and Fisher are a suite of programs by Kenneth Lange (Dept. of Biostatistics, University of Michigan) for dealing with a variety of problems in human population genetics, such as complex segregation analysis. Full details can be found in Lange, Weeks, and Boehnke (1988, Genetic Epidemiology 5: 471-472), with reviews of these programs by Hopper (1988, Genet. Epid. 5: 473-476) and Marazita (1988, Genet. Epid. 5: 473-476). Currently, these programs can be obtained from Lange via snail-mail.
- Statistical Analysis for Genetic Epidemiology (S.A.G.E.) is a software package containing more than 20 programs for use in genetic analysis of family and pedigree data. The software is available for several platforms. Here are some of the programs:
- AGEON: Estimates the Distribution of Age-of-Onset - in the presence of non-susceptible persons. This information can be used in SIBPAL.
- ASSOC: Marker-Trait Association - allows for estimating and testing the association between a quantitative trait and a genetic marker in pedigree data.
- BCROSS: Genetic Hypothesis Testing from Data on Inbred Strains, their F1 and Backcross(es) - tests one-locus, two-locus, and polygenic hypotheses for quantitative data.
- CLUSTR: Power Transformation to Obtain Normality and Homoscedasticity from Clustered Data - can be used to obtain an appropriate transformation for data that are to undergo analysis with BCROSS.
- DESPAIR: Design of linkage studies that are based on affected pairs of relatives - determines the optimal two-stage study design for such studies.
- FCOR: Familial Correlations - estimates familial correlations for all types of relatives up to third degree, including cross-correlations for up to five traits.
- FSP: Family Structure Program - checks pedigree data for common structural errors as well as for consanguineous matings and loops.
- LODLINK: Lod Score Linkage Analysis - performs two-point linkage analysis between a trait and each of a set of markers.
- MAPLOC: Mapping a Disease-Related Trait Relative to a Set of Linked Markers - assumes a fixed map for a set of markers and finds the best position of a trait locus relative to them.
- REGC, REGD, REGTL, REGTN: Segregation Analysis Programs - perform analyses based on regressive models. Information from these programs can be further processed by LODLINK.
- RELATE: Relationship to Proband - determines, for single-proband pedigrees, the relationship of each individual in the pedigree to the proband.
- RELPAL: Relative Pair Linkage Analysis - screens for genetic linkage of a continuous trait to markers on the basis of relative-pair relationships.
- SIBPAL: Sib-Pair Linkage Analysis - screens for genetic linkage on the basis of sib-pair relationships. It can also be used for ordering marker loci.
- TDTEX: Exact Test for Transmission Disequilibrium - implements several asymptotic and exact versions of the transmission disequilibrium test (TDT).
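TDTEX, the last program listed, implements asymptotic and exact versions of the TDT. As a generic illustration only (not TDTEX's own code, options, or output), the sketch below assumes the standard single-allele setup: among heterozygous parents, b counts transmissions of the allele under test and c counts transmissions of the other allele. The asymptotic statistic is (b - c)^2 / (b + c), referred to a chi-square distribution with 1 df; the exact version treats b as Binomial(b + c, 1/2) under the null hypothesis. The function name `tdt` and the counts in the usage line are hypothetical.

```python
import math
from typing import Tuple


def tdt(b: int, c: int) -> Tuple[float, float, float]:
    """Single-allele TDT from heterozygous-parent transmission counts.

    b: transmissions of the tested allele from heterozygous parents
    c: transmissions of the other allele from heterozygous parents
    Returns (chi-square statistic, asymptotic p-value, exact p-value).
    """
    n = b + c
    if n == 0:
        raise ValueError("need at least one transmission from a heterozygous parent")
    chi2 = (b - c) ** 2 / n
    # Upper tail of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_asymptotic = math.erfc(math.sqrt(chi2 / 2))
    # Exact version: under the null, b ~ Binomial(n, 1/2); double the smaller tail.
    lower = sum(math.comb(n, i) for i in range(0, b + 1)) / 2 ** n
    upper = sum(math.comb(n, i) for i in range(b, n + 1)) / 2 ** n
    p_exact = min(1.0, 2 * min(lower, upper))
    return chi2, p_asymptotic, p_exact


# Hypothetical counts: tested allele transmitted 30 times, other allele 14 times.
print(tdt(30, 14))
```

Balanced counts (b equal to c) give a statistic of 0 and p-values of 1. When b + c is small, the asymptotic and exact p-values can differ noticeably, which is the usual motivation for offering exact versions.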
Created 25 February 1995, last updated 30 Jan 1996
Bruce Walsh. jbwalsh@u.arizona.edu.
Comments welcome.