Distinguished Glaser Seminar 2014


January 27 – 31, 2014, noon to 1:00 PM in room WC130 on the FIU Modesto A. Maidique Campus (MMC), simultaneously broadcast to room MSB105 on the Biscayne Bay Campus (BBC). [On January 30 the presentation will be broadcast from BBC to MMC.] There will be time for discussion following each presentation.

Dr. David R. Anderson
Emeritus Professor, Department of Fish, Wildlife, and Conservation Biology
Colorado State University

The concept of “information” can be quantified, and this has led to many important advances in the analysis of data in the empirical sciences. This seminar series focuses on a science philosophy based on “multiple working hypotheses” and statistical models to represent them. The fundamental science question relates to the empirical evidence for hypotheses in this set – a formal strength of evidence. Kullback-Leibler information is the information lost when a model is used to approximate full reality. In the early 1970s Hirotugu Akaike found a link between K-L information (a cornerstone of information theory) and the maximized log-likelihood (a cornerstone of mathematical statistics). This combination has become the basis for an entirely new paradigm in model-based inference. This seminar series concludes with ways to make improved inferences based on all the models in a carefully derived set: multimodel inference.

The information-theoretic paradigm offers a compelling approach to ranking science hypotheses and the models that represent them. Simple methods are then introduced for computing the likelihood of a model, given the data; the probability of a model, given the data; and evidence ratios. These quantities represent a formal strength of evidence and are easy to compute and understand, given the estimated parameters and associated quantities that are provided by statistical analysis software.
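As a sketch of how simple these computations are: starting from a set of AIC values (hypothetical numbers below, not from any study in this series), the ∆ values, model likelihoods, model probabilities (Akaike weights), and evidence ratios all follow from a few lines of arithmetic.

```python
import math

def akaike_weights(aic_values):
    """From a list of AIC values, compute Delta values, model likelihoods,
    and Akaike weights (the probability of each model, given the data)."""
    best = min(aic_values)
    deltas = [a - best for a in aic_values]        # Delta_i = AIC_i - AIC_min
    likes = [math.exp(-0.5 * d) for d in deltas]   # L(g_i | data) proportional to exp(-Delta_i / 2)
    total = sum(likes)
    weights = [l / total for l in likes]           # w_i sum to 1 over the model set
    return deltas, likes, weights

# Hypothetical AIC values for three candidate models
deltas, likes, weights = akaike_weights([100.0, 102.0, 110.0])

# Evidence ratio for model 1 vs. model 2: E_{1,2} = w_1 / w_2
evidence_ratio = weights[0] / weights[1]
```

The evidence ratio here, exp(∆/2) = e ≈ 2.7 for a ∆ of 2, illustrates why small ∆ values indicate models that remain plausible.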

Monday, January 27: Alternative Science Hypotheses and the Model Selection Problem

The seminar series begins with Thomas Chamberlin’s (1890) principle of multiple working hypotheses and the need for mathematical models to represent them. Ideally, there is an exact mapping between alternative hypothesis j and its model j. The set of hypotheses to be evaluated comes from considerable “hard thinking,” and everything to follow is conditional on this set. Given several hypotheses of interest, it seems compelling to ask which one is best supported by empirical data. What is meant by “best” and how is this quantified? This is the “model selection problem” – it is central to empirical science. Kullback-Leibler (K-L) information is the target of model selection and leads to a rigorous definition of formal evidence in science. Fundamental here is the value of the maximized log-likelihood function: max log(L(θ|x, g)), where θ is a vector of unknown parameters, x are the data, and g is the model. Examples using data on the transmission of Tb in ferrets in New Zealand and on beak size in Darwin’s finches in the Galapagos Islands are used to aid understanding.
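To make the maximized log-likelihood concrete, here is a minimal sketch for a binomial model (the data values are hypothetical, loosely in the spirit of the Tb transmission example): the maximum-likelihood estimate of the infection probability maximizes log(L(θ|x, g)), which can be verified against a grid search.

```python
import math

def binom_loglik(p, x, n):
    """log L(p | x, g) for a binomial model g with n trials and x successes.
    The binomial coefficient is constant in p, so it is dropped."""
    return x * math.log(p) + (n - x) * math.log(1 - p)

# Hypothetical data: 7 infected animals out of 20
x, n = 7, 20
p_hat = x / n                        # the MLE p-hat = x/n maximizes the log-likelihood
max_loglik = binom_loglik(p_hat, x, n)
```

The value max_loglik is the quantity that AIC (next session) penalizes by the number of estimated parameters.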

Tuesday, January 28: Quantifying Information Loss

A glimpse of the derivation from K-L information to Akaike’s AIC. This derivation makes clear the relationship between information theory (i.e., K-L information) and likelihood (the backbone of mathematical statistical theory). A stunning relationship is that the negative of K-L information is Boltzmann’s entropy, the crowning achievement of 19th-century science. Expected K-L information and small-sample AIC (AICc) are developed. A simple transformation of AIC values yields ∆ values that are pivotal and lead to simple calculations to obtain quantitative measures of information loss and formal evidence. The ∆ values are on a scale of information loss. Examples include a dose-response study of flour beetles.
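A sketch of the transformation from maximized log-likelihoods to AICc and ∆ values (the model names, log-likelihoods, and sample size below are hypothetical, chosen only to echo the dose-response setting):

```python
import math

def aic(loglik, k):
    """AIC = -2 log L + 2k, where k is the number of estimated parameters."""
    return -2.0 * loglik + 2 * k

def aicc(loglik, k, n):
    """Small-sample AIC: AICc = AIC + 2k(k+1) / (n - k - 1)."""
    return aic(loglik, k) + (2 * k * (k + 1)) / (n - k - 1)

# Hypothetical maximized log-likelihoods (and parameter counts k) for three models
models = {"logistic": (-90.3, 2), "probit": (-90.1, 2), "quadratic logistic": (-89.8, 3)}
n = 50                                            # hypothetical sample size

scores = {name: aicc(ll, k, n) for name, (ll, k) in models.items()}
best = min(scores.values())
deltas = {name: s - best for name, s in scores.items()}   # Delta_i = AICc_i - AICc_min
```

The ∆ values, not the raw AICc scores, carry the interpretation: the best model has ∆ = 0, and larger ∆ means more information lost.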

Wednesday, January 29: Information-Theoretic Approaches

The Principle of Parsimony underlies all model selection approaches. The ∆ values are pivotal and lead to simple calculations to obtain the likelihood of model i, L(gi|x); the probability of model i, Prob{gi|x}; and evidence ratios, Ei,j. Classical t-tests and ANOVAs are cast into an information-theoretic framework. Additional topics include overdispersion, cross-validation, and a likelihood-based analog of R². We are not trying to “model the data” but rather to model the information in the data. The model set evolves, leading to fast-paced empirical science. Examples include landscape data on several species of birds, cement hardening, frog mass, tobacco lesions, European dipper populations, and the incubation period of Creutzfeldt-Jakob disease.
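A sketch of how a t-test setting can be recast as model selection (with hypothetical measurements): the "null" is a model with one common mean, the "alternative" a model with separate group means, and AIC compares the two directly rather than via a P-value.

```python
import math

def normal_loglik(rss, n):
    """Maximized log-likelihood of a normal model with residual sum of
    squares rss, using the MLE of the variance (rss / n)."""
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)

def rss(xs):
    """Residual sum of squares about the sample mean."""
    m = sum(xs) / len(xs)
    return sum((v - m) ** 2 for v in xs)

# Hypothetical measurements from two groups
g1 = [4.1, 3.8, 4.5, 4.0, 4.2]
g2 = [5.0, 4.8, 5.3, 4.9, 5.1]
both = g1 + g2
n = len(both)

# Model 1: one common mean (k = 2: mean, sigma)
aic_null = -2 * normal_loglik(rss(both), n) + 2 * 2
# Model 2: separate group means (k = 3: two means, sigma)
aic_alt = -2 * normal_loglik(rss(g1) + rss(g2), n) + 2 * 3

delta = abs(aic_null - aic_alt)   # strength of evidence, on the information scale
```

Here the two-mean model pays a one-parameter penalty but is still strongly favored, and the ∆ value quantifies that evidence directly.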

Thursday, January 30: Historical Statistics and Protoplasm

Null hypothesis testing, test statistics and their null distributions, arbitrary α-levels, P-values, and rulings of “statistical significance” (or not) are reviewed. These approaches are not evidential, and their misuse is rampant. They are 70 to 100 years old and now “old school” in very important ways. Biologists do not teach courses in protoplasm; why should statisticians teach primarily null hypothesis testing methods? These historical methods should not be used in serious work. Examples include a huge ($12M) clinical trial of the efficacy of glucosamine and chondroitin for the treatment of knee pain.

Friday, January 31: Multimodel Inference (MMI) and Summary Comments

MMI is about making formal statistical inferences using all (or some) of the models in an a priori set. Model averaging (there are three flavors here), model selection uncertainty, parsimony, the relative importance of variables, and confidence sets on models constitute the current state of MMI. The concept of model selection bias will be introduced, and strategies will be suggested to avoid findings of “effects” that are actually spurious. Twenty-first-century science will be about multimodel inference using I-T methods. Examples include a dose-response study of flour beetles, cement hardening, and storm data from Durban, South Africa. The seminar series will conclude with some final summary points.