Development of Statistical Methods for Population Genomic Data
My main research focuses on building models for statistical inference of the processes underlying genetic variation in large datasets of closely linked genetic markers. Publicly available datasets now give detailed pictures of haplotype diversity in samples from human populations. Genomic data of this form are likely to become increasingly important in studying the genetics of disease, and making successful use of these data requires new statistical techniques. My main interest is in how known human evolutionary history can be used to inform genetic studies of common human diseases, such as type I and type II diabetes, hypertension and coronary artery disease.
I am developing models that describe genomic data and can be used to estimate genetic parameters in subdivided populations. More importantly, these models can sample from the distribution of an unseen variant conditional on the observed genomic variation. Many population genomic problems can be described within this prediction-with-subdivision framework, and problems of immediate interest can be made to conform to it, such as fine-scale mapping in case-control studies of genetic variation and the search for loci that have undergone different selective regimes in the sampled subpopulations.
Recent work has focussed on methods for inferring human evolutionary history from worldwide DNA samples. Projects have concentrated on the X and Y chromosomes and mitochondrial DNA. I am the author of BATWING, which has been used extensively to infer human history from Y-chromosome data.