BBSRC/EPSRC Bioinformatics initiative

Grant #: BIO14454, March 2001-March 2004

- Richard Boys (Statistics)
- Tom Kirkwood (Institute for Ageing and Health)
- Darren Wilkinson (Statistics)
- Wan Ng (postdoc)

**Biology:** The project aims to gain an improved
understanding of stochastic kinetic genetic regulatory transcription control
mechanisms in eukaryotic cells, by developing software and algorithms to aid
investigation. There is strong evidence for intrinsic stochastic variations
in gene expression that can have a variety of effects on cell fate and
function. Stochastic models will be developed for eukaryotic gene regulation
which capture the key feedback mechanisms for expression.

**Statistics:** The project is concerned with the
development of computer-intensive algorithms for Bayesian inference for the
parameters and structure of the highly complex continuous-time Markov
processes that are used to model the biochemical networks which regulate
protein synthesis in eukaryotic cells.

Computer-intensive statistical methods will be developed so that inferences can be made from experimental data for both model parameters and model structure. Freely available software will be produced, both for modelling and simulating eukaryotic genetic regulatory circuits and for inferring the parameters of such networks from real-time imaging data.
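As a rough illustration of what Bayesian parameter inference from discretely observed data can look like, the sketch below treats a deliberately trivial case: a pure death process (each molecule degrades with hazard c). This toy model, the Exp(1) prior, the random-walk Metropolis sampler and all function names are assumptions chosen for illustration; it is not the algorithm developed in the project. For this model the transition between observation times is exactly binomial, so the likelihood is available in closed form, which is not true of the networks the project targets.

```python
import math
import random


def simulate_pure_death(x0, c, dt, n_obs, rng):
    """Exact observations at spacing dt of X -> 0 with hazard c * x (toy model)."""
    p_survive = math.exp(-c * dt)
    obs = [x0]
    for _ in range(n_obs):
        # Each molecule survives the interval independently with prob. p_survive.
        obs.append(sum(1 for _ in range(obs[-1]) if rng.random() < p_survive))
    return obs


def log_target(theta, obs, dt):
    """Log posterior of theta = log(c), up to a constant, with an Exp(1) prior on c."""
    c = math.exp(theta)
    log_p = -c * dt                          # log survival probability
    log_q = math.log(-math.expm1(-c * dt))   # log death probability
    log_lik = 0.0
    for x_prev, x_next in zip(obs, obs[1:]):
        deaths = x_prev - x_next
        if deaths < 0:
            return -math.inf  # impossible under a pure death process
        log_lik += (math.lgamma(x_prev + 1) - math.lgamma(x_next + 1)
                    - math.lgamma(deaths + 1)
                    + x_next * log_p + deaths * log_q)
    # Prior term (-c) plus the Jacobian of the log transform (+theta).
    return log_lik - c + theta


def metropolis(obs, dt, n_iter, rng, c_init=1.0, step=0.3):
    """Random-walk Metropolis on log(c); c_init is only a starting value."""
    theta = math.log(c_init)
    lp = log_target(theta, obs, dt)
    samples = []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        lp_prop = log_target(prop, obs, dt)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            theta, lp = prop, lp_prop
        samples.append(math.exp(theta))
    return samples


rng = random.Random(2004)
obs = simulate_pure_death(x0=100, c=0.5, dt=0.5, n_obs=10, rng=rng)
samples = metropolis(obs, dt=0.5, n_iter=5000, rng=rng)
posterior_mean = sum(samples[1000:]) / len(samples[1000:])
print("posterior mean of c after burn-in:", posterior_mean)
```

The sampler works on log(c) so that proposals never leave the positive half-line. In realistic networks the transition probabilities are intractable, which is why more computer-intensive schemes are needed.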

Several papers have resulted from the research carried out as part of this project. They are listed here, and links to PDF versions will be maintained until the papers appear in print.

- Boys, Wilkinson, Kirkwood (2004) Bayesian inference for a discretely observed stochastic kinetic model, in submission.
- Wilkinson, D.J. & Boys, R.J. (2004). Bayesian inference for stochastic kinetic genetic regulatory networks. In R.G. Aykroyd, S. Barber, & K.V. Mardia (Eds.), Bioinformatics, Images, and Wavelets, pp. 29-32. Department of Statistics, University of Leeds.

The following work was not part of the BBSRC-funded project, but is the result
of a "spin-off" project sponsored by an EPSRC studentship, which is
examining, *inter alia*, the use of diffusion approximations for
inferential purposes.

- Golightly, Wilkinson (2005) Bayesian inference for stochastic kinetic models using a diffusion approximation, *Biometrics*.

This project was concerned more with proving that inference in stochastic kinetic models from (discrete) time course data is possible than with any particular real regulatory network. Here we provide links to some very simple example models which can be used in conjunction with the simulation and inference software we provide below. The model format used is a subset of SBML Level 1. It is assumed that all rate laws are stochastic and that all units are self-consistent. The units, compartments and rules sections of the SBML document are not processed; i.e., there is only one compartment and rules are not supported. Look at the example models for further details. All of the models and software are free in the sense of the GNU General Public License.

- models.tgz - models in a gzipped tar file. It can be unpacked on a Linux system with a command like tar xvfz models.tgz (see the enclosed README.txt for further details).

Again, the software for inference that we provide is intended more as a proof of concept than as a practical application for applied bioinformatics researchers. We presently provide two packages.

- gillespie is a simple SBML Level 1 simulator designed to simulate exact realisations from models defined in the above manner. This is useful for simulating test data for the inference software. This too is packaged as a gzipped tar file. See the enclosed README.txt for information on compilation and execution.
- stochInf is the inference program, which accepts models, as described above, together with data (in the format produced by gillespie), in order to carry out parameter inference. The parameters in the SBML model file are used only as starting values for the MCMC scheme and are not used otherwise.
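To make concrete what "exact realisations" means for a stochastic kinetic model, here is a minimal Python sketch of Gillespie's direct method. It is not the code of the gillespie package, does not read SBML, and does not reproduce that package's output format; the function names, the birth-death example model and its rate constants are all hypothetical.

```python
import random


def gillespie(x0, stoich, hazards, t_max, rng):
    """Simulate one exact realisation with Gillespie's direct method.

    x0      : initial molecule counts, one entry per species
    stoich  : per-reaction state changes (one list per reaction)
    hazards : function mapping the current state to a list of reaction hazards
    """
    t, x = 0.0, list(x0)
    times, states = [t], [list(x)]
    while True:
        h = hazards(x)
        h0 = sum(h)
        if h0 <= 0.0:
            break  # no reaction can fire
        t += rng.expovariate(h0)  # exponential waiting time to the next event
        if t > t_max:
            break
        # Choose reaction j with probability h[j] / h0.
        u = rng.random() * h0
        j, acc = 0, h[0]
        while acc <= u and j < len(h) - 1:
            j += 1
            acc += h[j]
        x = [xi + d for xi, d in zip(x, stoich[j])]
        times.append(t)
        states.append(list(x))
    return times, states


def birth_death_hazards(x, c_prod=10.0, c_deg=0.1):
    """Hypothetical one-species model: constant production, first-order decay."""
    return [c_prod, c_deg * x[0]]


rng = random.Random(1)
times, states = gillespie([0], [[1], [-1]], birth_death_hazards, 50.0, rng)
print(len(times) - 1, "reaction events; final count:", states[-1][0])
```

Sampling such a trajectory at fixed times gives the kind of discrete time course data that an inference program would take as input.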