Inference in infectious disease systems practicals
1 Introduction
The purpose of this document is to run some ABC-SMC routines using the (current) SimBIID package (Simulation-Based Inference for Infectious Disease models), which is designed mostly for teaching purposes, to illustrate some of the ideas and routines discussed in the lectures.
Please note that there is a PDF version of these notes available by clicking on the icon in the top-left corner of the page.
I am heavily indebted to Stefan Widgren for his fantastic SimInf package, and indeed (with permission) I shamelessly borrow some of his ideas (such as the mparse() function). In the longer term, it would be good to have a fully SimInf-supported version of this package, but for the time being I have developed SimBIID mostly as a tool to help you learn how ABC-SMC (and other routines) work, and to allow you to get your hands dirty with some real models/data, but without the complexities of having to code these routines up yourselves.
There are various general-purpose software packages for performing Bayesian inference for general classes of models. For example,
For general-purpose Bayesian modelling with tractable likelihood functions, I would recommend any of these packages. However, the problem from the infectious disease modelling perspective is that it is challenging to implement the kinds of models that we use in this course, particularly when we have missing or incomplete data and non-standard likelihoods. An R package that is designed for modelling the kinds of system we focus on in this course is pomp (standing for Partially Observed Markov Processes). This is a powerful package that can be used to implement various “plug-and-play” inference algorithms, notably the maximum likelihood via iterated filtering (MIF) approach of Ionides, Bretó, and King (2006), and a particle Markov chain Monte Carlo algorithm (Andrieu, Doucet, and Holenstein 2010) based on the bootstrap particle filter (Gordon, Salmond, and Smith 1993). It also includes some simple Approximate Bayesian Computation rejection algorithms, but not the more powerful ABC-SMC approach of Toni et al. (2009). pomp also provides functions to efficiently simulate from state-space models. The interface is more complex than those of SimInf and SimBIID (it requires you to know a small amount of C), and so for the purposes of this course the SimBIID package will serve as an entry point into these algorithms. There are also some general-purpose ABC packages in R, such as the EasyABC package.
The pomp website (https://kingaa.github.io/pomp/, on Aaron King’s GitHub page) contains lots of tutorials for those of you who wish to pursue any of these ideas further. The influenza in a boarding school data (Anonymous 1978) that we will use as a case study can be found in the outbreaks R package, attributed to de Vries et al. (2006), although variations of the data can be found in other literature (e.g. Murray 2003) and are also used in some of the pomp tutorials. The Abakaliki smallpox data can also be found in the outbreaks package, as well as in other places in the literature (e.g. O’Neill and Roberts 1999; McKinley et al. 2014).
The workhorse of SimBIID is the function mparseRcpp(), which apes the syntax of mparse() but allows for different compilation and output options. It does not have the full power of SimInf, so it currently only works on single-node simulations (no networks), with some limited additional structures. It also implements a simple Gillespie algorithm, rather than the more sophisticated approaches used in SimInf and pomp, but it’s fast enough for the examples here. It also allows for other things such as the inclusion of early stopping criteria, which often greatly help to improve efficiency in, say, ABC-SMC routines (particularly in poorly supported regions of the parameter space).
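To give a flavour of what a simple Gillespie algorithm with an early stopping criterion looks like, here is a minimal sketch in base R for a closed SIR model. This is not the code that mparseRcpp() generates, and it does not use the SimBIID interface: the function gillespie_sir(), its arguments, and the particular stopping rule (abandon the run once the number of removals exceeds a bound max_R) are all assumptions made purely for illustration.

```r
# Illustrative only: a direct-method Gillespie simulation of a closed SIR model.
# gillespie_sir(), beta, gamma and max_R are hypothetical names, not SimBIID API.
gillespie_sir <- function(beta, gamma, S, I, R = 0, tmax = 100, max_R = Inf) {
  N <- S + I + R            # population size is fixed in a closed model
  t <- 0
  stopped_early <- FALSE
  while (I > 0) {
    # Event rates: infection (frequency-dependent mixing) and removal
    rate_inf <- beta * S * I / N
    rate_rem <- gamma * I
    total <- rate_inf + rate_rem
    # Exponentially distributed waiting time to the next event
    t_next <- t + rexp(1, rate = total)
    if (t_next > tmax) {
      t <- tmax
      break
    }
    t <- t_next
    # Choose which event happens, with probability proportional to its rate
    if (runif(1) < rate_inf / total) {
      S <- S - 1
      I <- I + 1
    } else {
      I <- I - 1
      R <- R + 1
    }
    # Assumed early stopping rule: give up once removals exceed max_R
    if (R > max_R) {
      stopped_early <- TRUE
      break
    }
  }
  list(t = t, S = S, I = I, R = R, stopped_early = stopped_early)
}

set.seed(1)
sim <- gillespie_sir(beta = 1.5, gamma = 0.5, S = 99, I = 1, tmax = 50, max_R = 30)
str(sim)
```

In an ABC setting, a bound such as max_R might be set from the observed data plus the current tolerance, so that simulations which have already overshot are abandoned rather than run to completion. That is the sense in which early stopping saves time in poorly supported regions of the parameter space.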
1.1 Installation
The package depends on the Rcpp and RcppArmadillo packages, which require the installation of the correct C++ compilers. The guidance below is taken from Sections 2.1.1, 2.1.2 or 2.1.3 here.
1.1.1 Windows
Install Rtools.
(Make sure you tick the option to add Rtools to the PATH whilst installing.)
1.1.2 Mac
Install the Xcode command line tools by executing the command xcode-select --install in a Terminal.
You might also need to install the gfortran libraries from:
https://cran.r-project.org/bin/macosx/tools/gfortran-6.1.pkg
1.1.3 Linux
Install gcc and related packages (you might also need gcc-fortran for some of the dependencies).
In Ubuntu Linux, execute the command sudo apt-get install r-base-dev in a Terminal.
1.1.4 Install package
Once the compilers have been installed, the CRAN version can be installed in the usual way, e.g.
install.packages("SimBIID")
Alternatively, the development version can be installed from source using the remotes package in R. That is, install the remotes package and then run:
library(remotes)
install_github("tjmckinley/SimBIID")
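After installing, it can be worth checking that the package is actually available before starting the practicals. The helper below is a small sketch using only base R functions; check_install() is a hypothetical name introduced here for illustration and is not part of SimBIID.

```r
# Illustrative helper (not part of SimBIID): report whether a package is
# installed and, if so, which version.
check_install <- function(pkg = "SimBIID") {
  ok <- requireNamespace(pkg, quietly = TRUE)
  if (ok) {
    message(pkg, " version ", as.character(utils::packageVersion(pkg)),
            " is installed.")
  } else {
    message(pkg, " is not installed; check the compiler set-up above and try again.")
  }
  invisible(ok)
}

check_install("SimBIID")
```

If Rcpp is installed, running Rcpp::evalCpp("2 + 2") is one further way to confirm that the C++ toolchain is working, since it has to compile a small piece of code to return the answer.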
References
Andrieu, Christophe, Arnaud Doucet, and Roman Holenstein. 2010. “Particle Markov Chain Monte Carlo Methods.” Journal of the Royal Statistical Society, Series B (Statistical Methodology) 72 (3): 269–342.
Anonymous. 1978. “Influenza in a Boarding School.” British Medical Journal 1: 578.
de Vries, Gerda, Thomas Hillen, Mark Lewis, Johannes Mueller, and Birgitt Schöenfisch. 2006. A Course in Mathematical Biology: Quantitative Modeling with Mathematical and Computational Methods. Society for Industrial and Applied Mathematics.
Gordon, N. J., D. J. Salmond, and A. F. M. Smith. 1993. “Novel Approach to Nonlinear/Non-Gaussian Bayesian State Estimation.” Radar and Signal Processing, IEE Proceedings F. 140 (2): 107–13. https://doi.org/10.1049/ip-f-2.1993.0015.
Ionides, E.L., C. Bretó, and A.A. King. 2006. “Inference for Nonlinear Dynamical Systems.” Proceedings of the National Academy of Sciences USA 103: 18438–43.
McKinley, Trevelyan J., Joshua V. Ross, Rob Deardon, and Alex R. Cook. 2014. “Simulation-Based Bayesian Inference for Epidemic Models.” Computational Statistics and Data Analysis 71: 434–47.
Murray, J.D. 2003. Mathematical Biology I: An Introduction. 3rd ed. Springer.
O’Neill, Philip D., and Gareth O. Roberts. 1999. “Bayesian Inference for Partially Observed Stochastic Epidemics.” Journal of the Royal Statistical Society, Series A (Statistics in Society) 162: 121–29.
Toni, Tina, David Welch, Natalja Strelkowa, Andreas Ipsen, and Michael P.H. Stumpf. 2009. “Approximate Bayesian Computation Scheme for Parameter Inference and Model Selection in Dynamical Systems.” Journal of the Royal Society Interface 6: 187–202.