Institute of Health & Society

DataSHIELD Development

What?

DataSHIELD (http://www.datashield.ac.uk/) is a novel software solution that addresses some of the basic challenges hindering researchers' and care professionals' access to individual-level health, biomedical and social data.

DataSHIELD development has been funded by grants from the EU, the MRC, the Wellcome Trust (WT), JISC, the Software Sustainability Institute and the WUN (http://www.datashield.ac.uk/grants/). Development is currently funded under the Connected Health Cities (CHC) North East and North Cumbria project, via a Department of Health grant commissioned by the Northern Health Science Alliance. Implementing DataSHIELD under CHC is a key element in building the Great North Care Record (GNCR) and will provide enhanced security for potentially sensitive data.

Why?

Research in modern biomedicine and social science increasingly depends on analysing and interpreting microdata (data on individual subjects), or on co-analysing such data from several studies simultaneously. Making individual-level data available for querying by researchers, or by other professional users, raises important ethico-legal questions and can be controversial. Against this backdrop, DataSHIELD facilitates important research in settings where:

  • a co-analysis of individual-level data from several studies is needed but governance restrictions prevent or hinder the release or sharing of some of the required data
  • equivalent governance concerns prevent or hinder access to a single data set
  • a research group wishes to actively share the information held in its data with others but does not wish to cede control of the governance of those data and/or the intellectual property they represent by physically handing over the data themselves
  • a data set that is to be analysed remotely, or included in a multi-study co-analysis, contains data objects (e.g. images) too large to be physically transferred to the analysis site.

How?

DataSHIELD is implemented as free, open-source software (https://github.com/datashield). At its heart is a modified R statistical environment linked to an Opal database deployed behind the firewall at each data-holding organisation. Multiple data sets are analysed simultaneously and in parallel, and the separate analyses are linked only by non-disclosive summary statistics. In effect, “analysis is taken to the data, not data to the analysis”: commands are issued through a standard R environment on a conventional computer, which can be located anywhere in the world, and reach the data via secure web services. When the DataSHIELD infrastructure and approach are used with just one data source, this is referred to as “single-site DataSHIELD”, which provides a freeware-based approach to creating a secure data enclave.
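The idea that separate sites are "linked by non-disclosive summary statistics" can be illustrated with a minimal sketch. It is written in Python rather than R and is not DataSHIELD code: `Site`, `Summary`, `pooled_mean_and_sd` and the `MIN_RECORDS` threshold are all hypothetical names chosen for illustration. Each simulated site keeps its individual records private, releases only aggregates (count, sum, sum of squares), and refuses to release them for very small groups. The analyst then combines these aggregates into a pooled mean and standard deviation without ever seeing a single record.

```python
import math
from dataclasses import dataclass

# Hypothetical disclosure threshold (an assumption for illustration, not a
# DataSHIELD setting): a site refuses to release summaries of fewer records.
MIN_RECORDS = 5


@dataclass
class Summary:
    """Non-disclosive aggregates released by one data-holding site."""
    n: int
    total: float
    total_sq: float


class Site:
    """Hypothetical data holder: individual values never leave this object."""

    def __init__(self, name, values):
        self.name = name
        self._values = list(values)  # stays "behind the firewall"

    def summarise(self):
        n = len(self._values)
        if n < MIN_RECORDS:
            # Disclosure check: small groups could reveal individuals.
            raise PermissionError(f"{self.name}: too few records to release")
        return Summary(n, sum(self._values), sum(v * v for v in self._values))


def pooled_mean_and_sd(summaries):
    """Combine per-site aggregates into a pooled mean and sample SD."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = (total_sq - n * mean * mean) / (n - 1)
    return mean, math.sqrt(variance)


# Usage: two sites analysed in parallel; only Summary objects are shared.
sites = [Site("A", [1, 2, 3, 4, 5]), Site("B", [6, 7, 8, 9, 10])]
mean, sd = pooled_mean_and_sd([site.summarise() for site in sites])
print(round(mean, 2), round(sd, 2))  # → 5.5 3.03
```

The pooled result matches what a single analysis of all ten values would give, which is the point of the approach: for statistics that decompose into sums, sharing the right aggregates is as good as sharing the data. In practice, DataSHIELD analyses are issued from R against Opal servers, and its actual disclosure controls are more extensive than this single threshold.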