ReComp

This project is concerned with the need and opportunities for selective recomputation of resource-intensive analytical workloads.

Background

As the cost of allocating computing resources to data-intensive tasks decreases, large-scale data analytics becomes more affordable. The vast amounts of data now being analysed continuously provide new insights.

Predictive models that encode knowledge from data are increasingly used to drive decisions in a broad range of areas, from science and public policy to marketing and business strategy. The process of learning such actionable knowledge relies upon information assets, including:

  • the data itself
  • the know-how encoded in the analytical processes and algorithms
  • additional background and prior knowledge

Since these assets continuously change and evolve, the models may become obsolete over time. This will lead to poor decisions in the future unless the models are periodically refreshed.

[EPSRC logo]

Focus of the project

This project looks at the need and opportunities for recomputation of analytical workloads.

Deciding how to respond to changes in these information assets requires striking a balance between the estimated cost of recomputing the model and the expected benefits of doing so.

In some cases new medical knowledge may invalidate a large number of past cases, for example when predictive models are used to diagnose a patient's genetic disease. On the other hand, such changes in knowledge may be marginal or even irrelevant for some of the cases. It is therefore important to be able to:

  • determine which past results may potentially benefit from recomputation
  • determine whether it is technically possible to reproduce an old computation
  • assess the costs and relative benefits associated with the recomputation

The project investigates the following hypothesis:

Based on these determinations, and given a budget for allocating computing resources, it should be possible to accurately identify and prioritise analytical tasks that should be considered for recomputation.
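
To make the hypothesis concrete, the sketch below shows one simple way such a prioritisation could work: candidate tasks are ranked by their estimated benefit-to-cost ratio and selected greedily until the computing budget is exhausted. The task names, figures and the greedy heuristic are illustrative assumptions, not part of the ReComp implementation.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """A past analytical task that may be worth recomputing (hypothetical structure)."""
    task_id: str
    est_cost: float     # estimated cost of re-running the task, e.g. in CPU hours
    est_benefit: float  # estimated value of refreshing the result

def prioritise(candidates, budget):
    """Greedily pick tasks with the best benefit/cost ratio within the budget.

    A simple knapsack-style heuristic, used here only to illustrate
    budget-constrained prioritisation.
    """
    ranked = sorted(candidates, key=lambda c: c.est_benefit / c.est_cost, reverse=True)
    selected, spent = [], 0.0
    for c in ranked:
        if spent + c.est_cost <= budget:
            selected.append(c)
            spent += c.est_cost
    return selected

# Example with made-up figures: only the two most worthwhile tasks fit the budget.
tasks = [
    Candidate("diagnosis-0042", est_cost=10.0, est_benefit=8.0),
    Candidate("diagnosis-0107", est_cost=2.0, est_benefit=5.0),
    Candidate("diagnosis-0311", est_cost=6.0, est_benefit=1.0),
]
print([c.task_id for c in prioritise(tasks, budget=12.0)])  # ['diagnosis-0107', 'diagnosis-0042']
```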

Technical approach

Our approach considers three types of meta-knowledge that are associated with analytics tasks:

  • knowledge of the history of past results (i.e. provenance metadata that describes which assets were used, and how)
  • knowledge of the technical reproducibility of the tasks
  • cost/benefit estimation models

The first element is required to determine which prior outcomes may potentially benefit from changes in information assets. The second element, reproducibility analysis, is required to determine whether an old analytical task is still functional and whether it can actually be performed again, possibly with new components and on newer input data.
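
As a purely illustrative example of the first element, the sketch below assumes a minimal provenance index that records which information assets each past outcome used; when an asset changes, the outcomes that mention it become candidates for recomputation. All identifiers are hypothetical and are not drawn from the project.

```python
# Hypothetical provenance index: each past outcome mapped to the information assets it used.
provenance = {
    "diagnosis-0042": {"variant-db-v3", "classifier-v1.2", "phenotype-ontology-2016"},
    "diagnosis-0107": {"variant-db-v3", "classifier-v1.1"},
    "diagnosis-0311": {"classifier-v1.2"},
}

def affected_outcomes(changed_asset, provenance):
    """Return the past outcomes whose provenance records mention the changed asset."""
    return [outcome for outcome, assets in provenance.items() if changed_asset in assets]

# A new release of the variant database affects some, but not all, past diagnoses.
print(affected_outcomes("variant-db-v3", provenance))  # ['diagnosis-0042', 'diagnosis-0107']
```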