Digital Institute

VenusC - QSAR

In the search for new anti-cancer therapies, kinase enzymes are important biological targets, since many are intimately connected to cell division and other important maintenance functions.

Scientists use a method known as Quantitative Structure-Activity Relationships (QSAR) to mine experimental data for patterns that relate the chemical structure of a drug to its kinase activity.

Given a representation of a chemical structure, such as the following SMILES strings:

  • COc1ccc(cc1)c2nc(SCCCCCn3c(nc4ccccc34)C(C)C)[nH]c2c5ccc(OC)cc5
  • CCCCN(CCCCCSc1nc(c2ccc(OC)cc2)c([nH]1)c3ccc(OC)cc3)c4oc5cc(c(Cl)cc5n4)[N+](=O)[O-]
  • COc1ccc(cc1)c2nc(SCC(C)(CSc3oc4ccccc4n3)c5ccccc5)[nH]c2c6ccc(OC)cc6

a set of descriptors can be calculated, which attach certain quantifiable metrics to this structure.
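As a rough illustration of what "calculating descriptors" means, the sketch below derives a few simple counts directly from a SMILES string. It is a toy under stated assumptions: the function name `toy_descriptors`, the choice of counts, and the crude regular-expression tokenizer are all hypothetical, they only handle simple SMILES like the examples above, and they are not the descriptors used in the project. Real QSAR work would rely on a cheminformatics toolkit.

```python
import re

# Crude SMILES atom tokenizer (an assumption for illustration only):
# bracket atoms first, then two-letter halogens, then single-letter atoms
# (upper case = aliphatic, lower case = aromatic).
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
BRACKET_PATTERN = re.compile(r"\[[^\]]+\]")


def element_of(token):
    """Return the element symbol of one atom token, keeping its case."""
    if token.startswith("["):
        match = re.match(r"\[\d*([A-Za-z][a-z]?)", token)
        return match.group(1) if match else "*"
    return token


def toy_descriptors(smiles):
    """Compute a handful of simple, hypothetical descriptors for a SMILES string."""
    elements = [element_of(t) for t in ATOM_PATTERN.findall(smiles)]
    elements = [e for e in elements if e != "H"]  # keep heavy atoms only
    symbols = [e.capitalize() for e in elements]  # "n" -> "N", "Cl" -> "Cl"

    # Ring-closure digits outside bracket atoms come in pairs, one pair per closure.
    # The two-digit "%nn" notation is not handled by this sketch.
    outside_brackets = BRACKET_PATTERN.sub("", smiles)
    ring_digits = sum(ch.isdigit() for ch in outside_brackets)

    return {
        "heavy_atoms": len(symbols),
        "aromatic_atoms": sum(e[0].islower() for e in elements),
        "nitrogen_atoms": symbols.count("N"),
        "oxygen_atoms": symbols.count("O"),
        "sulfur_atoms": symbols.count("S"),
        "halogen_atoms": sum(s in ("F", "Cl", "Br", "I") for s in symbols),
        "ring_closures": ring_digits // 2,
    }


if __name__ == "__main__":
    examples = [
        "COc1ccc(cc1)c2nc(SCCCCCn3c(nc4ccccc34)C(C)C)[nH]c2c5ccc(OC)cc5",
        "COc1ccc(cc1)c2nc(SCC(C)(CSc3oc4ccccc4n3)c5ccccc5)[nH]c2c6ccc(OC)cc6",
    ]
    for smiles in examples:
        print(toy_descriptors(smiles))
```

Each structure becomes a fixed-length vector of numbers, which is the form a regression model needs as input.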

These metrics can then be used to build a regression-type model linking structure to an activity measure. If a successful QSAR model can be derived from the experimental data, then that model can be used to focus new chemical synthesis. By creating QSAR models for more than one set of results, covering different kinases, new drugs can be designed to be selective. The process for developing QSAR models is computationally intensive: it involves calculating descriptors, generating multiple candidate models, and selecting the best models based on their predictive performance when applied to unseen data.
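The model-building loop described above (fit several candidate models, then keep the one that predicts held-out data best) can be sketched in a few lines. This is a minimal illustration, not the project's workflow: the data are synthetic and generated in the code, each candidate is a simple one-descriptor linear regression, and the names `fit_line`, `rmse` and `select_best_model` are hypothetical.

```python
import random
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(model, xs, ys):
    """Root-mean-square prediction error of a fitted line on the given data."""
    slope, intercept = model
    squared = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return sqrt(sum(squared) / len(squared))


def select_best_model(rows, activities, n_train):
    """Fit one candidate per descriptor column; rank by error on unseen rows."""
    train_rows, test_rows = rows[:n_train], rows[n_train:]
    train_y, test_y = activities[:n_train], activities[n_train:]
    results = []
    for j in range(len(rows[0])):
        model = fit_line([r[j] for r in train_rows], train_y)
        score = rmse(model, [r[j] for r in test_rows], test_y)
        results.append((score, j, model))
    return min(results)  # lowest held-out error wins


# Synthetic data (an assumption for illustration): three descriptors per
# compound, with the activity depending only on descriptor 1 plus small noise.
rng = random.Random(0)
rows = [[rng.uniform(0, 10) for _ in range(3)] for _ in range(40)]
activities = [2.0 * r[1] + 1.0 + rng.gauss(0, 0.1) for r in rows]

score, best_index, (slope, intercept) = select_best_model(rows, activities, n_train=30)
print(f"best descriptor: {best_index}, held-out RMSE: {score:.3f}")
```

Scoring on rows the model never saw during fitting is what guards against selecting a model that merely memorises its training data. Repeating this over many descriptor sets, model types and kinase datasets is what makes the full process computationally expensive.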

VENUS-C was an EU-funded project concerned with providing a framework to allow scientific applications to be portable between different computing environments and providers. The project aimed to allow code to be portable between cloud computing environments (Windows Azure, OpenNebula) and traditional supercomputing environments (Barcelona Supercomputer, KTH). To that end, the project built a framework that abstracted away the details of the platform on which the code was running.

A key part of this project was the demonstration that the platform developed was applicable to a range of application domains. The Digital Institute was responsible for running the QSAR modelling task described above using an instance of e-Science Central hosted within the VenusC cloud platform. A key output from this task was the publishing of over 1M QSAR regression models. This allowed researchers to obtain activity predictions for new compounds.

Another key outcome was a significant expansion in the capabilities of e-Science Central, which allowed us to scale the platform to approximately 200 cloud servers and model a large public dataset (ChEMBL) in its entirety within four days.

[Figures: scale-up process; scale-up chart; QSAR workflow]