BioChemGRAPH project will improve data synergy to facilitate drug development

Posted on

July 16, 2020

The BioChemGRAPH collaboration brings together key chemical and biochemical datasets to give researchers deeper insights than ever before.

A collaboration announced today will see the integration of key structural, functional, and biochemical data across both small molecules and macromolecules. This will allow researchers to quickly access relevant information from trusted but disparate datasets, advancing work in fields such as target validation, drug development, drug repurposing and cross-reactivity.



The Aim

The BioChemGRAPH project will create an easily accessible web platform to bring together datasets which are instrumental in many areas of research.


While vast and highly curated databases of quality chemical and biochemical data exist, interpreting them together is difficult. For each small molecule there is a huge variety of experimentally determined and calculated properties which can inform research. The varied nature of these data means that separate databases are used to collect and manage information, each specializing in a particular area.


Bringing together these different approaches will support both basic and translational research, to better answer questions like: How does this target behave? Where can this drug be repurposed? What potential side effects could it have?


The Partners

The project will see PDBe, ChEMBL and CCDC partner to aggregate data on small molecules and related macromolecules into the existing PDBe-KB platform.


The three databases are seen as leaders in their respective fields, known and trusted for their high standards in data management. This means the final BioChemGRAPH platform will have a solid foundation, with quality at its core.


Table: Datasets which will be connected by the joint BioChemGRAPH project

| Database | Managed by | Type | Size |
| --- | --- | --- | --- |
| PDBe (Protein Data Bank in Europe) | EMBL-EBI (European Molecular Biology Laboratory's European Bioinformatics Institute) | Biological macromolecular structures, protein structures | 174,448 deposited to PDB network since 2000 |
| ChEMBL | EMBL-EBI | Small molecules: 2D structures, calculated properties and abstracted bioactivities | 1,961,462 distinct compounds |
| CSD (Cambridge Structural Database) | CCDC (Cambridge Crystallographic Data Centre) | Small molecules: 3D crystal structures of organic and metal-organic compounds | 1,064,756 structures |


The technology will build on the existing community-driven PDBe-KB platform, which brings together structural and functional annotations for macromolecules in the PDB.


The Journey

Following a funding award from the BBSRC’s Bioinformatics and Biological Resources Fund, work will begin this year.


“We’re really excited to begin this project,” said Ilenia Giangreco, Discovery Science Team Leader at CCDC. “Curating high quality data and building great tools to interpret it are our passion at CCDC, so joining forces to bring scientists even more information, in an even easier format, is a great opportunity. This could open possibilities across so many areas of research.”


“We’ve seen a clear need in the research community for integration of structure and chemistry data,” explains Sameer Velankar, PDBe Team Leader at EMBL-EBI. “We’re hoping that this collaboration will allow researchers to get comprehensive answers to a wide range of research questions by opening up protein structure and chemical data to facilitate drug development.”


To stay updated on this ongoing project, follow us on Twitter or LinkedIn to hear our news first.

