How to Mine Protein Structure and Small Molecule Data in One Place
May 26, 2023
The New Macromolecule Hub
Drug discovery and design requires two essential parts: the target and the ligand. This fundamental “lock and key” is what makes medicines work, yet data on proteins and small molecules have always existed in separate systems, requiring separate software tools to search and use that information. That is, until now.
It is well established that data on existing molecules can inform and guide the design and understanding of new ones. Informatics principles are used across the drug design journey, from early target validation, through hit-to-lead, to formulation. Access to relevant structural data is essential at each stage.
At the CCDC, we curate and distribute the Cambridge Structural Database (CSD), the world’s repository of small-molecule organic and metal-organic crystal structures. Founded in 1965, it is used by tens of thousands of scientists globally to inform and guide their work across drug, agrochemical, and materials design. The Protein Data Bank (PDB) is the equivalent for macromolecular structures, including proteins, antibodies, DNA, and RNA.
This separation is excellent for management and curation; each organization has its own specialists and infrastructure to suit the data type they deal in. However, for scientists needing the full picture of protein and ligand, it means learning how to access, use, and search two systems.
“The FAIR data principles guide that data should be findable, accessible, interoperable, and reusable. Accessibility and interoperability is key here; we want scientists to be able to easily access everything they need in one place in a common way, to remove a logistical barrier that can slow R&D. The Macromolecule Hub removes barriers, by allowing users to handle proteins and small molecules together in a unified manner.”
Carmen Nitsche, General Manager at CCDC Inc.
A Single, Secure, Online Platform: Macromolecule Hub
The newly launched Macromolecule Hub aims to bridge this gap, allowing scientists to search and mine the world’s protein binding and small molecule structural data in one place.
Medicinal chemists can use this tool to better understand, and suggest modifications to, their protein-ligand pair of interest.
For example, after running in silico protein-ligand docking you could look for similar interactions. What functional groups tend to exist in a pocket like this? What conformation do ligands like this usually adopt? Building a picture from the existing, known structures in the PDB and CSD allows you to then suggest modifications and improvements to the ligand.
The Macromolecule Hub offers:
- Web-based access (no installation required)
- Secure setup behind your company firewall, to keep research private
- Ability to search and mine all ligand-bound protein binding sites from the PDB
- Ability to search and mine all small-molecule organic and metal-organic structures in the CSD
- Advanced searching with 2D or 3D search options (for example: search by a sketched-out molecule and constrain to specific planes, centroids, or vectors)
- Quick sharing of results: just send a link to your colleague to show your findings.
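To give a feel for what a 3D geometric constraint means in practice, here is a minimal Python sketch of a centroid-distance filter. It is not the Macromolecule Hub's interface or API: the function names, the hit names, the coordinates, and the 3.3 to 4.0 Å window are all hypothetical and chosen only for illustration.

```python
from math import dist
from statistics import fmean


def centroid(points):
    """Return the mean position of a set of 3D points given as (x, y, z)."""
    xs, ys, zs = zip(*points)
    return (fmean(xs), fmean(ys), fmean(zs))


def within_centroid_distance(group_a, group_b, min_d, max_d):
    """True if the centroids of two atom groups lie min_d..max_d apart (in Å)."""
    d = dist(centroid(group_a), centroid(group_b))
    return min_d <= d <= max_d


# Hypothetical search hits: each maps to two groups of atom coordinates (Å),
# for example a ligand fragment and a nearby side-chain fragment.
hits = {
    "hit_1": (
        [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.5, 0.0)],
        [(0.0, 0.0, 3.6), (2.0, 0.0, 3.6), (1.0, 1.5, 3.6)],
    ),
    "hit_2": (
        [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.5, 0.0)],
        [(0.0, 0.0, 8.0), (2.0, 0.0, 8.0), (1.0, 1.5, 8.0)],
    ),
}

# Keep only hits whose two groups sit within the illustrative distance window.
kept = [
    name
    for name, (group_a, group_b) in hits.items()
    if within_centroid_distance(group_a, group_b, 3.3, 4.0)
]
print(kept)
```

A real search would apply constraints like this across many structures, and could combine them with plane or vector criteria; the sketch only shows the underlying geometry of a single centroid constraint.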
Other Ways to Mine across Datasets
The CCDC data mining line-up includes CSD-CrossMiner, a desktop application that allows CSD-PDB data mining by pharmacophore search.
The Macromolecule Hub allows data mining by specific 2D and 3D structure search, so users now have both options available to suit their needs.
- The Macromolecule Hub is available now – learn more here
- Ask for an online demo by contacting the CCDC team here
- Explore other CCDC software here.