
Geometric Deep Learning for Drug–Target Interaction — Nucleate BioHack 2025

Built by Jeremy Wayland during the Nucleate BioHack (Novartis Challenge). This post outlines the dataset and paper we followed, our graph neural network approach, how we integrated geometric and molecular features, links to the code, and an interactive compound visualization.

Paper and Dataset — Context

  • We drew on public drug–target interaction (DTI) benchmarks and assay data commonly used in drug discovery workflows. The task was to predict how likely a small molecule is to bind a given target protein.
  • Molecules were parsed from SMILES or SDF; proteins were represented by sequences, or by structures where available. Each example was a compound–target pair with either a class label or a regression score.
  • References and dataset pointers are provided in Resources below; update these to your exact paper/dataset.
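
As a rough illustration of the compound–target pair layout described above, the sketch below stores one pair per record. The class name, field names, and the two toy records are hypothetical placeholders, not taken from the project code or from any dataset.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DTIExample:
    """One compound-target pair (hypothetical field names)."""
    smiles: str                        # compound as a SMILES string
    target_sequence: str               # protein as an amino-acid sequence
    label: Optional[int] = None        # class label, e.g. 1 = binds, 0 = does not
    affinity: Optional[float] = None   # regression score, e.g. a binding affinity

    def __post_init__(self) -> None:
        # Every example must carry at least one kind of supervision signal.
        if self.label is None and self.affinity is None:
            raise ValueError("an example needs a label or a regression score")


def split_by_task(
    examples: List[DTIExample],
) -> Tuple[List[DTIExample], List[DTIExample]]:
    """Separate classification examples from regression examples."""
    classification = [e for e in examples if e.label is not None]
    regression = [e for e in examples if e.affinity is not None]
    return classification, regression


# Toy placeholder records, invented for illustration only.
examples = [
    DTIExample("CCO", "MKTAYIAKQR", label=0),
    DTIExample("c1ccccc1O", "MKTAYIAKQR", affinity=6.2),
]
classification, regression = split_by_task(examples)
```

Keeping the label and the score as separate optional fields lets one loader serve both classification and regression benchmarks.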

Model — GNN Architecture

  • We implemented a message‑passing GNN over molecular graphs (atoms as nodes, bonds as edges). Variants included GIN‑style updates with learned aggregation and optional attention.
  • Targets were encoded via sequence embeddings or structure‑aware features; the model fused compound and target representations before prediction.
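
To make the message‑passing and fusion steps concrete, here is a minimal pure‑Python sketch of one GIN‑style update followed by sum pooling and concatenation with a target vector. It is a simplification under stated assumptions: a single fixed linear map with ReLU stands in for the learned MLP, attention is omitted, and the atom features, weights, and target embedding are made‑up toy values rather than anything from the project.

```python
def gin_layer(h, edges, weight, eps=0.0):
    """One GIN-style update per node v:
    h_v' = ReLU(W @ ((1 + eps) * h_v + sum of neighbor features)).
    A real GIN layer applies a learned MLP; a single linear map is used here.
    """
    dim = len(h[0])
    # Start from the (1 + eps)-scaled self term.
    agg = [[(1.0 + eps) * x for x in row] for row in h]
    # Bonds are undirected, so each edge passes a message in both directions.
    for u, v in edges:
        for k in range(dim):
            agg[v][k] += h[u][k]
            agg[u][k] += h[v][k]
    # Linear map followed by ReLU.
    return [
        [max(0.0, sum(w * x for w, x in zip(w_row, row))) for w_row in weight]
        for row in agg
    ]


def sum_pool(h):
    """Graph-level readout: sum node features column by column."""
    return [sum(col) for col in zip(*h)]


def fuse(compound_vec, target_vec):
    """Fuse compound and target representations by concatenation."""
    return list(compound_vec) + list(target_vec)


# Toy graph for ethanol (C-C-O) with one-hot atom features [is_C, is_O].
h = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
edges = [(0, 1), (1, 2)]
identity = [[1.0, 0.0], [0.0, 1.0]]   # placeholder for learned weights

h1 = gin_layer(h, edges, identity)    # [[2.0, 0.0], [2.0, 1.0], [1.0, 1.0]]
compound_vec = sum_pool(h1)           # [5.0, 2.0]
target_vec = [0.5, -0.5]              # placeholder for a protein embedding
joint = fuse(compound_vec, target_vec)
```

Concatenation is only one possible fusion; the joint vector would then feed a prediction head for the label or score.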

Features — Geometric + Molecular

  • Molecular: atom types, formal charge, aromaticity, hybridization, ring membership, bond order, and optional Morgan fingerprints.
  • Geometric: 3D coordinates from conformers enabled distance, angle, and dihedral features, with radial basis expansions of distances to inform message passing.
  • Protein: sequence embeddings (e.g., transformer‑based) and optional residue‑level structure descriptors when available.
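
The geometric features above can be sketched with the standard library alone. The functions below compute an interatomic distance, a bond angle, a dihedral angle, and a Gaussian radial basis expansion from 3D coordinates. The function names, the Gaussian form of the basis, the centers, and the `gamma` width are illustrative assumptions, and the coordinates are toy values rather than a real conformer.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def distance(a, b):
    """Euclidean distance between two atoms."""
    return math.dist(a, b)


def bond_angle(a, b, c):
    """Angle at atom b (radians) formed by atoms a-b-c."""
    v1, v2 = _sub(a, b), _sub(c, b)
    cos = _dot(v1, v2) / (math.hypot(*v1) * math.hypot(*v2))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) around the p1-p2 bond."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    norm = math.hypot(*b1)
    b1 = [x / norm for x in b1]
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = [x - _dot(b0, b1) * y for x, y in zip(b0, b1)]
    w = [x - _dot(b2, b1) * y for x, y in zip(b2, b1)]
    return math.atan2(_dot(_cross(b1, v), w), _dot(v, w))


def rbf_expand(d, centers, gamma=10.0):
    """Expand a scalar distance into Gaussian radial basis features."""
    return [math.exp(-gamma * (d - c) ** 2) for c in centers]


# Toy coordinates: four atoms in a planar trans arrangement.
p0, p1, p2, p3 = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, -1.0, 0.0)
d = distance(p1, p2)                        # 1.0
theta = bond_angle(p0, p1, p2)              # right angle, pi / 2
phi = dihedral(p0, p1, p2, p3)              # trans, magnitude pi
edge_feats = rbf_expand(d, [0.0, 1.0, 2.0])
```

Expanding a distance over several centers gives the network a smooth, multi‑channel edge feature instead of one raw scalar.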

Code and Reproducibility

  • We provide links to the source code and experiment configs in the Resources section. Swap in your repository URL.
  • The pipeline includes data preprocessing, feature generation, model training/evaluation, and inference scripts for batch predictions.