Model Development & Training.ipynb — Building the predictor

Objective

Train an explainable multimodal network that fuses gene-expression profiles with chemical fingerprints to classify drug–cell-line pairs as sensitive or not sensitive.


Inputs

File | Drive path | Notes
multimodal_dataset_final.pkl | processed_datasets/ | 108,696 rows × 2,783 features (735 genes + 2,048 fingerprint bits)
DrugSens-Train.csv | …/sensitivity/pivot/clas/ | Training labels
DrugSens-Validhyper-Subsampling.csv | same | Early stopping / hyperparameter tuning
DrugSens-Trainhyper-Subsampling.csv | same | Class-weight estimation
DrugSens-Test.csv | same | Held-out evaluation


Architecture

  • Gene branch: 735 gene expression features are reduced to 128 dimensions via a dense encoder.

  • Chemistry branch: 2,048-bit Morgan fingerprints are reduced to 128 dimensions via a dense encoder.

  • Cross-modal attention: 8-head cross-modal attention fuses the gene and chemical feature representations, allowing interaction between modalities.

  • Fusion and output: The joint representation is concatenated and passed through three fully-connected layers (512 → 256 → 64), ending with a sigmoid output for binary classification (sensitive vs not sensitive). A minimal Keras sketch follows this list.

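The sketch below shows one way to assemble these pieces in Keras. Only the input widths (735 genes, 2,048 fingerprint bits), the 128-dimensional encoders, the 8 attention heads, and the 512 → 256 → 64 head come from this description; the activations, the 8-token reshape that gives the attention a map to produce, and the layer names are assumptions rather than the notebook's exact code.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_multimodal_model():
    gene_in = layers.Input(shape=(735,), name="gene_expression")
    chem_in = layers.Input(shape=(2048,), name="morgan_fingerprint")

    # Per-modality dense encoders, each projecting to 128 dimensions
    gene_emb = layers.Dense(128, activation="relu", name="gene_encoder")(gene_in)
    chem_emb = layers.Dense(128, activation="relu", name="chem_encoder")(chem_in)

    # Split each 128-d embedding into 8 tokens of 16 dims so the cross-attention
    # has a non-trivial attention map to produce (assumed tokenisation)
    gene_tok = layers.Reshape((8, 16), name="gene_tokens")(gene_emb)
    chem_tok = layers.Reshape((8, 16), name="chem_tokens")(chem_emb)

    # 8-head cross-modal attention: gene tokens attend to chemistry tokens
    attn = layers.MultiHeadAttention(num_heads=8, key_dim=16, name="cross_attention")
    gene_ctx = attn(query=gene_tok, value=chem_tok, key=chem_tok)

    # Concatenate the joint representation and classify through 512 -> 256 -> 64
    fused = layers.Concatenate()([layers.Flatten()(gene_ctx), layers.Flatten()(chem_tok)])
    x = layers.Dense(512, activation="relu")(fused)
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dense(64, activation="relu")(x)
    # Keep the sigmoid in float32 so mixed-precision training stays numerically stable
    out = layers.Dense(1, activation="sigmoid", dtype="float32", name="sensitivity")(x)

    return tf.keras.Model(inputs=[gene_in, chem_in], outputs=out)
```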

Training setup

Setting | Value
Hardware | NVIDIA A100 (Colab)
Optimiser | AdamW, lr = 1 × 10⁻³
Batch size | 512
Loss | Weighted binary cross-entropy (positive weight ≈ 8.5)
Precision | Mixed float16 / float32
Early stopping | Patience = 10 on val-AUROC
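
A minimal training sketch matching this table, assuming the model builder from the architecture sketch above; the metric name val_auc, the epoch count, and the random placeholder arrays standing in for the real DrugSens splits are assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import mixed_precision

# Mixed precision: float16 compute, float32 variables (set before building the model)
mixed_precision.set_global_policy("mixed_float16")

model = build_multimodal_model()  # from the architecture sketch above
model.compile(
    optimizer=tf.keras.optimizers.AdamW(learning_rate=1e-3),
    loss="binary_crossentropy",
    metrics=[tf.keras.metrics.AUC(name="auc")],
)

# Weighted binary cross-entropy via class weights: with an ~8.5 : 1
# negative-to-positive imbalance, the positive class gets weight ~8.5
class_weight = {0: 1.0, 1: 8.5}

callbacks = [
    tf.keras.callbacks.EarlyStopping(monitor="val_auc", mode="max",
                                     patience=10, restore_best_weights=True),
    tf.keras.callbacks.ModelCheckpoint("best_multimodal_model.keras",
                                       monitor="val_auc", mode="max",
                                       save_best_only=True),
    tf.keras.callbacks.TensorBoard(log_dir="logs"),  # writes logs/train, logs/validation
]

# Random placeholder arrays stand in for the real DrugSens-Train / -Validhyper features
X_gene_tr = np.random.rand(4096, 735).astype("float32")
X_chem_tr = np.random.randint(0, 2, (4096, 2048)).astype("float32")
y_tr = np.random.randint(0, 2, 4096)
X_gene_va, X_chem_va, y_va = X_gene_tr[:512], X_chem_tr[:512], y_tr[:512]

model.fit(
    {"gene_expression": X_gene_tr, "morgan_fingerprint": X_chem_tr}, y_tr,
    validation_data=({"gene_expression": X_gene_va, "morgan_fingerprint": X_chem_va}, y_va),
    epochs=100, batch_size=512,
    class_weight=class_weight, callbacks=callbacks,
)
```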


Outputs

Artifact | Purpose
best_multimodal_model.keras | Final weights (best val-AUROC)
logs/train/, logs/validation/ | TensorBoard event files

Best-epoch metrics on the held-out test set: AUROC = 0.981, Precision = 0.973, Recall = 0.985


Rationale

  • Attention maps provide gene–drug interaction insights for each prediction (a minimal extraction sketch follows this list).

  • Class weighting handles the 8.5 : 1 imbalance without down-sampling.

  • Mixed precision halves GPU memory usage and speeds training ≈ 1.7 × on an A100.

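A minimal sketch of how the attention weights could be read back out for interpretation, assuming the layer names used in the architecture sketch above; the names inside the shared model file may differ.

```python
import tensorflow as tf


def attention_scores(model, gene_batch, chem_batch):
    """Return per-pair cross-attention weights, shape (batch, heads, query, key)."""
    gene_tok = model.get_layer("gene_tokens")(model.get_layer("gene_encoder")(gene_batch))
    chem_tok = model.get_layer("chem_tokens")(model.get_layer("chem_encoder")(chem_batch))
    attn = model.get_layer("cross_attention")
    # return_attention_scores=True exposes the softmax weights used during fusion
    _, scores = attn(query=gene_tok, value=chem_tok, key=chem_tok,
                     return_attention_scores=True)
    return scores.numpy()
```
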
The model file and logs live in the shared Google Drive; load them in Colab to reproduce or fine-tune.
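
For example, a minimal Colab loading sketch; the Drive mount point and folder path below are assumptions, and only the artifact names come from the table above.

```python
import tensorflow as tf
from google.colab import drive

drive.mount("/content/drive")
model = tf.keras.models.load_model("/content/drive/MyDrive/best_multimodal_model.keras")
model.summary()

# Inspect the training curves from the shared event files:
# %load_ext tensorboard
# %tensorboard --logdir /content/drive/MyDrive/logs
```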

Last updated