Folder Structure

GDSC2_CCLE-GeneExp/
├── CCLE-DepMap22Q2_geneexp/
│   ├── CCLE_expression.csv
│   ├── CCLE_expression_cleaned.csv
│   ├── CancerGeneCensus_GRCh38_COSMIC_v101.csv
│   ├── Cell_lines_annotations_20181226.txt
│   ├── sample_info.csv
│   └── CCLE_expression.ipynb
│
├── GDSC2_drugsens/
│   ├── GDSC2_fitted_dose_response_27Oct23.xlsx
│   ├── GDSC_Fitted_Data_Description.pdf
│   ├── screened_compounds_rel_8.5.csv
│   └── datasets/
│       ├── drug_info.csv           – compounds + SMILES
│       ├── drug_name.txt           – plain-text compound names
│       ├── features/
│       │   └── GeneExp.csv         – 676 cell lines × 735 genes
│       └── sensitivity/
│           ├── DrugSens.csv
│           ├── DrugSens_withnull.csv
│           ├── DrugSens_onlynull.csv
│           ├── DrugSensPivoted.csv
│           ├── pivot/              – final train/val/test splits
│           │   ├── clas/           (4 CSVs)
│           │   └── regr/           (4 CSVs)
│           └── stack/              – long “stacked” variants
│               ├── clas/           (4 CSVs)
│               └── regr/           (4 CSVs)
│
├── processed_datasets/
│   ├── dataset_summary.txt
│   ├── multimodal_dataset_final.csv   (≈ 1.6 GB)
│   ├── multimodal_dataset_final.pkl   (≈ 6.8 GB)
│   └── multimodal_features_scaled.csv (≈ 4.3 GB) – legacy
│
├── logs/                  – TensorBoard event files
│   ├── train/
│   └── validation/
│
├── best_multimodal_model.keras
├── 01_data_prep.ipynb
├── 02_gen_dataset.ipynb
├── CCLE_expression.ipynb
├── Data Integration & Preparation.ipynb
└── Model Development & Training.ipynb
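
Checking the layout

The notebooks read files by path, so a missing download tends to surface late as a file-not-found error. A minimal sketch for checking a local copy against the tree above is shown here. The function name find_missing and the choice of which files to check are illustrative assumptions, not part of the repository; only the paths themselves are taken from the tree.

```python
from pathlib import Path

# Relative paths copied from the tree above (root: GDSC2_CCLE-GeneExp/).
# Which files count as "required" is an assumption made for this sketch.
EXPECTED_PATHS = [
    "CCLE-DepMap22Q2_geneexp/CCLE_expression.csv",
    "CCLE-DepMap22Q2_geneexp/sample_info.csv",
    "GDSC2_drugsens/datasets/drug_info.csv",
    "GDSC2_drugsens/datasets/features/GeneExp.csv",
    "GDSC2_drugsens/datasets/sensitivity/DrugSens.csv",
    "processed_datasets/multimodal_dataset_final.csv",
    "best_multimodal_model.keras",
]


def find_missing(root, expected=EXPECTED_PATHS):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumes the script is run from the directory containing the root folder.
    missing = find_missing("GDSC2_CCLE-GeneExp")
    for rel in missing:
        print("missing:", rel)
    found = len(EXPECTED_PATHS) - len(missing)
    print(f"{found} of {len(EXPECTED_PATHS)} expected paths found")
```

The list is deliberately short; extend EXPECTED_PATHS with any other entries from the tree that a given notebook depends on.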
