Folder Structure
GDSC2_CCLE-GeneExp/
├── CCLE-DepMap22Q2_geneexp/
│   ├── CCLE_expression.csv
│   ├── CCLE_expression_cleaned.csv
│   ├── CancerGeneCensus_GRCh38_COSMIC_v101.csv
│   ├── Cell_lines_annotations_20181226.txt
│   ├── sample_info.csv
│   └── CCLE_expression.ipynb
│
├── GDSC2_drugsens/
│   ├── GDSC2_fitted_dose_response_27Oct23.xlsx
│   ├── GDSC_Fitted_Data_Description.pdf
│   ├── screened_compounds_rel_8.5.csv
│   └── datasets/
│       ├── drug_info.csv – compounds + SMILES
│       ├── drug_name.txt – plain-text compound names
│       ├── features/
│       │   └── GeneExp.csv – 676 cell lines × 735 genes
│       └── sensitivity/
│           ├── DrugSens.csv
│           ├── DrugSens_withnull.csv
│           ├── DrugSens_onlynull.csv
│           ├── DrugSensPivoted.csv
│           ├── pivot/ – final train/val/test splits
│           │   ├── clas/ (4 CSVs)
│           │   └── regr/ (4 CSVs)
│           └── stack/ – long “stacked” variants
│               ├── clas/ (4 CSVs)
│               └── regr/ (4 CSVs)
│
├── processed_datasets/
│   ├── dataset_summary.txt
│   ├── multimodal_dataset_final.csv (≈ 1.6 GB)
│   ├── multimodal_dataset_final.pkl (≈ 6.8 GB)
│   └── multimodal_features_scaled.csv (≈ 4.3 GB) – legacy
│
├── logs/ – TensorBoard event files
│   ├── train/
│   └── validation/
│
├── best_multimodal_model.keras
├── 01_data_prep.ipynb
├── 02_gen_dataset.ipynb
├── CCLE_expression.ipynb
├── Data Integration & Preparation.ipynb
└── Model Development & Training.ipynb
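The tree above can be turned into code paths so notebooks do not hard-code file locations. This is a minimal sketch, not the project's own loader. It assumes the repository is checked out under a single root directory and that GeneExp.csv has one header row plus a leading cell-line ID column; neither assumption is stated in the tree. The helper names `dataset_paths` and `matrix_shape` are hypothetical. Only the Python standard library is used, so the row/column check works without loading the multi-gigabyte processed files.

```python
from __future__ import annotations

import csv
from pathlib import Path


def dataset_paths(root: str | Path) -> dict[str, Path]:
    """Hypothetical helper mapping short names to files/folders in the tree."""
    root = Path(root)
    ds = root / "GDSC2_drugsens" / "datasets"
    sens = ds / "sensitivity"
    return {
        "gene_exp": ds / "features" / "GeneExp.csv",
        "drug_info": ds / "drug_info.csv",
        "pivot_clas": sens / "pivot" / "clas",
        "pivot_regr": sens / "pivot" / "regr",
        "stack_clas": sens / "stack" / "clas",
        "stack_regr": sens / "stack" / "regr",
        "multimodal_pkl": root / "processed_datasets" / "multimodal_dataset_final.pkl",
        "model": root / "best_multimodal_model.keras",
    }


def matrix_shape(csv_path: str | Path) -> tuple[int, int]:
    """Return (data rows, feature columns).

    Assumes (not confirmed by the source) one header row and a first
    column holding the cell-line ID, which is excluded from the count.
    """
    with open(csv_path, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader)                # header row: ID column + gene names
        n_rows = sum(1 for _ in reader)      # one row per cell line
    return n_rows, len(header) - 1
```

Under those assumptions, `matrix_shape(dataset_paths(".")["gene_exp"])` run from the repository root should match the 676 × 735 annotation on GeneExp.csv; a mismatch is a quick sign that the file layout or header convention differs from what the sketch assumes.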