Prepare Oncology Trial Patient Data

We provide an example of how to transform the raw patient-level records into sequences of patient data.

The Jupyter notebook example is available at process_NCT00174655.ipynb.

The raw input data come from Project Data Sphere (PDS), available at https://data.projectdatasphere.org/projectdatasphere/html/access. Create an account (it’s free), download the data for trial NCT00174655, and put the raw files under the folder ./breast cancer/NCT00174655/.

Note that the raw data are in SAS format (.sas7bdat). To read them, install the sas7bdat package with pip install sas7bdat.
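As a rough illustration of the reading step, the sketch below lists the .sas7bdat files in the download folder and loads each one into a pandas DataFrame. It is not taken from the notebook: the helper names list_sas_files and load_sas_table are hypothetical, the files are assumed to sit directly under the folder rather than in subfolders, and pandas is assumed to be installed alongside sas7bdat.

```python
from pathlib import Path

# Folder named in the text above; adjust if your download lives elsewhere.
RAW_DIR = Path("./breast cancer/NCT00174655")


def list_sas_files(raw_dir):
    """Return the .sas7bdat files directly under raw_dir, sorted by name."""
    # Hypothetical helper: assumes a flat folder with no subdirectories.
    return sorted(Path(raw_dir).glob("*.sas7bdat"))


def load_sas_table(path):
    """Read one .sas7bdat file into a pandas DataFrame."""
    # Imported here so that listing files works even before the
    # sas7bdat package has been installed.
    from sas7bdat import SAS7BDAT

    with SAS7BDAT(str(path)) as reader:
        return reader.to_data_frame()


if __name__ == "__main__":
    # Print the name and (rows, columns) shape of every table found.
    for sas_path in list_sas_files(RAW_DIR):
        table = load_sas_table(sas_path)
        print(sas_path.name, table.shape)
```

Printing the shape of each table is a quick way to check that the download is complete before running the notebook. If the folder is empty or missing, the loop simply prints nothing.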