data.demo_data

pytrial.data.demo_data.load_synthetic_ehr_sequence(input_dir=None, n_sample=None)

Load synthetic EHR patient sequence data generated by PromptEHR (https://arxiv.org/pdf/2211.01761.pdf).

Parameters
  • input_dir (str) – The folder that stores the demo data. If None, the demo data is downloaded and saved to ‘./demo_data/synthetic_ehr’. If that folder already exists but is empty, remove it first.

  • n_sample (int) – The number of samples we want to load. If None, all data will be loaded.
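The note about empty folders can be handled before the call. The sketch below is a minimal helper and not part of PyTrial: `remove_if_empty` and `load_synthetic_demo` are hypothetical names, and the default path is the one documented above. The loader's return value is passed through unchanged, since its structure is not described here.

```python
from pathlib import Path

# Default download location documented for load_synthetic_ehr_sequence.
SYNTHETIC_EHR_DIR = "./demo_data/synthetic_ehr"


def remove_if_empty(folder):
    """Delete `folder` when it exists but contains nothing; return True if removed."""
    path = Path(folder)
    if path.is_dir() and not any(path.iterdir()):
        path.rmdir()
        return True
    return False


def load_synthetic_demo(n_sample=100):
    """Hypothetical wrapper: clear an empty default folder, then load the data."""
    # Imported lazily so remove_if_empty works without pytrial installed.
    from pytrial.data.demo_data import load_synthetic_ehr_sequence

    remove_if_empty(SYNTHETIC_EHR_DIR)
    # input_dir=None asks the loader to download to its default folder.
    return load_synthetic_ehr_sequence(input_dir=None, n_sample=n_sample)
```

Calling `load_synthetic_demo(n_sample=100)` would then load 100 samples; passing `n_sample=None` to the underlying loader loads all of them.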

pytrial.data.demo_data.load_trial_patient_sequence(input_dir=None)

Load synthetic sequential trial patient records.

Parameters

input_dir (str) – The folder that stores the demo data. If None, the demo data is downloaded and saved to ‘./demo_data/demo_patient_sequence/trial’. If that folder already exists but is empty, remove it first.

pytrial.data.demo_data.load_trial_patient_tabular(input_dir=None)

Load synthetic tabular trial patient records.

Parameters

input_dir (str) – The folder that stores the demo data. If None, the demo data is downloaded and saved to ‘./demo_data/demo_trial_patient_data’. If that folder already exists but is empty, remove it first.

pytrial.data.demo_data.load_trial_outcome_data(input_dir=None, phase='I', split='train')

Load trial outcome prediction (TOP) benchmark data.

Parameters
  • input_dir (str) – The folder that stores the demo data. If None, the demo data is downloaded and saved to ‘./demo_data/demo_trial_data’. If that folder already exists but is empty, remove it first.

  • phase ({'I','II','III'}) – The trial phase to load: ‘I’, ‘II’, or ‘III’.

  • split ({'train', 'test', 'valid'}) – The data split to load: ‘train’, ‘test’, or ‘valid’.
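Because each call returns one phase and one split, loading the whole benchmark takes nine calls. The sketch below shows one way to collect them; `load_all_outcome_data` is a hypothetical helper, and the loader is passed in as an argument so the function makes no assumption about what `load_trial_outcome_data` returns.

```python
from itertools import product

# Values documented for the phase and split parameters.
PHASES = ("I", "II", "III")
SPLITS = ("train", "valid", "test")


def load_all_outcome_data(loader, input_dir=None):
    """Call `loader` once per (phase, split) pair and collect the results in a dict."""
    return {
        (phase, split): loader(input_dir=input_dir, phase=phase, split=split)
        for phase, split in product(PHASES, SPLITS)
    }


# Usage, assuming pytrial is installed:
#   from pytrial.data.demo_data import load_trial_outcome_data
#   data = load_all_outcome_data(load_trial_outcome_data)
#   phase_two_test = data[("II", "test")]
```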

pytrial.data.demo_data.load_trial_document_data(input_dir=None, n_sample=None, source='preprocessed', date='20221001')

Load trial document data obtained from ClinicalTrials.gov.

Parameters
  • input_dir (str) – The folder that stores the demo data. If None, the demo data is downloaded and saved to ‘./demo_data/demo_trial_document’. If that folder already exists but is empty, remove it first.

  • n_sample (int) – The number of samples we want to load. If None, all data will be loaded.

  • source ({'clinicaltrials.gov', 'preprocessed'}) – The source of the data. If ‘clinicaltrials.gov’, we will download the raw data from that website and process it. If ‘preprocessed’, we will load the preprocessed data.

  • date (str) – The date of the clinicaltrials.gov copy. Only valid when source='clinicaltrials.gov'.
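Since `date` only applies to one value of `source`, it can help to check the combination before calling the loader. The sketch below is a hypothetical helper, `document_loader_kwargs`, that builds the keyword arguments and rejects inconsistent ones. Two points are assumptions rather than documented behavior: that a date passed with `source='preprocessed'` should be treated as an error, and that dates follow the YYYYMMDD pattern suggested by the default '20221001'.

```python
from datetime import datetime

# Values documented for the source parameter.
VALID_SOURCES = ("clinicaltrials.gov", "preprocessed")


def document_loader_kwargs(source="preprocessed", date=None, n_sample=None):
    """Build keyword arguments for load_trial_document_data, rejecting inconsistent ones."""
    if source not in VALID_SOURCES:
        raise ValueError(f"source must be one of {VALID_SOURCES}, got {source!r}")
    kwargs = {"source": source, "n_sample": n_sample}
    if date is not None:
        # Assumption: a date combined with another source is a caller mistake.
        if source != "clinicaltrials.gov":
            raise ValueError("date only applies when source='clinicaltrials.gov'")
        # Assumption: dates use the YYYYMMDD pattern of the default '20221001'.
        if len(date) != 8 or not date.isdigit():
            raise ValueError(f"date must look like YYYYMMDD, got {date!r}")
        datetime.strptime(date, "%Y%m%d")  # raises ValueError for impossible dates
        kwargs["date"] = date
    return kwargs


# Usage, assuming pytrial is installed:
#   from pytrial.data.demo_data import load_trial_document_data
#   kwargs = document_loader_kwargs(source="clinicaltrials.gov", date="20221001")
#   docs = load_trial_document_data(**kwargs)
```

When `date` is omitted, the helper leaves it out of the keyword arguments so the loader's own default applies.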

pytrial.data.demo_data.load_mimic_ehr_sequence(input_dir=None, n_sample=None)

Load MIMIC-III EHR patient sequence data. The data is not downloaded automatically; access must be obtained via https://physionet.org/content/mimiciii/1.4/.

Parameters
  • input_dir (str) – The folder that stores the demo data. If None, we will look for the demo data in ‘./demo_data/demo_patient_sequence/ehr’.

  • n_sample (int) – The number of samples we want to load. If None, all data will be loaded.
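Unlike the other loaders, this one only looks for data that is already on disk, so a quick check of the folder can give a clearer error than a failed load. The sketch below uses a hypothetical helper, `require_local_data`. It only verifies that the folder exists and is not empty; which files the loader expects inside it is not documented here and is not checked.

```python
from pathlib import Path

# Default lookup location documented for load_mimic_ehr_sequence.
MIMIC_DEFAULT_DIR = "./demo_data/demo_patient_sequence/ehr"


def require_local_data(input_dir=MIMIC_DEFAULT_DIR):
    """Return the folder as a Path, or raise if it is missing or empty."""
    path = Path(input_dir)
    if not path.is_dir() or not any(path.iterdir()):
        raise FileNotFoundError(
            f"No data found in {path}; obtain it via "
            "https://physionet.org/content/mimiciii/1.4/ first."
        )
    return path


# Usage, assuming pytrial is installed and the data is in place:
#   from pytrial.data.demo_data import load_mimic_ehr_sequence
#   folder = require_local_data()
#   data = load_mimic_ehr_sequence(input_dir=str(folder), n_sample=100)
```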