Data partners
Thank you for being a PHOENIX Data Partner!
Use this page to familiarize yourself with the requirements and file-naming conventions used in the PHOENIX Project before uploading your data to your organization's data bucket.
Standard Operating Procedures
Process Steps
1. When you click the "UPLOAD" button above, you will be directed to log in with your Google account.
2. Once you are logged in, click on the "CURRENT" folder.
3. Then use the "upload file" link to upload your file to your specific bucket.
File Naming
The file name should identify the source, the content, and the period during which the data was collected. For continuously streaming data, the snapshot date is used for the period.
Example 1: Patient Education Genius (PEG) Survey
- PEG data that resides in an active Google Sheet that is continuously loaded.
PEG_CV19CommunitySurvey_20200904.csv
- PEG data that has been transferred to a static unlinked Google Sheet.
PEG_CV19DTE_202005.csv
- PEG data that has been extracted from a closed Google Sheet.
PEG_CV19DTE_Complete.csv
Example 2: Wayne Health COVID-19 Testing Data
- A cumulative database is pushed to an FTP server every two days.
WSUPG_CV19Testing_20200803.csv
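The convention in the examples above follows a SOURCE_CONTENT_PERIOD.csv pattern. The helper below is a hypothetical illustration of that pattern, not part of the PHOENIX tooling:

```python
# Hypothetical helper illustrating the SOURCE_CONTENT_PERIOD.csv naming
# pattern; the function name and signature are assumptions for this sketch.
def phoenix_file_name(source: str, content: str, period: str) -> str:
    """Build a PHOENIX data file name as SOURCE_CONTENT_PERIOD.csv."""
    return f"{source}_{content}_{period}.csv"

# Snapshot of a continuously loaded sheet: period is the snapshot date.
print(phoenix_file_name("PEG", "CV19CommunitySurvey", "20200904"))
# Static extract: period is the month of collection.
print(phoenix_file_name("PEG", "CV19DTE", "202005"))
# Closed collection: "Complete" marks the final extract.
print(phoenix_file_name("PEG", "CV19DTE", "Complete"))
```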
Methodology
Each Data Partner will have a semi-private bucket to which they can directly upload their files. Each project has a separate folder in the PHOENIX data-lake bucket.
Within the project folder, a subfolder will be established for each dataset, and a "gate" folder will be created to store files that are ready to be loaded into the PHOENIX data repository. Files awaiting processing are placed at the top level of the gate folder and remain there until any pre-load processing has been performed. When processing is complete, the raw files are moved to an archive folder and the raw data is ingested into BigQuery. The raw source files are stored in a read-only encrypted format. Files should not be modified or deleted; to ensure that no accidental changes are made, a 3-month hold on deletion and modification is placed on every file. Except for the archive folders, there will be at most one file in any data-lake folder.
Data files may be cleaned and standardized in preparation for ingestion into the PHOENIX Data Repository. Files destined for the OMOP repository will be coded using specified vocabularies and domains.