Every day, data from App Builder and BillingPoint is refreshed into the warehouse, which Tableau then consumes to refresh its data source extracts. The high-level overview diagram below shows how the data flows into the warehouse. The ETL process is implemented as Python scripts that do the following:
- Extracts data from MongoDB (for corporations whose atoms are in Mongo) or Postgres (for corporations whose atoms are in Postgres)
- Transforms the data into rows and columns for the warehouse
- Loads the data into the warehouse
- Recreates the required views
- Grants permissions to the Tableau user
- Triggers the Tableau data source refresh for the corporation. Only data sources owned by `domain_publisher` and not tagged "do not refresh" are refreshed (see the sketch below).
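As a concrete illustration of that last step, the sketch below filters and refreshes data sources through the Tableau REST API using the `tableauserverclient` library. This is a minimal sketch, not the actual refresh script: the server URL, site name, credentials, and the exact tag string are assumptions.

```python
import tableauserverclient as TSC

SERVER_URL = "https://tableau.example.com"  # placeholder
DO_NOT_REFRESH_TAG = "do not refresh"       # assumed tag name

auth = TSC.PersonalAccessTokenAuth("token-name", "token-secret", site_id="analytics")
server = TSC.Server(SERVER_URL, use_server_version=True)

with server.auth.sign_in(auth):
    # Resolve the publishing user once so owners can be matched by id.
    publisher = next(
        u for u in TSC.Pager(server.users) if u.name == "domain_publisher"
    )
    for datasource in TSC.Pager(server.datasources):
        # Skip anything not owned by domain_publisher or explicitly opted out.
        if datasource.owner_id != publisher.id:
            continue
        if DO_NOT_REFRESH_TAG in (datasource.tags or set()):
            continue
        server.datasources.refresh(datasource)  # queues an async extract refresh
```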
Data Flow:
- For corporations whose atoms are in MongoDB, the nightly refresh Jenkins job reads the data from the MongoDB read replica host. For corporations whose atoms are in Postgres, it reads them from the `atoms` table in the `appbuilder` schema (see the extract sketch after this list).
- The nightly refresh job transforms the data with a Python script. These jobs are scheduled on Jenkins.
- Once the data is transformed, the job writes it into warehouse tables in batches. If any metadata has changed, it drops and recreates the affected tables so they reflect the updated columns (see the load sketch after this list).
- At the end of the run, the nightly refresh job triggers the Tableau data source refresh so that Tableau can pull the data from the warehouse into its data source extracts.
- The data in the Tableau data source extracts is then consumed by the workbooks in Tableau.
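For reference, here is a minimal sketch of the extract step described above, assuming `pymongo` for Mongo-backed corporations and `psycopg2` for Postgres-backed ones. The hosts, database name, and the `uses_mongo` flag are illustrative placeholders, not the real job's configuration.

```python
import psycopg2
from pymongo import MongoClient

def extract_atoms(corporation_id, uses_mongo):
    """Pull a corporation's atoms from wherever they live."""
    if uses_mongo:
        # Read from the MongoDB read replica so the primary is untouched;
        # readPreference=secondary routes queries to replica members.
        client = MongoClient(
            "mongodb://replica.example.com:27017/?readPreference=secondary"
        )
        return list(client["appbuilder"]["atoms"].find(
            {"corporation_id": corporation_id}
        ))
    # Postgres-backed corporations: read from the appbuilder.atoms table.
    with psycopg2.connect("host=db.example.com dbname=appbuilder") as conn:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT * FROM appbuilder.atoms WHERE corporation_id = %s",
                (corporation_id,),
            )
            return cur.fetchall()
```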
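And a minimal sketch of the batched load with the drop-and-recreate behavior, assuming a Postgres warehouse reached via `psycopg2`; the function name, DSN, and the all-text column definitions are simplifications for illustration.

```python
import psycopg2
from psycopg2.extras import execute_values

def load_table(dsn, table, columns, rows, schema_changed, batch_size=1000):
    """Write transformed rows into one warehouse table in batches."""
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            if schema_changed:
                # Metadata changed upstream: drop and recreate the table so
                # the updated column set is reflected (all text for brevity;
                # table/column names come from trusted job config).
                cur.execute(f'DROP TABLE IF EXISTS {table}')
                col_defs = ", ".join(f'"{c}" text' for c in columns)
                cur.execute(f'CREATE TABLE {table} ({col_defs})')
            col_list = ", ".join(f'"{c}"' for c in columns)
            # execute_values sends rows in pages instead of one at a time.
            execute_values(
                cur,
                f'INSERT INTO {table} ({col_list}) VALUES %s',
                rows,
                page_size=batch_size,
            )
```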