Every day, the App Builder and BillingPoint data is refreshed into the warehouse, which Tableau then consumes to refresh the Tableau data source extracts. The high-level overview diagram below shows how the data flows into the warehouse. The ETL process is implemented as Python scripts that do the following:
- Extracts data from MongoDB (Atoms in Mongo) or Postgres (Atoms in Postgres)
- Transforms the data into rows and columns for the warehouse
- Loads the data into the warehouse
- Recreates the required views
- Grants permissions to the Tableau user (steps 4 and 5 are sketched after this list)
- Triggers the Tableau data source refresh for the corporation; only data sources owned by domain_publisher and not tagged "do not refresh" are refreshed
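The view-recreation and grant steps translate to plain SQL against the warehouse. Below is a minimal sketch, assuming a Postgres warehouse reached via psycopg2; the schema name, the `atoms_view`/`atoms` object names, and the `tableau` role are hypothetical, not the actual script's identifiers.

```python
import psycopg2
from psycopg2 import sql

def recreate_views_and_grants(corp_schema: str, tableau_role: str = "tableau") -> None:
    """Recreate the reporting view and re-grant read access for one corporation."""
    with psycopg2.connect("dbname=warehouse") as conn, conn.cursor() as cur:
        # Recreate the view so it picks up columns added or dropped by the load.
        cur.execute(sql.SQL("DROP VIEW IF EXISTS {}.atoms_view").format(
            sql.Identifier(corp_schema)))
        cur.execute(sql.SQL("CREATE VIEW {}.atoms_view AS SELECT * FROM {}.atoms").format(
            sql.Identifier(corp_schema), sql.Identifier(corp_schema)))
        # Dropping the view discards its grants, so read access must be re-granted.
        cur.execute(sql.SQL("GRANT USAGE ON SCHEMA {} TO {}").format(
            sql.Identifier(corp_schema), sql.Identifier(tableau_role)))
        cur.execute(sql.SQL("GRANT SELECT ON ALL TABLES IN SCHEMA {} TO {}").format(
            sql.Identifier(corp_schema), sql.Identifier(tableau_role)))
```

Re-granting after the DROP is the reason step 5 exists at all: in Postgres, `GRANT ... ON ALL TABLES IN SCHEMA` also covers views, so one statement restores the Tableau user's access.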
Data Flow:
- For corporations with atoms in MongoDB, the nightly refresh Jenkins job reads the data from the MongoDB read replica host. For corporations with atoms in Postgres, it reads the atoms from the atoms table in the App Builder schema (see the extraction sketch after this list).
- The nightly refresh job, which is scheduled on Jenkins, transforms the data using a Python script.
- Once the data is transformed, the nightly refresh job writes it into the warehouse tables in batches. If the metadata has changed, it drops and recreates the affected tables to reflect the updated columns (see the batch-load sketch after this list).
- At the end of its run, the nightly refresh job triggers the Tableau data source refresh so that Tableau pulls the data from the warehouse into the Tableau data source extracts (see the refresh-trigger sketch after this list).
- The data from the Tableau data source extracts is then consumed by the workbooks in Tableau.
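The dual-source read in the first step could look like the following sketch, assuming pymongo and psycopg2; the host names, database/collection/table names, and the `corporation_id` field are assumptions, not the actual job's configuration.

```python
import psycopg2
import pymongo

def extract_atoms(corp_id: str, source: str) -> list[dict]:
    """Read one corporation's atoms from MongoDB or Postgres."""
    if source == "mongo":
        # Read from the MongoDB read replica so the primary is not loaded.
        client = pymongo.MongoClient(
            "mongodb://mongo-replica:27017/?readPreference=secondary")
        return list(client["app_builder"]["atoms"].find({"corporation_id": corp_id}))
    # Otherwise read from the atoms table in the App Builder Postgres schema.
    with psycopg2.connect("dbname=app_builder host=pg-host") as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT data FROM app_builder.atoms WHERE corporation_id = %s",
            (corp_id,))
        return [row[0] for row in cur.fetchall()]
```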
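The batched write with the drop-and-recreate behavior could be sketched as below; the text-typed columns, the information_schema comparison, and the batch size are illustrative assumptions, not the actual script's logic.

```python
import psycopg2
from psycopg2 import sql
from psycopg2.extras import execute_batch

BATCH_SIZE = 1000  # assumed batch size

def load_table(conn, table: str, columns: list[str], rows: list[tuple]) -> None:
    """Write rows into a warehouse table in batches, recreating it if the shape changed."""
    with conn.cursor() as cur:
        # Compare the table's current columns with the incoming metadata;
        # if they differ, drop the table so it is recreated with the new shape.
        cur.execute(
            "SELECT column_name FROM information_schema.columns "
            "WHERE table_name = %s ORDER BY ordinal_position", (table,))
        existing = [r[0] for r in cur.fetchall()]
        if existing and existing != columns:
            cur.execute(sql.SQL("DROP TABLE {}").format(sql.Identifier(table)))
        if existing != columns:
            col_defs = sql.SQL(", ").join(
                sql.SQL("{} text").format(sql.Identifier(c)) for c in columns)
            cur.execute(sql.SQL("CREATE TABLE {} ({})").format(
                sql.Identifier(table), col_defs))
        insert = sql.SQL("INSERT INTO {} ({}) VALUES ({})").format(
            sql.Identifier(table),
            sql.SQL(", ").join(map(sql.Identifier, columns)),
            sql.SQL(", ").join([sql.Placeholder()] * len(columns)))
        # Write the rows in batches rather than issuing one INSERT per row.
        execute_batch(cur, insert.as_string(cur), rows, page_size=BATCH_SIZE)
    conn.commit()
```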
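Finally, the refresh trigger with the ownership and tag checks could be implemented against the Tableau REST API, for example via the tableauserverclient library; the server URL, token credentials, site name, and the exact "do not refresh" tag string are assumptions.

```python
import tableauserverclient as TSC

def trigger_refresh(corp_site: str) -> None:
    """Queue extract refreshes for eligible data sources on one corporation's site."""
    auth = TSC.PersonalAccessTokenAuth("etl-token", "SECRET", site_id=corp_site)
    server = TSC.Server("https://tableau.example.com", use_server_version=True)
    with server.auth.sign_in(auth):
        # Resolve the domain_publisher user once so data sources can be
        # matched on owner_id.
        publisher = next(
            u for u in TSC.Pager(server.users) if u.name == "domain_publisher")
        for ds in TSC.Pager(server.datasources):
            # Only refresh data sources owned by domain_publisher and not
            # tagged "do not refresh".
            if ds.owner_id == publisher.id and "do not refresh" not in ds.tags:
                server.datasources.refresh(ds)  # queues an extract refresh job
```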