Incremental Extracts in Tableau

Version 4

    Purpose:

    This document contains scenarios to process incremental data loads in tableau. By default, tableau allows to load the full data set every time the report refreshes. This may at times be cumbersome and degrade the performance while processing reports. The document will capture 4 approaches depending on the structure and processing technique of the source data.

     

    Options for programming incremental data processing in Tableau

     

    The tableau reports are refreshed after the scheduled data extraction process for a report is completed. The data extraction process currently imports entire data set and replaces the previously extracted data set available on the server. Processing entire data extract in tableau can be an overhead in two scenarios:

    1. Large and continually increasing data sets
    2. Data sets are modified daily/hourly/or every minute.

     

    Incremental load ideally

    1. Captures changes to the existing data
    2. Adds new records to the existing data
    3. Updates the records for any modifications to the existing

     

    Note: Below approaches are highly dependent on the technique of data loading and processing Below are the approaches that can be programmed within tableau to perform Incremental refreshes.

    1. Full load with incremental refresh
    2. Incremental load for SCD(Slowly changing dimensions)
    3. Incremental load for UPSERT (Update else insert) operations to underlying data model
    4. Append data from file

     

    1. Full load with incremental refresh

     

    By default, every time a Tableau refresh executes, all records in already extracted data set on tableau server are replaced with new data. An alternative to replacing the data extract is to add new content to the existing extract on the server. The extract can be purposed to add new records. Below are the steps to set incremental refresh http://onlinehelp.tableau.com/current/pro/desktop/en-us/help.htm#extracting_refresh.html

    1. Select a data source on the Data menu and then select Extract Data.
    2. In the Extract Data dialog box, select All rows as the number of Rows to extract. Incremental refresh can only be defined when you are extracting all rows in the database.
    3. Select Incremental refresh and then specify a column in the database that will be used to identify new rows. For example, if you select a Date field, refreshing will add all rows whose date is after that last time you refreshed.
    4. Alternatively, an ID column can be used, that increments, as rows are added to the database.
    5. When finished, click Extract.

     

     

    2. Incremental data load for Slowly changing dimensions

     

    If the tableau data source is refreshed daily for new data, tableau rebuild the entire data set. The approach being discussed below, works seamlessly with the new data that is being added to the tableau extract particularly with Slowly changing dimensions.Below is a scenario for incremental loads for SCD type 2 with an example of Orders data set.

    • Below is test data for “ORDERS” table consisting orders data. OrderID and MODIFICATION_WID is a unique identifier and FROM_DATE determines the active date for an order.

     

    ORDERIDMODIFICATION_WIDFROM_DATEPAYMENT
    10330035/5/2017 12:00:00 AM2000
    10330045/7/2017 12:00:00 AM20001
    10440015/8/2017 12:00:00 AM3000
    10440025/9/2017 12:00:00 AM3000
    10010015/1/2017 12:00:00 AM1000
    10010025/2/2017 12:00:00 AM1020
    10120015/1/2017 12:00:00 AM2000
    10120025/3/2017 12:00:00 AM2000
    10120035/4/2017 12:00:00 AM2000
    • As per the change data control management process, incremental load in tableau should be able to capture the latest updated order information, if there are any modifications to the order data. This cannot be directly accomplished in tableau via incremental extracts. The following modification to the tableau workbooks can help achieve the functionality.
    • It can be achieved by calculating the the end date of the previous record for an order. the calculation is performed in tableau to determine the existing and new records.
    • Create calculated field “RecentOrderID” and add the below tableau code to the calculation.

              { FIXED [Orderid]:MAX([Modification Wid])}

    • Calculate the end date for an existing record if the record already exist in tableau extract.

     

    • Calculate the end date for an existing record if the record already exist in tableau extract. Create calculated field End_date and add the below code to the calculation

         IF ATTR([Order Wid]) = ATTR([RecentOrderID])

         then     NULL    

         ELSE     LOOKUP(ATTR([From Date]),1)    

         END

     

    • Drag all the column that need to be displayed in tableau report.
    • Filter on the records where End_date of the order = null.
    • The tableau report will now have all valid / new and updated data.

     

    3. Incremental load for UPSERT operations to underlying data model

     

    The below example describes data model where dimensional data is updated frequently without CDC (change data capture) in data warehouse. Therefore, the order status (in D_ORDERS) is modified without capturing the time when the modification was made.

    The example considered below consists of a star schema for orders data model. With Fact_orders as fact table and d_orders as dimension table.

     

    ORDERWIDPAYMENTLAST_UPDATE_DATE
    1005005/1/2017 12:00:00 AM
    10215005/2/2017 12:00:00 AM
    10325005/7/2017 12:00:00 AM
    10425005/6/2017 12:00:00 AM
    10520005/9/2017 12:00:00 AM
    1065005/10/2017 12:00:00 AM
    10715005/12/2017 12:00:00 AM

    Fact_orders fact table

    ORDERWIDORDERIDORDERSTATUS
    1001001NEW
    1021002NEW
    1031003SHIPPED
    1041004SHIPPED
    1051005SHIPPED
    1061006NEW
    1071007SHIPPED

    D_Orders dimension table

     

    The following approach will display recent data sets for the incremental loads, ignoring the old records.

    • Extract the data and set the extract to incrementally process on the last_update_date (date/id) field from fact table.
    • Generate a “calculated date” calculated field in Tableau

              { FIXED [Orderwid]: MAX([Last Update Date])}

    • Generate a calculated field “DataValidationFlag”

              ([Last Update Date] >= [calculated date])

         

    • Do one of the following
      • Filter the data per view
        • Generate a report with required fields on the reporting pane and drag the two-new field “calculated date” and “DataValidationFlag” in report.
        • Filter on the “DataValidationFlag” to display records with filter condition “TRUE.”
      • Filter for all views
        • Add a data source filter for "DataValidationFlag" and check "TRUE."
    • This solution will filter all the old records available in existing tableau extract.

     

     

    4. Append data from file

    To add content to the existing extract a manual option has been made available by tableau. This approach requires updates access to tableau desktop.Below are the steps to append data to the existing extracts

    1. Right click datasource --> extract --> Appen Data from file
    2. Select the file to append the data from the extract. The file could be csv, excel, twb or tde extension. Therefore, it is possible to append an extract to the existing extract.

    Cons:

    1. Appending data to an extract is a manual effort and needs to be accomplished via Tableau desktop.
    2. The merged data source need to be published to the server every time there is changes to the new data source.
    3. Refreshing the existing data source will replace the content of the merged data.