8 Replies Latest reply on Aug 14, 2012 6:34 AM by Dex Bindra

    Method for Creating Multipass Aggregations Using Tableau Server

    Ty Alevizos

      The Word document in the attached zip file describes statistical processing methods that use Tableau Desktop, Tableau Server, and the Tableau Server command-line utility “tabcmd”. The focus is on the process of producing statistical results, not on a detailed standalone discussion of the statistical methods themselves.

       

      Starting in version 7, Tableau Server views can be output as ".csv" files. This simple fact forms the basis of the entire process outlined in the zip file.

       

      I wanted to provide this information as-is and open it up to discussion. Feel free to pick it apart!

       

      Best Regards...

       


       

      From the word document:

       

      At a high level, this paper describes a multi-step workflow process:


      1. Connect to originating data source
      2. Perform first-pass statistical results using Tableau “Table Calculations” or similar
      3. Create and publish a text view of the results to Tableau Server
      4. Using tabcmd, connect to Tableau Server and generate the text view from step 3 in CSV output format
      5. Connect to resulting CSV file in Tableau Desktop
      6. Perform additional statistics and calculations
      7. Optional: repeat steps 3, 4 and 5 N times
      8. Analyze final results
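
      Step 4 above can be scripted. Here is a minimal sketch in Python that builds the tabcmd calls for one export pass (login, export the view as CSV, logout). The server URL, credentials, workbook/view path, and output file name are all placeholders I made up for illustration, not values from the Word document, and the helper names are hypothetical.

```python
import subprocess

def build_export_commands(server, user, password, view_path, csv_path):
    """Return the tabcmd invocations for one export pass as argument lists."""
    return [
        ["tabcmd", "login", "-s", server, "-u", user, "-p", password],
        # --csv asks the server for the view's data as CSV; -f names the output file
        ["tabcmd", "export", view_path, "--csv", "-f", csv_path],
        ["tabcmd", "logout"],
    ]

def run_export(commands):
    """Run each command in order and raise if any of them fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)

# All values below are hypothetical placeholders.
commands = build_export_commands(
    server="http://tableau.example.com",
    user="analyst",
    password="secret",
    view_path="FlightStats/FirstPass",
    csv_path="first_pass.csv",
)
# run_export(commands)  # uncomment on a machine where tabcmd is installed
```

      Keeping the command construction separate from the execution makes it easy to inspect or log exactly what will be sent to the server before scheduling the script.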

       

       

      Purpose and Scope

       

      The main thesis of this document raises a simple question: why would we want to create an unduly complex arrangement involving Tableau Server, calculations, CSV file outputs, and scripts to automate those outputs? It turns out that there are two specific scenarios where this process can prove useful:

       

      1. Data Security and Data Obfuscation

       

      In the example included with this document, we analyze approximately 3 million rows of originating FAA airline data. This data happens to be in the public domain. But what if we wanted to perform advanced analysis on massive data sets while preventing the wider analysis “audience” from seeing the originating raw data? Without access to the raw rows, we could not accurately apply Tableau’s built-in statistics to the original data, so the data would need to be pre-aggregated. Pre-aggregation might mean asking the database to perform these calculations in advance, which can involve IT teams or database administrators and take an undue amount of time.
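
      To make the obfuscation idea concrete, here is a rough Python sketch of what a first pass does conceptually: it collapses raw rows into one summary row per group, and only the summary is handed on. In the actual process this pass is done by Tableau, not Python. The column names ("carrier", "delay_minutes") and the three sample rows are invented for illustration and are not taken from the FAA data.

```python
from collections import defaultdict
from statistics import mean

def first_pass(rows):
    """Collapse raw rows into one summary row per carrier."""
    delays = defaultdict(list)
    for row in rows:
        delays[row["carrier"]].append(float(row["delay_minutes"]))
    # Only counts and averages leave this function; raw rows do not.
    return [
        {"carrier": carrier, "flights": len(values), "avg_delay": mean(values)}
        for carrier, values in sorted(delays.items())
    ]

# Made-up sample rows standing in for the raw data.
raw_rows = [
    {"carrier": "AA", "delay_minutes": "10"},
    {"carrier": "AA", "delay_minutes": "20"},
    {"carrier": "DL", "delay_minutes": "5"},
]
summary = first_pass(raw_rows)
```

      The wider audience would then connect to something shaped like "summary" and never see the individual rows.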

       

      2. Performance when using Tableau Table Calculations on Massive Data Sets

       

      In the example included with this document, the performance of the text view is not optimal, because several Tableau “Table Calculations” must be computed to build the final set of analytics. Using Tableau Server and “tabcmd”, we can hand this operation off to the machines, sparing human cycles. We can automate the creation of the aggregation files and perform the heavy lifting in the middle of the night or at whatever interval accurate analysis requires. The net result is that Tableau Server processes run against the large data sets, complete the first-pass aggregations, and then make the results available to analysts for further operations. More advanced analytics can involve two-pass or N-pass operations. In general, this type of pre-processing is similar to operations performed in a variety of advanced statistics toolsets, such as “R”, SPSS, or SAS.
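
      As an illustration of a second pass, the sketch below reads a first-pass CSV and computes a statistic across the aggregates, in this case a z-score for each row's average. The paper does this step in Tableau Desktop; Python is used here only to show the shape of the operation. The CSV header names and values are hypothetical, since the real headers depend on how the published view is built.

```python
import csv
import io
from statistics import mean, pstdev

def second_pass(csv_text, column):
    """Add a z-score for `column`, computed across all first-pass rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [float(row[column]) for row in rows]
    mu, sigma = mean(values), pstdev(values)
    for row, value in zip(rows, values):
        # Guard against a zero standard deviation (all values identical).
        row["z_score"] = (value - mu) / sigma if sigma else 0.0
    return rows

# Hypothetical stand-in for a file produced by the first-pass export.
first_pass_csv = "carrier,avg_delay\nAA,10\nDL,20\nUA,30\n"
result = second_pass(first_pass_csv, "avg_delay")
```

      An N-pass process would publish a view of results like these and export again, repeating the cycle as needed.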