
    Best way to handle a huge daily volume of data? Say, daily 1M rows

    Bodhisattva Dasgupta

      Hi,

      What is the best way to handle a huge daily volume of data flowing into Tableau?

      Say, for instance, I have client account data in a table that grows by about 500K-1M rows per day.

      In Tableau, I will have 3-4 dashboards: a couple of them at a summarized level, and a couple at a detail level with filters available.

      My questions are...

      - What would be the best way to manage this data? The database would be Hadoop, with an Impala connector.

      - Is it okay to just do a daily incremental refresh into Tableau? If so, how do I handle cases where historical data gets updated? (This might happen a handful of times a year.)

      - Would using another tool in front of Tableau, like Alteryx, help?

      - Is there any way to split a data source into a moving one-month window that always FULL refreshes, plus another piece that INCREMENTALLY refreshes daily, and then UNION them?

      Any help/thoughts in this regard will be appreciated.

        • 1. Re: Best way to handle a huge daily volume of data? Say, daily 1M rows
          Patrick Van Der Hyde

           Hello Bodhisattva,

           There are several documents and training resources available related to Tableau Server and scalability:

           Tableau Server Scalability - A Technical Deployment Guide for Server Administrators | Tableau Software

           8 tips for deploying Tableau at scale | Tableau Software

           The InterWorks series on deploying Tableau at scale is helpful as well.

           I will move this thread to our Server Administration area, where other Tableau Server Administrators often reply to threads of this sort.

           I hope this helps.

           Patrick

          • 2. Re: Best way to handle a huge daily volume of data? Say, daily 1M rows
            Russell Christopher

             Hey There -

             - What would be the best way to manage this data? The database would be Hadoop, with an Impala connector.

             In a perfect world, you will partition your Impala tables and do all the "performance tuning" work necessary to allow Impala to answer questions quickly. In other words, Tableau data extracts aren't a replacement for solid administration on the Hadoop side of the equation.
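
             To make that concrete, here's a minimal sketch of a date-partitioned Parquet table on the Impala side (the table and column names are hypothetical, purely for illustration):

               -- Hypothetical daily-partitioned Parquet table.
               -- Queries that filter on load_date only scan that day's partition.
               CREATE TABLE client_accounts (
                 account_id  BIGINT,
                 client_name STRING,
                 balance     DECIMAL(18,2)
               )
               PARTITIONED BY (load_date STRING)
               STORED AS PARQUET;

               -- Recompute stats after each load so the planner has fresh row counts.
               COMPUTE STATS client_accounts;

             Partition pruning plus up-to-date stats is most of what lets Impala answer date-filtered dashboard queries quickly.
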
             - Is it okay to just do a daily incremental refresh into Tableau? If so, how do I handle cases where historical data gets updated? (This might happen a handful of times a year.)

             You probably don't want to simply copy "all rows" from Impala into a TDE. You want to focus on making Impala generally performant, and then use TDEs as "spot solutions" that aggregate lots of leaf-level data into relatively small result sets which can be leveraged by vizzes showing trends and highly summarized answers. Impala is your acceleration layer on Hadoop, not Tableau. If you go this route, you won't be collecting lots of data in your TDEs anyway, so incremental vs. full refresh becomes a non-issue.

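             As a sketch of that "spot solution" idea (reusing the hypothetical client_accounts table from above), the Custom SQL behind a small summary extract might collapse millions of leaf-level rows into one row per client per day:

               -- Hypothetical aggregate feeding a small summary extract:
               -- one row per client per day instead of millions of leaf rows.
               SELECT
                 load_date,
                 client_name,
                 COUNT(*)     AS account_rows,
                 SUM(balance) AS total_balance
               FROM client_accounts
               GROUP BY load_date, client_name;
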
             Incremental extracts will not pick up changes to rows the extract has already ingested.

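             If a historical correction does land, one option (again assuming the partitioned table sketched above, plus a hypothetical corrections_staging table) is to rewrite just the affected partition in Impala and then run an ordinary full refresh of the small aggregated extract, which stays cheap because the extract is tiny:

               -- Hypothetical: rebuild only the corrected day's partition,
               -- then full-refresh the small aggregated extract as usual.
               INSERT OVERWRITE client_accounts PARTITION (load_date = '2017-06-01')
               SELECT account_id, client_name, balance
               FROM corrections_staging
               WHERE load_date = '2017-06-01';
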
             - Would using another tool in front of Tableau, like Alteryx, help?

             No. Do your work in the data system, make sure the data system (Impala) is fast, and then all is well.

             - Is there any way to split a data source into a moving one-month window that always FULL refreshes, plus another piece that INCREMENTALLY refreshes daily, and then UNION them?

             Still sounds like you're thinking about taking an "extract everything" approach, which really isn't appropriate here.
