4 Replies Latest reply on Oct 15, 2018 6:03 AM by Jonathan Drummey

    Data Source Stability improvement - Help required.

    George Alexander

      Need Help with the ETL Process.


      Context :

      Currently we use Tableau for reporting purposes. Millions of records are fed in weekly.

      There is no formal ETL tool. The data is cleansed using multiple Excel sheets and Power BI, then exported to a text format using the ACL scripting language.

      The feed to Tableau is this text file (millions of records).


      Issue :

      Tableau Desktop is very slow to open .twb files.

      Data stability and integrity are at risk with the Excel sheets.

      The maintenance effort for the Excel sheets is very time-consuming.

      ACL is not a widely used language, and the scripts need to be maintained.

      Using a text file impacts load and extract times.

      A maximum of 3 years of data is currently loaded, since loading more hurts efficiency.


      Solution / Process improvement


      Could you please let me know how to improve this whole ETL process? I envisage that the data should flow to Tableau through a database (Oracle/SQL Server).

      How can we ensure we can report on multiple years of data without compromising efficiency? (There are millions of records.)


      Thanks in advance!

        • 1. Re: Data Source Stability improvement - Help required.
          Rory Wallis

          Hi George,


          Quite a broad question you have here! I myself am trying to improve the efficiency of how I assemble reports, so I'm going to take some interest in whatever solution works best for you.


          My ideal process is creating views on my database. Tableau Prep can connect directly to these, so you could in theory handle all the necessary joins in the database, then use a Prep flow to cleanse the data however you wish (potentially replacing all those Excel sheets and reducing query time!). Outputting this as a Hyper extract should then give you a fast dashboard (at least relative to what you have now). The only downside here is that Prep doesn't currently allow for scheduled updates, so you would have to refresh the data manually - still better than the current process, though!



          • 2. Re: Data Source Stability improvement - Help required.
            John Ajewol


            With Tableau Prep 2018.2.2 we can now schedule flows using the command line.
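            For reference, a rough sketch of what running a saved flow from the command line looks like, which you could then wrap in Windows Task Scheduler or cron. All paths, the flow name, and the credentials file here are illustrative, not from this thread - check the Tableau Prep documentation for the exact location of the CLI script in your install:

            ```shell
            # Run a saved Tableau Prep flow from the command line.
            # -t points at the .tfl flow file to execute;
            # -c supplies a JSON credentials file for any database or
            #    server connections in the flow that require a password.
            # (Install path and file names below are hypothetical examples.)
            "C:\Program Files\Tableau\Tableau Prep Builder 2018.2\scripts\tableau-prep-cli.bat" ^
                -t "C:\Flows\weekly_cleanse.tfl" ^
                -c "C:\Flows\credentials.json"
            ```

            Pointing a scheduled task at a small batch file containing the command above would give you the weekly refresh George described without any manual step.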

            • 3. Re: Data Source Stability improvement - Help required.
              Rory Wallis

              That's music to my ears, John! I'm going to start trying to implement my suggestions now that I know that's the case, so I'll update here if I have much success.


              • 4. Re: Data Source Stability improvement - Help required.
                Jonathan Drummey

                Hi George,

                If you're using CSV files as the source, the reason Tableau Desktop is slow to open files is that, as the file is opened, Tableau loads it in the background into a "shadow extract" - a .hyper or .tde file (Hyper is the default as of v10.5). Tableau does this so that later interactions with the data during the session, as you're dragging and dropping pills, will be faster. If you've ever noticed the "For faster start times use an extract" message, it's prompting you to explicitly create a Tableau data extract so that Tableau won't have to build the shadow extract.


                However, Tableau data extracts won't solve the underlying issues with Excel. I'd recommend that you check out one of the data preparation tools that make data preparation faster and easier in the way that Tableau Desktop does for visualization. Tableau Prep is a new entrant into this category; there are also tools like Alteryx, Trifacta, etc. In some projects I've worked on, having those available has put off the point at which we've needed to move to a full database, with the extra maintenance overhead that adds.