6 Replies Latest reply on Oct 24, 2018 12:33 AM by Rory Wallis

    Data Source Stability improvement - Help required.

    George Alexander

      Need Help with the ETL Process.


      Context :

      Currently we use Tableau for reporting purposes. Millions of records are fed in weekly.

      There is no formal ETL tool. The data is cleansed using multiple Excel sheets and Power BI, then exported to a text format using the ACL scripting language.

      The feed to Tableau is this text file (millions of records).


      Issue :

      Tableau Desktop is very slow to open .twb files

      Data stability and integrity are at risk with the Excel-sheet approach

      Maintenance effort for the Excel sheets is really time-consuming

      ACL is not a widely used language, and the scripts need to be maintained

      Using a text file impacts load and extract times

      A maximum of 3 years of data is loaded currently, since more impacts efficiency


      Solution / Process improvement


      Could you please let me know how to improve this whole ETL process? I envisage that data should flow to Tableau through a database (Oracle/SQL).

      How can we ensure we can report on multiple years without compromising efficiency? (There are millions of records.)


      Thanks in advance!

        • 1. Re: Data Source Stability improvement - Help required.
          Rory Wallis

          Hi George,


          Quite a broad question you have here! I myself am trying to improve the efficiency of how I assemble reports so I'm going to take some interest in whatever solution works best for you.


          My ideal process is creating views on my database. Tableau Prep can connect directly to these, so you could in theory handle all the necessary joins in the database and then use a Prep flow to cleanse the data however you wish (potentially replacing all those Excel sheets and reducing query time!). Outputting this as a Hyper extract should then give you a fast dashboard (at least relative to what you have now). The only downside is that Prep doesn't currently allow for scheduled updates, so you would have to refresh the data manually; still better than the current process though!
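          The "views on the database" idea can be sketched in a few lines. This is only an illustration using Python's built-in sqlite3 as a stand-in for your Oracle/SQL database; the table, column, and carrier names are made up for the example:

```python
import sqlite3

# Illustrative stand-in: in practice this would be your Oracle/SQL database.
# Table and column names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shipments (id INTEGER, carrier_id INTEGER, weight REAL);
    CREATE TABLE carriers  (id INTEGER, name TEXT);
    INSERT INTO shipments VALUES (1, 10, 2.5), (2, 11, 4.0);
    INSERT INTO carriers  VALUES (10, 'Acme Freight'), (11, 'Globex Cargo');

    -- One cleansed view: Tableau (or a Prep flow) connects to this
    -- instead of juggling joins across Excel sheets.
    CREATE VIEW v_shipments AS
    SELECT s.id, c.name AS carrier, s.weight
    FROM shipments s
    JOIN carriers c ON c.id = s.carrier_id;
""")
rows = conn.execute("SELECT carrier, weight FROM v_shipments ORDER BY id").fetchall()
print(rows)  # [('Acme Freight', 2.5), ('Globex Cargo', 4.0)]
```

          The point of the view is that all the join logic lives in one place in the database, so every downstream tool sees the same cleansed shape of the data.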



          • 2. Re: Data Source Stability improvement - Help required.
            John Ajewol


            With Tableau Prep 2018.2.2 we can now schedule flows using the command line.
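            One way to wire that up is a small wrapper script that your OS scheduler (Windows Task Scheduler or cron) runs on a timer. This is a sketch under assumptions: the CLI entry point (tableau-prep-cli.bat), its install path, and the -t / -c flags are taken from the Tableau Prep 2018.2.x documentation, so verify them against your own install before relying on this:

```python
import subprocess
from pathlib import Path

# Assumed path and CLI name for Tableau Prep Builder 2018.2 on Windows;
# adjust to your actual install location and version.
PREP_CLI = Path(r"C:\Program Files\Tableau\Tableau Prep Builder 2018.2\scripts\tableau-prep-cli.bat")

def build_flow_command(flow_file, credentials=None):
    """Assemble the command line that runs a saved Prep flow (.tfl)."""
    cmd = [str(PREP_CLI), "-t", str(flow_file)]
    if credentials:
        # -c points at a JSON file with connection credentials, per the docs.
        cmd += ["-c", str(credentials)]
    return cmd

def run_flow(flow_file, credentials=None):
    # Schedule a call to this function to get the recurring refreshes
    # that Prep's UI doesn't offer on its own.
    return subprocess.run(build_flow_command(flow_file, credentials), check=True)

print(build_flow_command("weekly_cleanse.tfl"))
```

            The flow file name above is hypothetical; point it at whatever .tfl flow you saved from Prep.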

            • 3. Re: Data Source Stability improvement - Help required.
              Rory Wallis

              That's music to my ears, John! I'm going to start trying to implement my suggestions now that I know that's the case, so I'll update if I have much success.


              • 4. Re: Data Source Stability improvement - Help required.
                Jonathan Drummey

                Hi George,

                If you're using CSV files as the source, the reason Tableau Desktop is slow to open files is that, as the file is opened, Tableau loads it in the background into a "shadow extract", which is a .hyper or .tde file (Hyper is the default as of v10.5). Tableau does this so that later interactions with the data during the session, as you're dragging and dropping pills, will be faster. If you've ever noticed the "For faster start times use an extract" message, it's prompting you to explicitly create a Tableau data extract so that Tableau won't have to build the shadow extract.


                However, Tableau data extracts won't solve the underlying issues with Excel. I'd recommend that you check out one of the data preparation tools that make data preparation faster and easier, much as Tableau Desktop does for analysis. Tableau Prep is a new entrant in this category; there are also tools like Alteryx, Trifacta, etc. In some projects I've worked on, having those tools available has delayed the point at which we needed to move to a full database, with the extra maintenance overhead that adds.



                • 5. Re: Data Source Stability improvement - Help required.
                  George Alexander

                  Thank you everyone. I am trying to have IT create table views to improve the process. We do not have any ETL tools available to us right now.

                  My next challenge is the following. I work in the logistics department. The source data contains different variations of the carrier names, and we currently normalize them manually before feeding the dashboard. Could you please advise on how this can be automated after we get the data from the source. Thanks in advance.

                  • 6. Re: Data Source Stability improvement - Help required.
                    Rory Wallis

                    You could use the "Group and Replace" functions in Prep. You'd have to manually update the carrier names the first time and then any changes you make will be stored when you refresh the data.


                    The only caveat is that every time a new carrier name appears that you haven't cleaned already, you'll have to amend it yourself. I have a similar issue with supplier names in the data I use. In that case I set up a separate branch with a join between my two data sets and exclude any matches from one of the two files. This gives me a list of supplier names that are unique to one of the sheets and prompts me each week about what needs changing.
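                    The "exclude matches" branch above can be sketched in plain Python, and the standard library's difflib can even propose a canonical name for each new variant as a first pass. A minimal sketch, with entirely hypothetical carrier names:

```python
from difflib import get_close_matches

# Hypothetical data: canonical carrier names you maintain, plus the raw
# names arriving in this week's feed.
canonical = ["DHL Express", "FedEx Ground", "UPS Freight"]
incoming = ["DHL Expres", "FedEx Ground", "Maersk Line"]

# The "exclude matches" branch: feed names with no exact match, i.e. the
# ones that still need a manual (or fuzzy) mapping.
unmatched = sorted(set(incoming) - set(canonical))
print(unmatched)  # ['DHL Expres', 'Maersk Line']

# First-pass automation: suggest the closest canonical name for each
# unmatched variant; anything without a close match goes to a human.
for name in unmatched:
    suggestion = get_close_matches(name, canonical, n=1, cutoff=0.8)
    print(name, "->", suggestion[0] if suggestion else "NEEDS REVIEW")
```

                    The cutoff of 0.8 is a judgment call: higher values suggest fewer, safer matches, while lower values automate more but risk mapping genuinely new carriers onto existing ones.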


                    Hope this helps!