3 Replies Latest reply on Apr 8, 2016 4:16 PM by Stephen Hicks

    Batch loading huge data source on Tableau Online

    Stephen Hicks

      PROBLEM:

      My target data source on tableau online will be hundreds of millions of rows.

      My sqlsvr source data will not be on AWS or other cloud server, so I must use tableau online extracts (not live) connections.

      Tableau online has a 2hour upload timeout.

      Therefore, I need to batch upload my master TDE gradually, and I want to do so automatically.

      In other words, my TDS will continue to undergo curation modifications for some time, and until the design settles into place,

      each night I want to batch-reload tableau online so that by morning, I can study the last several years of detail data with my new TDS design.

       

      SOLUTION:

      So far, I've been successfully using SSIS to ALTER VIEW in sql which lets me "batch" updates to for particular date ranges.

      Then I call tabonlinsyncclient.exe to incrementally append recent data to tableau online, then move onto the next batch.

      Using this method, I've been able to append hundreds of millions of rows so far.

      However, there are three main problems:

      1) frequent network or "forcibly rejected" errors appear (which are often red hearings since they append usually works despite the error)

      2) tableau online continues to work after the upload completes, and the only way to tell when it's really "done" is to refresh a checksum twb and download the results till they change.

      3) after many hours and dozens of successful uploads, errors become more frequent and sometimes grind new uploads to a halt.  (manual uploads work for many more records (over 400,000,000) than automated uploads have been able to achieve so far)

       

      a) What other methods have people used to gradually build a very large extract?

      b) What types of errors and error handling have proven useful?

      c) Can the SDK be made to simulate a desktop UI action of: "refresh extract + add to saved data sources" ?  (so-as to get a local tde refreshed)

      d) Should I be full-refreshing a batch tdsx on tableau online, then downloading it, then upload/appending it to my master extract online?

       

      Cheers

      Steve

        • 1. Re: Batch loading huge data source on Tableau Online
          Dan Cory

          Steve -

          Thanks for listing all your issues! It really helps.

           

          Looking at over 400M records in an extract is pretty large. I hope we can make it work.

           

          I really would try using a incrementally updated extract on a schedule. Tableau will call your view with a filter for the last ID it found. You can adjust your view to allow more records through whenever you want. The next time Tableau updates (or you force it to update with tabonlinesyncclient), it will pick up the additional records. I think that will be the most straightforward way.

           

          As for your three main problems:

          1. I'd love to see more details on this. You may need to file a support case so you can share detailed error messages and we can get a network trace to review.

          2. Tableau is merging the additional extract onto the old extract and that takes time. You should be able to check the last refresh date through the REST API, though. http://onlinehelp.tableau.com/current/api/rest_api/en-us/help.htm#REST/rest_api_ref.htm#Query_Datasource%3FTocPath%3DAPI…

          3. As the extract gets larger, the merges take more time, so you are probably timing out more frequently. See 1 above.

           

          And your other questions:

          a. The main other route is using the extract API. You control everything about creating it.

          b. I don't know what to suggest here, hopefully the community can help.

          c. If you use the extract API, then you just write records and are responsible for getting them from SQL Server yourself.

               You are already using tabonlinesyncclient which is the part that knows about connections to SQL Server. I think you just need to be using a incremental extract.

          d. I wouldn't recommend this.

           

          I hope some others can jump in to help too.

           

          Dan

          • 2. Re: Batch loading huge data source on Tableau Online
            Jennifer Lynch

            Hi Stephen,

             

            Did Dan's reply answer your question? If not, his suggestion to open a support case is a great one. Please let me know if you have any further questions.

             

            Jenny Lynch

            • 3. Re: Batch loading huge data source on Tableau Online
              Stephen Hicks

              Thanks guys!

              Yes, I do use tabonlinesyncclient extensively.  In fact it's the only way I

              know how to fully automate a TDS->TDE->Append process.

              I've I alter my view, then launch tabonlinesyncclientcmd.exe from the cmd

              prompt, and this works great for hundreds of millions of records, but then

              it starts to falter inexplicably.

              I've also been working with Tableau Support with several tickets for

              several weeks.

              We've made incremental improvements, but not perfect results yet.

              I'm also preparing to try the Tableau SDK, however, I need someone to show

              me how to start with a TDS and produce a refreshed TDE (just like you do in

              the user-interface of tableau)

              The TDS metadata has lots of renames, calculated fields, etc which I want

              my TDE to know about.

               

              I have not tried two of your main points yet:

               

              2. Tableau is merging the additional extract onto the old extract and that

              takes time. You should be able to check the last refresh date through the

              REST API, though.

              http://onlinehelp.tableau.com/current/api/rest_api/en-us/help.htm#REST/rest_api_ref.htm#Query_Datasource%3FTocPath%3DAPI…

              <http://onlinehelp.tableau.com/current/api/rest_api/en-us/help.htm#REST/rest_api_ref.htm%23Query_Datasource%3FTocPath%3DAPI%2520Reference%7C