8 Replies Latest reply on Apr 7, 2016 11:38 AM by Stephen Hicks

    Offline TDE + Server Refresh Management - Possible?

    Michael Gillespie

      We have a client with a somewhat complex use case for Tableau.  They need to manage the daily incremental refresh of a large data set and provide offline access to multiple subsets of the data for analysts to use in a local workbook.  I can’t find a straightforward way to address this issue, and I’m looking for some suggestions.

       

      More detail:

      The client's customers generate daily data that is stored on our client’s SQL Server infrastructure; analysts are assigned 5-10 customers and work only with their assigned customers’ data. Analysts are not physically located on the same network as the SQL Server database servers/Tableau servers and only connect to that network via VPN.  Data volumes are large enough that live connections to Tableau Server are not feasible over a VPN connection.  The client cannot put the Tableau server on the public internet connection for a multitude of reasons.

       

      Current practice is for a DBA/Tableau Admin to refresh the data on a monthly basis, create one workbook for each analyst that contains only their customers’ data, and package it up into a .twbx for manual download from a secure file share.  This has many downsides: timeliness of the data, manual workload for the DBA, and some performance issues given the large data sets.

       

      Desired future state is to provide one TDE per customer to the analysts (i.e., a TDE that contains one customer’s data), and that is incrementally refreshed daily by Tableau Server.  Those TDE’s MUST be available as standalone files to enable the analysts to work on the data without a persistent connection to Tableau Server.

       

      That last requirement is the problem.  I can’t figure out a combination of technologies to get that standalone TDE without hacking into the server filesystem, which I don’t want to do.  I think I can get the TDE itself using the Extract API, but then I lose the management of the refreshes from Server.

       

      In shorter form, I need:

      1. Server-managed incremental data refresh AND
      2. Standalone TDE files

       

      Is there any way to accomplish this without adding something like Alteryx to the process?

        • 1. Re: Offline TDE + Server Refresh Management - Possible?
          Matt Coles

          Yes, you can do this fairly easily. Publish a Data Source or Workbook backed by an incrementally refreshed extract (for each customer, in this case). Schedule it on Tableau Server. Then, you can script the pull of this data out to the fileshare in a number of ways:

           

          1. tabcmd get

          2. URL (e.g. https://mytableauserver/t/mysite/datasources/mydatasource.tdsx - or .twbx)

          3. REST API

           

          That script would need to map each bit of content to the appropriate file share folder (I would assume they are all locked down individually due to security requirements?). But if you know where to put each, it's a simple "copy" command in batch or Powershell to do it. All you'd need to do is automate with a Windows Scheduled Task. I always recommend trapping errors and shooting an email to the admin if it fails, too. But this should get you off the ground at least.

           

          Edit: I forgot, you said you wanted the .tde file, not a .twbx or .tdsx. In that case, you need to add a few lines to your script that uses an unzipping program to unzip and extract the .tde content for each file you get from Tableau Server.

           

          I should think a robust and resilient version of this script, plus testing and deployment would require a week's worth of time. Probably a day for a proof of concept.

          1 of 1 people found this helpful
          • 2. Re: Offline TDE + Server Refresh Management - Possible?
            Michael Gillespie

            Matt, this is extremely helpful.  I realized after I posted this that .tdsx was probably the better way to go.  Thanks for confirming that.

             

            We don't do a huge amount in our practice with moving stuff off & onto Server for disconnected editing, so I'm clearly not as familiar with this process as I need to be.  This will help TONS in getting it working right.

             

            Again, many thanks!

            • 4. Re: Offline TDE + Server Refresh Management - Possible?
              Stephen Hicks

              Matt,

              This is super helpful.  Would love to get your tips on error handling.

               

              1) I've got a big tde on tableau online (so big I must accumulate it gradually since a single publication exceeds the tableau online 2hr upload limitation)

              2) I can manually create new tde's on my c: drive from my tds (connected to my sqlsvr view)

              3) Then I can easily use tabcmd to append them to my big tableau onlne extract.

              However sometimes there are errors, so I'm looking for other ways of automating step 2.

               

              I think I hear you saying: let a TDSX refresh on tableau online, then download it, then tabcmd append it to my big tde.

               

              I've been using SSIS to ALTER VIEW in sql which lets me "batch" updates to records.  Then I call tabonlinsyncclient.exe to incrementally refresh and append recent data to tableau online.

              Using this method, I've been able to append hundreds of millions of rows so far.

              This approach allows me to workaround the lack of an UPSERT capability for TDE files. (ask me for details on that)

               

              (Note: my source data is sqlsvr, not a csv, and my tds contains lots of curation, so my tde updates must come from my tds xml structure.)

               

              I often get network or "forcibly rejected" errors.

              What types of publish/append errors have you been able to capture and handle ?

              Also - have you ever noticed that once a command line refresh is complete, it still takes a while for the data to appear on the server?  (I have to keep checking till it's done before attempting to send more batch data)

               

              - Steve

              • 5. Re: Offline TDE + Server Refresh Management - Possible?
                Eric Axelrod

                Steve,

                 

                It sounds like you have an ingenious method for batching data pushes to Tableau Online.

                 

                I have to ask about this:

                     This approach allows me to workaround the lack of an UPSERT capability for TDE files. (ask me for details on that)

                 

                So, how are you handling updates to rows which already exist in the tde?

                 

                Eric Axelrod

                President & Chief Architect

                www.digr.io

                • 6. Re: Offline TDE + Server Refresh Management - Possible?
                  Matt Coles

                  With regards to error handling, I don't think I've ever been able to comprehensively trap all the errors I've wanted to when calling tabcmd. It's tough. Here are a couple of posts I found that might be helpful:

                   

                  tabcmd refreshextracts error handling

                  Re: Issues Automating Tableau Datasource Refreshes with TABCMD 8.0

                   

                  I haven't noticed a delay in getting new data from a recently refreshed extract--but I run Tableau Server, so things work a bit differently in that I'm using Server itself to perform the refresh operations, not doing them locally and pushing the results up.

                  • 7. Re: Offline TDE + Server Refresh Management - Possible?
                    Stephen Hicks

                    I'm glad you like it.

                    I'm not updating anything.. rather, I hide old records.

                    I use a timestamp per batch for my incremental key.

                    Then I query for the most recent timestamps and export to a small csv list.

                    The csv is refreshed in a tableau online extract.

                    Finally - blend it with the master data and filter off nulls

                     

                    Any ideas how to refresh a local TDE without having to use the server,

                    pull down, then push/append back up ?

                     

                     

                    On Thu, Apr 7, 2016 at 7:12 AM, Eric Axelrod <tableaucommunity@tableau.com>

                    • 8. Re: Offline TDE + Server Refresh Management - Possible?
                      Stephen Hicks

                      Thanks Matt   I'll check out the error handling links.

                      One thing seems sure to me to increase tableau online errors... if I don't

                      wait and confirm that the previous batch is fully appended.  Sometimes it

                      can take 30mins or more before the data suddenly appears.

                      Meanwhile the CMD window has already completed and closed, but tableau

                      online is still working in the background (even though this does not appear

                      on the background tasks status).

                      If I send a new file too soon, I think it gets confused and is sure to

                      error out.

                       

                      On Thu, Apr 7, 2016 at 8:38 AM, Matt Coles <tableaucommunity@tableau.com>