3 Replies Latest reply on Oct 7, 2015 1:20 PM by ping z

    How to delete some rows in tableau server extract

    ping z

      Hi,

      I know this question was asked before, and it seems there is no way to do that.

      How do I delete some data from an Extract?

       

      The situation I am facing is that I have Tableau connected to a HUGE database, which is incrementally extracted to Tableau. Problem is sometime (quite often) there is data issue from the source database, which I prefer to purge most recent data (say last 2 days), and then redo incremental refresh, rather than spending very long time to do a full refresh.

       

      If purge partial data is not possible, is there any workaround? I am thinking of backing up the extract everyday, if abnormal data comes in, I can switch to previous day's copy of extract. On tableau server, can I do that? and How? (if anyone knows, please list steps).

       

      Many thanks!

        • 1. Re: How to delete some rows in tableau server extract
          Dan Scott

          Interesting question.

          I believe that your idea of using a previously backed up extract file could be made to work (though this is definitely not some supported scenario).

          This is not something I have ever tried, but...

          You would need to find the GUID associated with the workbook or datasource.  This can be done by querying the database.  The extracts table contains the GUID, and can be joined to either a workbook or a datasource.

          Once you have the GUID, you could simply swap in the old (good) extract for the newer (bad) one within the File Store.  That is, move your old extract back into the GUID folder inside the File Store (and move or delete the newer one from the directory).

          If you are running multiple File Stores, then you would need to make this change on each of them.

          You might run into issues with files being locked (due to being in use).

          Again, not supported, but I think it would work.

          • 2. Re: How to delete some rows in tableau server extract
            Jeff Strauss

            this can be done, but in a round about way and with a bit of effort, and we call it the "backdated incremental" script.

             

            How we do it is that we have a tabcmd script that runs each day and is based on the the blog post:  Disqus - Refreshing Large Extracts Faster

             

            The concept is clunky, but we really only have to deal with it once a quarter.  It would be a lot cleaner if we could pass a parm into tabcmd refreshextract via idea http://community.tableau.com/ideas/4768

             

            1. create an "immutable" version of the extract that has all data up through the end of the previous quarter (e.g. cutoffdate=6/30/15 fullorincremental=F). To do this, we have a filter:

            "iif(([Date] <= date([CutOffDate]) and [FullorIncremental] == "F" ) or ([Date] > date([CutOffDate]) and [FullorIncremental] == "I" ),true,false)"

             

            2. Manually refresh immutable and publish to server once a quarter in the second week of the quater.  Prior to publishing. the fullorincremental parm is changed to "I"

             

            3. create a copy of the published "immutable", but call it "incremental" and point the report at this

             

            4. This is where the "magic" comes.  Each night, prior to running a tabcmd refreshextract --incremental, we do a get of the immutable and we publish it on top of "incremental".  In effect, this allows refreshing the incremental with a start of the beginning of the quarter (7/1)

            • 3. Re: How to delete some rows in tableau server extract
              ping z

              A simple way of rolling back of tableau incremental extract can be done as follows:

              (1) in Tableau server background extract, find the extract_id for an extract

              (2) on Server, go to c:\programdata\Tableau\Tableau Server\Data\tabsvc\dataengine\extract, where the extract .tde file can be found through searching extract_id  (or simply identified by schedule)

              (3) backup an extract which is used for roll back

              (4) stop Tableau, use an older extract to overwrite the latest extract (keep a copy of latest one, in case), then restart Tableau server.

              (5) ***the first incremental extract with the rolled-back extract will fail (0 row inserted, maybe the key value for incremental refresh is stored somewhere), but the second incremental extract will work as expected.

               

              This is based on my experiment. Not sure if it always works. Be sure to keep all copies of extracts before replacing.