4 Replies Latest reply on Jan 31, 2017 4:01 AM by Dovi Lilling

    Data Extract - Union or Overlapping Refresh

      I have an Oracle table that contains 10 million records (~8GB in size) of Emergency Department data. This data is updated daily (inserts ~2500 new records and updates ~1000 existing records). How can I efficiently publish and keep updated this as a data source on our Tableau server?

       

      Option 1 - v7 Incremental refresh. Fails. It only updates the inserted records, but not the recent refreshes (no ability to set a refresh window ... say last 7 days).

       

      Option 2 - Create two tables. One that contains records that are closed/completed (table A) and another that contains "recent" records that may still be changed (table B). Table A (millions of records) would be set for an incremental refresh, while Table B (thousands of records) is set for a full refresh every day. I would then "somehow" union the two data extracts in Tableau to create a third and final data extract. ????

       

      Option 3 - Run a massive full refresh daily.

       

      Any ideas on if Option 1 is in the works, or if Option 2 is possible now.

       

      The end goal is to be able to publish a single data source that can be used across the organization that is updated daily.

       

      Thanks

      Jason

        • 1. Re: Data Extract - Union or Overlapping Refresh
          Russell Christopher

          Unfortunately, I don't think option two will work for you, either. If a record lands in your extract and then is updated later, you might use some Kung Fu outside of Tableau to identify it. However if you insert that newly changed row, you're not replacing the original, un-updated row....you now have two versions of that sucker in your final extract

           

          Unless you do a full refresh, you're always going to be running into the "multiple copies of an updated row" issue...

          • 2. Re: Data Extract - Union or Overlapping Refresh
            Richard Leeke

            I think it might be possible to juggle things so that your option 2 could work - though this may need some under-the-covers trickery with database views so that Tableau doesn't realise what you're up to.  I haven't tried what I'm about to suggest, so apologies if you find there's a fatal flaw in this scheme.

             

            Firstly, let me clarify my understanding of your data life cycle.  As I understand it, records have essentially two states (let's call them "Open" and "Closed").  Records are created in the Open state.  Open records may be updated multiple times, until they are set to Closed.  After that, they are never again updated and are never deleted.  If that is correct, this approach might work.

             

            1) Create an initial full extract of all Closed records.

            2) Save a copy of the "All Closed" extract.

            3) Incrementally refresh the extract with the Open records (see below for what you might need to do for each of the incremental update steps).

            4) Next time you want to update the extract, replace the extract with the copy you saved in step 2.

            5) Incrementally update with all records which have been closed since the original extract.

            6) Replace the "All Closed" extract with a copy of the updated one.

            7) Incrementally update the extract again with all Open records.

            8) Repeat steps 4) to 7) each time you want to refresh the extract.

             

            I haven't done much with incremental extracts from database data sources (as opposed to from files), but I'm fairly sure that you will need to define your data source in terms of a database view so that Tableau believes you are always using the same query definition. Unbeknownst to Tableau you can adjust the definition of the view to pick up appropriate records (switch between Open and Closed and perhaps do something tricky with a sequence number or date field). Maybe if you have a [Status] field and a [Closed Date] (which is set to some date in the far future for Open records) or some such trickery.

             

            Let us know if you manage to get something working, I imagine this may be a fairly common scenario - so if there is a way to make it happen it could save lots of time for lots of people.  I can see that the time saving could be well worth a bit of skull-duggery.

            • 3. Re: Data Extract - Union or Overlapping Refresh
              amaryllis floweret

              HI Richard Leek,

              I am sorry, i cant get your steps to get this working ....I am in the exact same situation :

              -I have my immutable data incrementally refreshed ( With only immutable rows )

              -I have my mutable data fully refreshed daily ( with all mutable rows, what you called here "open" rows)

               

              Both data sources are published to tableau server and working perfectly ! My concern here : How to combine them into only one data source automatically refreshed ?

               

              Your help will be very much appreciated !!!!

              • 4. Re: Data Extract - Union or Overlapping Refresh
                Dovi Lilling

                You may also want to upvote my idea regarding this: https://community.tableau.com/ideas/7076