1 2 3 Previous Next 33 Replies Latest reply on Mar 15, 2018 11:03 AM by Rajat Rao Go to original post
      • 15. Re: How mechanism of incremental refresh works?
        amaryllis floweret

        If i point the dashboard to the 'incremental'  version, than i will have only the first observation of any date entry ! and this is the basic problem here , as i need those to get updated ...

        Sorry if the solutions seems obvious but i am not sure i understand your point ..i already looked at Refreshing Large Extracts Faster  but still doesnt solve my problem

        • 16. Re: How mechanism of incremental refresh works?
          amaryllis floweret

          My first question goes here with your first point in the "high level concept"

           

          1. Keep a static immutable copy of the data up through a certain date (i.e. 90 days ago).  Set it up to have the "incremental extract" option

           

          If i set up the immutable data to an incremental refersh than it will get mixed immutable and immutable data and this is the issue ...

           

          My current workflow is as follow :

           

          I have a table that i need get it s content daily ! I can not craete an extract in the table and refersh it increentally because the table got both inserts and updates

          Full refersh wont work because of the size

           

          The data in the table is at 85% historic data , which is flagged with one column as "historic" and the rest of the data is flagged in teh same column as "non historic"

          Only the non historic data can get updated and inserted : actually every new record is at frist place "non historic" . and then it gets multiple updates before it get finally updated to "historic" when it will never change again.

          At this time the newly "historic" should go to the immutable data ..

           

          Hope this is celar !

          • 17. Re: How mechanism of incremental refresh works?
            amaryllis floweret

            One more thing to mention , i tried before to embed a filter in the extract but when refreshed that filter got ignored and all data got pulled based on the date column i put in teh incremental refresh

            • 18. Re: How mechanism of incremental refresh works?
              amaryllis floweret

              Basically if the incremental refresh and the full refersh can be done WITH FILETRS, than my job is done : I can set and incremental refresh on ONLY "historic" data , and a full refersh on ONLY "Non Historic" data , and thsi wil get me a full up to date extract daily .... please confirm if thsi is doable ?

              • 19. Re: How mechanism of incremental refresh works?
                Yuriy Fal

                Hi Amaryllis ( Flower'et ? -- that's nice ;-)

                 

                There is no problem with adding your "historic" column to Extract Filters

                (not to the Datasource ones, since they're applied after the Extract ones).

                 

                But for the incremental refresh to work, you may want to have

                a special date (datetime) field, called it [Historic Timestamp], in your datasource.

                The business rules for this field should be as follows:

                 

                1) When the row is being mutable, the value of the [Historic Timestamp]

                should be (and remain) any other than the actual Timestamp --

                whether it is NULL, epoch, or a distant future.

                 

                2) When the row becomes immutable ("historic"), then the [Historic Timestamp]

                is updated with the current value -- via CURRENT_TIMESTAMP() or such.

                 

                3) The [Historic_Timestamp] should be used as an increment.

                 

                4) The Extract Filter (a different Row-Level calculated field) shoud be set

                to exclude the single distinct value of the [Historic_Timestamp] --

                the one which marked rows as mutable. Simply, this filter excludes NULL,

                or EPOCH datetime, or the distant future datetime (defined on Step 1).

                 

                That's the way one could get a "historic" extract refreshed incrementally.

                 

                As for the "non-historic" extract (fully refreshed), the combination

                of the Extract Filters should be exactly the opposite than that

                defined for the "historic" one.

                 

                Having that (dividing your datasource into two non-ambiguious parts),

                one could use the procedure of appending the "fresh historic" extract

                onto the "fresh non-historic" one. This could be done in Tableau Desktop.

                I've recently posted about that in yet another thread:

                 

                Re: Partial Refresh of Tableau Data Extract

                 

                Whether it could be done on Tableau Server (via scripting), dunno.

                May be Jeff Strauss would have an idea.

                 

                Yours,

                Yuri

                • 20. Re: How mechanism of incremental refresh works?
                  Yuriy Fal

                  Just made my homework (RTFMed actually),

                  and it looks like the appending is possible via REST API:

                   

                  http://onlinehelp.tableau.com/current/api/rest_api/en-us/help.htm#REST/rest_api_concepts_publish.htm#appending

                   

                  Yours,

                  Yuri

                  • 21. Re: How mechanism of incremental refresh works?
                    amaryllis floweret

                    One additional note : I really need all thsi to eb done automatically with tableau server. The end users wont consider playing with desktop ...

                     

                    Can anyone clarify what this "extract filter is " ? I tried putting a filter when editing an extract before extracting : add filter in the edit extract popup, just above the aggregation section ..But that filter got ignored when refersh , as it pulled all data based on the incremenatl field ( no matter what the value of the filter column is )

                    • 22. Re: How mechanism of incremental refresh works?
                      Yuriy Fal

                      Hi Amaryllis,

                      Can anyone clarify what this "extract filter is " ?

                      These are the filters that applied when creating / refreshing an extract.

                      You've pointed to the right place in the Extract dialog (above the agg section),

                      where you can create / modify / delete Extract filters.

                       

                      They may be added initially (copied from the Datasource filters,

                      if you've defined ones before opening the Extract dialog the first time).

                      But you may want to revise / update them anyway -- before hitting the Extract button.

                       

                      I have an occasional experience with an Extract filter to be "lost"

                      (removed automatically by Tableau Desktop upon the next opening),

                      but I could not trace those cases as following any pattern. They're rare.

                       

                      What I definitely recommend is checking / inspecting

                      the Extract filters already in place -- everytime one opens

                      the Extract Data... dialog in Tableau Desktop.

                       

                      As of my suggestions above in the thread,

                      I am actually using Extract filters in production

                      without any glitch (so far :-). The case is like this:

                       

                      Web events are coming from logs (website & apps)

                      into the event_stream table in Vertica (in micro-batches).

                       

                      1) There is the [ts] timestamp field, used as the increment.

                       

                      2) An Extract filter is set using the calculated field:

                      DATETRUNC('hour',[ts]) < DATETRUNC('hour',NOW())

                       

                      3) Extract is set to be refreshed incrementally every hour.

                      So upon the next refresh the previous (full) hour of events

                      become available because of the Extract filter settings.

                      And the maximum value of the [ts] field would be obtained

                      as the increment for the next refresh. As you can see,

                      this MAX([ts]) value would be something

                      below the current hour, but not above that.

                       

                      Doing incremental refresh this way, we have a reasonable guarantee 

                      that no one row (event) could be thrown because of any delays

                      in our log processing pipeline (at least within the last hour).

                       

                      Hope this could help understanding.

                       

                      Yours,

                      Yuri

                      • 23. Re: How mechanism of incremental refresh works?
                        Rajat Rao

                        Hey Jeff Strauss,

                        I am trying the one in the blog link you shared, i had a few concerns, its a pretty old blog not sure if I will get replies over there - thought I will try my luck here. Did you get a chance to blog about it as well? please do share you inputs if you have.

                         

                        With regards to the link you shared, is it mandatory to use the script to perform the copy of mutable and then executing incremental refresh and then publish sequence of steps ?  (i reckon it probably is), but can I do this without the tabcmd and publish it normally through the web interface instead of the script? I fear if I go the script route  I might loose my schedules 9 (like you mentioned in the comments section on the blog, do you have a workaround for that ? did you get it to work for you ? )

                         

                        Thanks in advance!

                        - Raj

                        • 24. Re: How mechanism of incremental refresh works?
                          Jeff Strauss

                          Yep.  It's working and it's fully automated and it's proved to be of value for the past 3 years.  Though to be quite honest, it's been a pain to maintain and it's even more difficult to fully teach because the logic is a bit like the "moving shell" game.

                           

                          What release of Tableau are you running?  Right now we're on 10.3.3 on our prod box, and once we upgrade to 10.5.x, then this script is on the list to deprecate as Hyper can handle big data in a very fast way.

                          • 25. Re: How mechanism of incremental refresh works?
                            Rajat Rao

                            Thanks for that quick response Jeff Strauss

                             

                            I am running Tableau 10.5. I haven't fully tried to do this yet, but reading through that blog I did understand the concept (it took me a while ) . The concern that I have is I usually publish my data extracts through the web interface and not scripts. do I have go through the script route for this technique to work ?

                            • 26. Re: How mechanism of incremental refresh works?
                              Jeff Strauss

                              The concept as I term it is "BDI" or "Back Dated Incrementals".  The script route permits an initial manual setup of an immutable extract as of 90 days ago, which is then used as the starting point for the script / nightly extract.  The comcept allows for late arriving data while mixing in incremental refreshes so we don't have to pull back to the beginning of the historical data.  In order to do the refreshextract though, the script does a publish overlay to the incremental extract (which the report points at).  So, in short, I guess the answer is yes, it requires the publish to be part of the script.

                              • 27. Re: How mechanism of incremental refresh works?
                                Rajat Rao

                                Ah, well, this might not work for me then!

                                 

                                I am trying to stick to incremental extracts instead of full, but the challenge I have is my customers tend to update data every now and then, incrementals in that case would create duplicates if I have the incremental key on "created_date" or "updated_date" kind of a column. For now I do full extracts twice in day (which I don't want to but the customers want near real-time data and I don't want to slow their dashboard experience by keeping the connection to LIVE data) and also I am having to restrict my data pull to fetch only for the last year (if I go all dates until 2015 the extract refreshes fail most of the times due to timeouts), been researching for a workaround for a while now, do you happen to know a workaround that would help me achieve this (this blog trick would have been awesome if weren't for the scripts bit) ?

                                 

                                and with regards to Hyper extracts are they really making a difference for you (my extract creation time are more or less the same) ?

                                 

                                Thanks again for you help!

                                • 28. Re: How mechanism of incremental refresh works?
                                  Jeff Strauss

                                  In reference to Hyper speed, the standard Tableau response is that the extract refresh speed depends on a variety of factors.  I haven't seen any definitive white paper yet as to what these factors are.  But, in any case, my org isn't on 10.5 yet, but we are hoping to get there by mid-April.  I did do some testing within beta with a couple of our large extracts, and found about a 30 to 40% gain.  One key to speeding it up, is that Hyper extracts do not need to have post-processing (i.e. optimize) after the import step is complete.  I've heard from others that Hyper is blazingly fast, but I'm not there yet.

                                   

                                  In terms of your challenge, can you just import the duplicate records with an incremental extract, and then within the report filter down to the ones that have the max update date?

                                  • 29. Re: How mechanism of incremental refresh works?
                                    Rajat Rao

                                    Yes. I did think of filtering the results in my report with the max date, but that would mean adding another one of many calculations within the report, trying hard to avoid that (want to keep calculations/computations on the DW/SQL end as much as possible )

                                     

                                    with regards to Hyper extracts, you said "Hyper extracts do not need to have post-processing (i.e. optimize) after the import step is complete" . I am not sure I understand this, could you please elaborate? I am fairly new to Tableau.