If I point the dashboard to the 'incremental' version, then I will have only the first observation of any date entry, and this is the basic problem here, as I need those to get updated.
Sorry if the solution seems obvious, but I am not sure I understand your point. I already looked at Refreshing Large Extracts Faster, but it still doesn't solve my problem.
My first question is about your first point in the "high level concept":
1. Keep a static immutable copy of the data up through a certain date (i.e. 90 days ago). Set it up to have the "incremental extract" option
If I set up the immutable data with an incremental refresh, then it will get a mix of immutable and mutable data, and this is the issue.
My current workflow is as follows:
I have a table whose content I need to pull daily. I cannot create an extract on the table and refresh it incrementally, because the table gets both inserts and updates.
A full refresh won't work because of the size.
The data in the table is about 85% historic data, which is flagged in one column as "historic"; the rest is flagged in the same column as "non historic".
Only the non-historic data gets updated and inserted: every new record starts out as "non historic", then receives multiple updates before it is finally updated to "historic", after which it never changes again.
At that point the newly "historic" records should move into the immutable data.
Hope this is clear!
One more thing to mention: I tried before to embed a filter in the extract, but on refresh that filter got ignored and all data got pulled based on the date column I put in the incremental refresh.
Basically, if the incremental refresh and the full refresh can be done WITH FILTERS, then my job is done: I can set an incremental refresh on ONLY "historic" data, and a full refresh on ONLY "non historic" data, and this will get me a full, up-to-date extract daily. Please confirm whether this is doable?
Hi Amaryllis ( Flower'et ? -- that's nice ;-)
There is no problem with adding your "historic" column to Extract Filters
(not to the Datasource ones, since they're applied after the Extract ones).
But for the incremental refresh to work, you may want to have
a special date (datetime) field, call it [Historic Timestamp], in your datasource.
The business rules for this field should be as follows:
1) While the row is mutable, the value of [Historic Timestamp]
should be (and remain) anything other than an actual timestamp --
whether it is NULL, epoch, or a distant-future datetime.
2) When the row becomes immutable ("historic"), the [Historic Timestamp]
is updated with the current value -- via CURRENT_TIMESTAMP() or such.
3) The [Historic Timestamp] should be used as the increment.
4) The Extract Filter (a separate row-level calculated field) should be set
to exclude the single distinct value of [Historic Timestamp] --
the one that marks rows as mutable. Simply put, this filter excludes NULL,
or the epoch datetime, or the distant-future datetime (defined in Step 1).
That's the way one could get a "historic" extract refreshed incrementally.
As for the "non-historic" extract (fully refreshed), the combination
of the Extract Filters should be exactly the opposite of that
defined for the "historic" one.
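Those rules, and the two opposite Extract filters, can be sketched in a few lines of Python. This is only an illustration of the logic, not anything Tableau-specific: the sentinel choice (epoch), the field names, and the sample rows are all assumptions.

```python
from datetime import datetime

# Epoch plays the role of the "non-timestamp" value from Step 1;
# it marks a row that is still mutable ("non historic").
SENTINEL = datetime(1970, 1, 1)

rows = [
    {"id": 1, "historic_ts": SENTINEL},                  # still mutable
    {"id": 2, "historic_ts": datetime(2018, 1, 5, 10)},  # became historic
    {"id": 3, "historic_ts": datetime(2018, 2, 1, 9)},   # historic, later
]

def historic_filter(row):
    # Extract filter for the "historic" extract: exclude the sentinel (Step 4)
    return row["historic_ts"] != SENTINEL

def non_historic_filter(row):
    # Exactly the opposite filter, for the fully refreshed extract
    return not historic_filter(row)

def incremental_refresh(rows, last_max):
    # Pull only historic rows whose [Historic Timestamp] is past the increment
    return [r for r in rows if historic_filter(r) and r["historic_ts"] > last_max]

# A refresh whose last increment was 2018-01-31 picks up only row 3,
# the one that became historic after that point.
batch = incremental_refresh(rows, datetime(2018, 1, 31))
print([r["id"] for r in batch])  # → [3]
```

Mutable rows never enter the "historic" extract, and once a row flips to historic its timestamp is set exactly once, so each row is appended exactly once by the incremental refresh.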
Having that (dividing your datasource into two unambiguous parts),
one could use the procedure of appending the "fresh historic" extract
onto the "fresh non-historic" one. This could be done in Tableau Desktop.
I've recently posted about that in yet another thread:
Whether it could be done on Tableau Server (via scripting), dunno.
May be Jeff Strauss would have an idea.
Just did my homework (RTFMed, actually),
and it looks like the appending is possible via REST API:
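For illustration, here is a rough Python sketch of how such a publish-with-append request URL could be put together. The server address, API version, and site id are placeholder assumptions, and the real call is an authenticated multipart POST (request XML plus the extract payload), so check the REST API reference for your release before relying on this.

```python
from urllib.parse import urlencode

def publish_datasource_url(server, api_version, site_id, append=True):
    """Build the Publish Data Source endpoint URL with the append option."""
    params = urlencode({
        "overwrite": "false",
        "append": "true" if append else "false",
    })
    return f"{server}/api/{api_version}/sites/{site_id}/datasources?{params}"

# Placeholder server, API version, and site id -- adjust for your environment.
url = publish_datasource_url("https://tableau.example.com", "2.8", "abc123")
print(url)
# The actual request would be a multipart POST to this URL, carrying the
# X-Tableau-Auth token obtained from a prior sign-in request.
```

With `append=true`, the uploaded extract's rows are added to the existing published datasource rather than replacing it, which is the piece needed for the "fresh non-historic onto fresh historic" procedure above.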
One additional note: I really need all this to be done automatically with Tableau Server. The end users won't consider playing with Desktop.
Can anyone clarify what this "extract filter" is? I tried putting a filter when editing an extract before extracting: Add Filter in the Edit Extract popup, just above the aggregation section. But that filter got ignored on refresh, as it pulled all data based on the incremental field (no matter what the value of the filter column is).
Can anyone clarify what this "extract filter" is?
These are the filters that are applied when creating / refreshing an extract.
You've pointed to the right place in the Extract dialog (above the agg section),
where you can create / modify / delete Extract filters.
They may be added initially (copied from the Datasource filters,
if you've defined any before opening the Extract dialog the first time).
But you may want to revise / update them anyway -- before hitting the Extract button.
I have occasionally seen an Extract filter get "lost"
(removed automatically by Tableau Desktop upon the next opening),
but I could not trace those cases to any pattern. They're rare.
What I definitely recommend is checking / inspecting
the Extract filters already in place -- every time one opens
the Extract Data... dialog in Tableau Desktop.
As for my suggestions above in the thread,
I am actually using Extract filters in production
without any glitch (so far :-). The case is like this:
Web events are coming from logs (website & apps)
into the event_stream table in Vertica (in micro-batches).
1) There is the [ts] timestamp field, used as the increment.
2) An Extract filter is set using the calculated field:
DATETRUNC('hour',[ts]) < DATETRUNC('hour',NOW())
3) Extract is set to be refreshed incrementally every hour.
So upon the next refresh, the previous (full) hour of events
becomes available because of the Extract filter settings.
And the maximum value of the [ts] field would be obtained
as the increment for the next refresh. As you can see,
this MAX([ts]) value would be something
below the current hour, but not above that.
Doing incremental refresh this way, we have a reasonable guarantee
that no row (event) can be dropped because of delays
in our log processing pipeline (at least within the last hour).
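The hourly logic above can be simulated in a few lines of Python. DATETRUNC('hour', ...) is modeled by zeroing out minutes and seconds, and NOW() is pinned to a fixed value for the demo; the event timestamps are made-up examples.

```python
from datetime import datetime

def trunc_hour(ts):
    # Python stand-in for DATETRUNC('hour', ts)
    return ts.replace(minute=0, second=0, microsecond=0)

now = datetime(2018, 3, 1, 12, 25)  # pinned NOW() for the demo
events = [
    datetime(2018, 3, 1, 10, 59),   # completed hour: extracted
    datetime(2018, 3, 1, 11, 40),   # completed hour: extracted
    datetime(2018, 3, 1, 12, 5),    # current (open) hour: left out for now
]

# The Extract filter: DATETRUNC('hour',[ts]) < DATETRUNC('hour',NOW())
extracted = [ts for ts in events if trunc_hour(ts) < trunc_hour(now)]

# The increment for the next refresh is MAX([ts]) of what was extracted,
# which is always below the current hour, so late arrivals within the
# still-open hour cannot be skipped by the next incremental refresh.
assert max(extracted) < trunc_hour(now)
print(len(extracted))  # → 2 of the 3 events fall in completed hours
```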
Hope this helps.
Hey Jeff Strauss,
I am trying the one in the blog link you shared. I had a few concerns; it's a pretty old blog and I'm not sure I will get replies over there, so I thought I would try my luck here. Did you get a chance to blog about it as well? Please do share your inputs if you have any.
With regards to the link you shared, is it mandatory to use the script to perform the copy of the mutable data, then the incremental refresh, then the publish sequence of steps? (I reckon it probably is.) But can I do this without tabcmd and publish it normally through the web interface instead of the script? I fear that if I go the script route I might lose my schedules (like you mentioned in the comments section on the blog -- do you have a workaround for that? Did you get it to work for you?)
Thanks in advance!
Yep. It's working and it's fully automated and it's proved to be of value for the past 3 years. Though to be quite honest, it's been a pain to maintain and it's even more difficult to fully teach because the logic is a bit like the "moving shell" game.
What release of Tableau are you running? Right now we're on 10.3.3 on our prod box, and once we upgrade to 10.5.x, then this script is on the list to deprecate as Hyper can handle big data in a very fast way.
Thanks for that quick response Jeff Strauss
I am running Tableau 10.5. I haven't fully tried to do this yet, but reading through that blog I did understand the concept (it took me a while). The concern I have is that I usually publish my data extracts through the web interface, not through scripts. Do I have to go through the script route for this technique to work?
The concept as I term it is "BDI" or "Back Dated Incrementals". The script route permits an initial manual setup of an immutable extract as of 90 days ago, which is then used as the starting point for the script / nightly extract. The concept allows for late-arriving data while mixing in incremental refreshes, so we don't have to pull back to the beginning of the historical data. In order to do the refreshextract, though, the script does a publish overlay to the incremental extract (which the report points at). So, in short, I guess the answer is yes, it requires the publish to be part of the script.
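As a rough illustration, the nightly sequence could be driven from a script along these lines. The file names, datasource name, project, and exact tabcmd flags here are assumptions to verify against the tabcmd documentation for your release; the commands are only built and printed, not executed.

```python
def bdi_commands(immutable_copy, incremental_ds, project):
    """Build the three-step nightly BDI command sequence (placeholder names)."""
    return [
        # 1) Start each night from a fresh copy of the immutable extract file
        #    ("cp" assumes a Unix-like host; adjust for Windows)
        ["cp", immutable_copy, "working.tdsx"],
        # 2) Publish it over the incremental datasource the report points at
        ["tabcmd", "publish", "working.tdsx",
         "--name", incremental_ds, "--project", project, "--overwrite"],
        # 3) Ask the server to run the incremental refresh on that datasource
        ["tabcmd", "refreshextracts", "--datasource", incremental_ds,
         "--incremental"],
    ]

for cmd in bdi_commands("immutable_90d.tdsx", "sales_bdi", "Default"):
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually execute
```

The key point is the ordering: the publish overlay resets the extract back to the immutable 90-day baseline before the incremental refresh tops it up, which is what lets late-arriving rows inside the 90-day window be picked up.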
Ah, well, this might not work for me then!
I am trying to stick to incremental extracts instead of full ones, but the challenge I have is that my customers tend to update data every now and then, and incrementals in that case would create duplicates if I key the increment on a "created_date" or "updated_date" kind of column. For now I do full extracts twice a day (which I don't want to, but the customers want near real-time data and I don't want to slow their dashboard experience by keeping a live connection). I am also having to restrict my data pull to fetch only the last year (if I go all the way back to 2015, the extract refreshes fail most of the time due to timeouts). I've been researching a workaround for a while now; do you happen to know one that would help me achieve this (this blog trick would have been awesome if it weren't for the scripts bit)?
And with regards to Hyper extracts, are they really making a difference for you (my extract creation times are more or less the same)?
Thanks again for your help!
In reference to Hyper speed, the standard Tableau response is that the extract refresh speed depends on a variety of factors. I haven't seen any definitive white paper yet as to what these factors are. But, in any case, my org isn't on 10.5 yet, but we are hoping to get there by mid-April. I did do some testing within beta with a couple of our large extracts, and found about a 30 to 40% gain. One key to speeding it up, is that Hyper extracts do not need to have post-processing (i.e. optimize) after the import step is complete. I've heard from others that Hyper is blazingly fast, but I'm not there yet.
In terms of your challenge, can you just import the duplicate records with an incremental extract, and then within the report filter down to the ones that have the max update date?
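That suggestion can be sketched in plain Python; the column names ("id", "updated_date") and sample rows are illustrative assumptions.

```python
def latest_per_key(rows):
    """Keep only the most recent version of each row, by update date."""
    best = {}
    for row in rows:
        key = row["id"]
        if key not in best or row["updated_date"] > best[key]["updated_date"]:
            best[key] = row
    return list(best.values())

# An incremental refresh keyed on updated_date imports both versions of id 1;
# the report-side filter then keeps only the latest one per key.
rows = [
    {"id": 1, "updated_date": "2018-01-01", "status": "open"},
    {"id": 1, "updated_date": "2018-02-01", "status": "closed"},  # later version
    {"id": 2, "updated_date": "2018-01-15", "status": "open"},
]
deduped = latest_per_key(rows)
print(sorted((r["id"], r["status"]) for r in deduped))
# → [(1, 'closed'), (2, 'open')]
```

In Tableau itself, the equivalent report-side filter would typically be an LOD-based boolean such as [updated_date] = {FIXED [id] : MAX([updated_date])}, kept True.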
Yes. I did think of filtering the results in my report with the max date, but that would mean adding yet another calculation to the report, and I'm trying hard to avoid that (I want to keep calculations/computations on the DW/SQL end as much as possible).
With regards to Hyper extracts, you said "Hyper extracts do not need to have post-processing (i.e. optimize) after the import step is complete". I am not sure I understand this; could you please elaborate? I am fairly new to Tableau.