- If a Data Source filter is in place before you create an extract, it'll be applied to the extract as well - your extract will contain "less" data.
- If no Data Source filter was applied when you create an extract, then:
- Any filter you put on the EXTRACT is applied during the extract process (your extract will contain "less" data)
- Any Data Source filter that you add after the extract is complete is applied against the extract, just as if it were any other data source. Your extract will contain "more" data, but will be filtered to show "less" data by the Data Source filter
- Filters applied to the extract are honored during refreshes, whether "full" or "incremental".
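The ordering described above can be sketched with a toy SQL simulation (sqlite3; the table, columns, and filter values are all hypothetical, and this only illustrates the semantics, not Tableau's actual engine):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
cur.executemany("INSERT INTO sales VALUES (?, ?)",
                [("East", 100), ("West", 200), ("East", 300), ("South", 50)])

# Extract filter: applied ONCE, while the extract is being created.
# The extract physically contains only the matching rows ("less" data).
cur.execute("CREATE TABLE extract AS SELECT * FROM sales WHERE region != 'South'")

# Data source filter: appended to EVERY query issued against the
# extract; rows outside the filter still exist in the extract file.
rows = cur.execute(
    "SELECT region, SUM(amount) FROM extract "
    "WHERE region = 'East' GROUP BY region").fetchall()
print(rows)  # [('East', 400)]
```

The point of the sketch: the extract filter changes what is stored, while the data source filter only changes what each query returns.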
Hope this helps!
Ken Hunt wrote:
When I choose Extract Data, the pop up allows me to add filters there too - but it does not automatically pull in the filters I created on the data source itself by using the "Edit Data Source Filters" option.
I just tried this and the filters were pulled through from the "Edit Data Source Filters". You should see them under the "Filters (Optional)" heading.
Thanks for the quick response Russell, I think the confusion was that I had already created an extract and the filters didn't seem to be applied consistently.
Data source filters always apply to the data source, regardless of whether it's powered by an extract or a live connection. If you create data source filters after creating an extract, you can switch back to a live connection and the filters will still be honored. This means that data source filters do not actually affect the extract creation itself, which contradicts Russell's claim. If you wish to filter your extract, you must define filters within the extract creation dialog. However, I don't recall if Tableau automatically recommends using the data source filters as extract filters when you create an extract for the first time, which may have led to Russell's impression of the intended functionality.
The reason that data source filters are applied after the extract is created is that it allows extracts to contain a broad range of data while the data source filters (driven perhaps by parameters or by User Filters) only expose a subset of the data at a time. This way, Tableau can update a visualization without refreshing the extract when reacting to changes to your parameters or differences in users or groups connected via Tableau Server or Data Server.
Does this make sense?
I can see where the confusion was, and all 3 of you who have replied have helped me to better understand. To respond directly to your comments Robert, I found what you said to be true. Funny thing was when I had existing data source filters, and went to run the extract, 4 of the 5 lines of criteria pre-populated from the Data Source Filters to the Extract Data screen. It's a good thing I looked at them closely, because the one that was missing was based on the Createdate field in my data, and is responsible for greatly reducing my results.
I feel like I have a much better understanding of this, and I've already started sharing this with my co-workers. I do have one more question though - Let's say I don't use filters on my extract and have 20M records. Then I use criteria via the Data Source Filters to restrict the records to 1M. In this scenario, when I publish to the server, how do the data source filters provide a performance advantage?
Do the Data Source Filters apply the first time (in a session) a dashboard is brought up? When users change criteria via quick filters, is it accessing a cached record set based on data source filters, meaning it only queries the 1 million records not the entire data extract of 20M?
I know I have a lot of questions, but I'm sort of obsessed with understanding how things work, and we need to understand this clearly to make the most efficient, best practice processes for our organization.
Extract filters are the only mechanism for controlling the size of the extract. Read on to see what data source filters do provide.
When you publish a data source to Tableau Server, the data source and any associated files or extracts are transported in their entirety to the Server. As you publish a data source you can define access permissions for downloading or modifying the data source, and you can also choose the users and groups who can remotely issue queries through Tableau Server with that data source. What's interesting is when users have query permission but no download permission. This allows you to share a rich data model having calculated fields, aliases, groups, sets and more -- but only for querying.
Furthermore, users who query this Data Server data source will never be able to see or modify any data source filters present on the originally published data source, but all of the users' queries will be subject to those data source filters. This is a great way to offer a restricted subset of your data, for example by filtering dimensions for specific users and groups, or by defining data source filters based on a fixed or relative date range. This is often useful for data security, and it also allows you to manage performance of the remote database which Tableau Server will ultimately query on a user's behalf. For systems which rely heavily on partitions or indexing, data source filters may yield tremendous control over the performance of queries issued by Tableau.
Does this make sense?
I have two Data Source Filters.
1. For a Segment
2. For Products
As soon as I choose the segment in the Data Source Filter and move over to apply a data source filter to the products, I see Tableau already unchecking a few products which may not be part of that segment as of that date.
In such a case, when the data gets refreshed, will Tableau automatically update the Data Source Filter for Products if some of the existing products get added to a segment they were not originally part of?
No, Data Source Filters are not directly applied to the data that is extracted. Refreshing the extract will yield new data that may appear in your visualizations. After the extract operation has completed and is ready to be queried, each query Tableau generates will include the data source filters as part of the query. If the new data in your extract is not subject to these filters, the new data will indeed appear in (or indirectly impact) your visualizations.
Here's some more background.
Data Source Filters apply directly to the data source, and are not a part of the underlying connection you have active. When you create an Extract you can define filters to apply when Tableau queries the live connection for data to extract. Those extract filters are distinct from any data source filters already defined.
Sometimes you may wish to apply the same data source filter criteria when extracting data (e.g. filters that eliminate dirty data), and to that end Tableau may offer the data source filters as recommended extract filters the first time you create an extract. They are completely optional.
In other cases you may have data source filters which are very dynamic, and may change due to changes in the underlying data, changes in the Server user who is accessing the data, or changes in Parameter controls that influence the filter criteria. Examples include: relative-date filters to only show past six weeks of data; User filters; and Top N filters driven by a parameter control for [N]. In such cases you will want the extract to contain all data necessary to support the different ways in which these filters restrict the data.
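The "broad extract plus dynamic filter" pattern described above can be sketched in miniature (the dates, field names, and six-week window are hypothetical):

```python
from datetime import date, timedelta

# The extract holds ALL history -- no extract filter was applied.
extract = [
    {"created": date(2013, 7, 1), "amount": 10},
    {"created": date(2013, 9, 20), "amount": 25},
    {"created": date(2013, 9, 22), "amount": 40},
]

# A relative-date data source filter ("past six weeks") is re-evaluated
# at query time, so the extract never needs refreshing just because
# "today" has moved forward.
today = date(2013, 9, 23)
cutoff = today - timedelta(weeks=6)
visible = [r for r in extract if r["created"] >= cutoff]
print(sum(r["amount"] for r in visible))  # 65
```

Because the filter is applied per query rather than baked into the extract, changing `today` (or a parameter, or the logged-in user) changes the visible rows with no extract refresh.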
I hope this helps,
I was hoping the same but was not sure.
I was not sure as to how Tableau distinguishes between
a. Data source filtered products due to data source filtering of segments
b. products specifically data source filtered by the user (by unchecking it)
Both are unchecked in the particular input box and look no different.
I have a question related to the performance of data source filters vs. global sheet filters when applied to a published datasource extract.
I have a large extract published on the server (34MM records). We had a global filter on [Business Unit] on a dashboard and some of the worksheets were taking very long to render.
We switched to a data source filter using a parameter to control it via a formula:
[Business Unit] = [Business Unit Parameter] OR [Business Unit Parameter] = "All"
I'd like to understand why something like this would improve the performance, even when the parameter is set to "All". I would think the same amount of data is returned in both cases and I would expect Tableau to benefit from the extract performance features regardless of the kind of filter used.
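For reference, the calculated filter quoted above evaluates row by row; a minimal Python sketch of the same boolean (function and value names are made up for illustration):

```python
def passes_filter(business_unit: str, param: str) -> bool:
    # Mirrors the Tableau calc:
    # [Business Unit] = [Business Unit Parameter] OR [Business Unit Parameter] = "All"
    return business_unit == param or param == "All"

print(passes_filter("Retail", "All"))     # True: "All" admits every row
print(passes_filter("Retail", "Retail"))  # True: exact match
print(passes_filter("Retail", "Online"))  # False: filtered out
```

Note that when the parameter is "All", every row passes, which is why the question of where the performance gain comes from is a fair one.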
There are a couple of possibilities that could explain the performance difference.
- Quick filters that are set to "All Values in Database" (vs. "Only Relevant Values") will display a domain of filter members that does not take into account any other filters on the Filters shelf. This may require Tableau to issue queries against the full data set, which cover far more data than you would like.
- Quick filters that are set to "Only Relevant Values" will be updated any time another filter is changed, since the relevant values may change as well. Since this can lead to numerous queries every time a filter is changed, Tableau will attempt to optimize this interaction by computing the relationships of all filters with each other. This is an up-front operation that can become expensive, but improves performance of user interactions with quick filters after the optimization is complete. The more filters you have in play, the more expensive this can become.
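The two quick-filter modes above can be contrasted with toy data (field names are hypothetical; this only models which rows each mode must consider, not Tableau's query engine):

```python
rows = [
    {"bu": "Retail", "region": "East"},
    {"bu": "Retail", "region": "West"},
    {"bu": "Online", "region": "West"},
]

# Another filter already on the shelf: region = "East".
shelf = [r for r in rows if r["region"] == "East"]

# "All Values in Database": the quick filter's domain ignores other
# filters, so computing it means scanning the FULL data set.
all_values = sorted({r["bu"] for r in rows})

# "Only Relevant Values": the domain honors the other filters, and
# must be recomputed whenever any of them changes.
relevant = sorted({r["bu"] for r in shelf})

print(all_values)  # ['Online', 'Retail']
print(relevant)    # ['Retail']
```

A data source filter would constrain `rows` itself before either domain is computed, which is the mechanism the next paragraph describes.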
Data Source Filters cannot be displayed on a dashboard or visualization the way quick filters can, so they aren't subject to either of the concerns above. Furthermore data source filters will ensure that all queries will honor the filter criteria; this even applies to quick filters set to "All Values in Database", and will help ensure that those filters do not issue unbounded queries against the entire underlying data set.
So not only are Data Source Filters better than global filters (placed on the Filters shelf) for ensuring all database queries are constrained to a subset of data, using Data Source Filters can often improve the performance of other filters on the Filters shelf, especially when they are displayed as Quick Filters for interactivity.
I hope this helps,
This makes sense.
What would be the best way for me to check/confirm that the additional time it takes to create the view when no data source filter is used is due to the extra quick filter queries?
Is there an easy way to identify these quick filter queries in the log file or via the performance recording feature?
The queries appear truncated when viewed through the performance recording workbook. It's hard to tell if a query is related to a quick filter or not.
I don't know of an easy / quick way for you to identify the purpose of each query.
I think this issue is happening to me and I want to make sure I understand the best way to handle it. Is it best to extract first, then apply a data source filter before publishing the extract, OR to connect, apply the data source filter, and then extract? I have an issue where I apply the data source filter and then extract. It looked fine on Desktop, but after posting to Server and refreshing, it came up blank. We are using Tableau groups for row-level security, with a filter that keeps TRUE if the row meets some criteria.