Discussions about use cases and scenarios like this are interesting.
Eric Viglotti wrote:
- We have a SQL server with, let's say, 1million customers with order details
- Someone then gives us a text file with a random list of customers, say, 100 to 200 customers, with their customer number that matches their customer number in the SQL database
- They want us to analyze those customers as a cohort or subset
I suppose "Someone" refers to your business users, and this process looks quite traditional.
In a self-service context, it works like this:
- Business users have the text file in hand and set up the data connection themselves, with support/instructions from an IT person,
- Business users create and share reports (here Tableau workbook)
What's your role in the scenario? DB admin or IT admin?
Do you use a Tableau Server or only Tableau Desktop?
Thank you Lei. Basically I am a DB admin and IT admin in this scenario as it's a small company and only using Tableau Desktop right now, but could look at Server if there are enough benefits.
I'm actually helping set up the tools as well as possible, so I'm helping a BI analyst who would be receiving the text file from a business user. We haven't fully set this up yet, but we can assume that the BI analyst receiving the text file would have SQL access and would be a Tableau author.
Best practice differs from requirement to requirement, and from environment to environment.
I'm not sure if this fits your situation, but here is a brief idea that just came to me:
- Introduce a new table (let me call it request_customer_list) in SQL Server, containing two main columns: requester and customer number.
- In the Tableau Desktop data connection, left join request_customer_list to the business data tables, and filter by requester.
For your reference, one of Tableau Server's advantages is User Filters.
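A minimal sketch of the request_customer_list idea, using Python with sqlite3 as a stand-in for SQL Server (all table/column names and the data are invented for illustration):

```python
import sqlite3

# Hypothetical schema for illustration only; none of these names come
# from the actual database discussed in the thread.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (customer_number INTEGER, amount REAL);
    CREATE TABLE request_customer_list (requester TEXT, customer_number INTEGER);
    INSERT INTO orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO request_customer_list VALUES
        ('analyst_a', 1), ('analyst_a', 3),  -- analyst_a's cohort
        ('analyst_b', 2);
""")

# The request list drives the join (left side), so only the requested
# customers come back; filtering on requester isolates one cohort.
rows = con.execute("""
    SELECT r.customer_number, o.amount
    FROM request_customer_list r
    LEFT JOIN orders o ON o.customer_number = r.customer_number
    WHERE r.requester = 'analyst_a'
    ORDER BY r.customer_number
""").fetchall()
print(rows)  # [(1, 10.0), (3, 30.0)]
```

Each analyst (or business user) only ever touches rows tagged with their own requester value, which is essentially what a Tableau Server User Filter would enforce automatically.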
Very good to know. Historically this is what I have done for a BI solution, but it never felt very "self service" because the BI analyst would be expected to have an ETL tool to insert data into this table. I was thinking Project Maestro might be a "self service" type of ETL tool (sort of), but it looks like Project Maestro can't load into a database table, so we're stuck there.
I'm still scratching my head on what exactly "self service" BI is, as I always thought this type of use case would be a pretty common one, but perhaps not...
There would be no problem having a list of customers work as a filter,
by copy-pasting the values into the Multiple Values (Custom List) quick filter:
But there is one thing that makes it particularly difficult to have this case
as 'self-service' (working via browser in Interact or Web Edit mode):
- They want us to analyze those customers as a cohort or subset
As per Tableau Help, when in Web Edit mode, one can
- Create and edit groups.
- Use sets (not create or edit).
So there's no way to make a Set from a list of Customers in Web Edit mode.
The other two alternatives -- either making a Group or writing a Calculated Field --
require user familiarity with Tableau Desktop 'not-so-basic' features / techniques.
Besides, making a Group would require a Customer dimension somewhere on a view --
which is exactly the opposite of what you're trying to get in the first place.
As a decent workaround you may want to build a two-sheet dashboard
having the same views side-by-side and using a pair of Quick Filters in Custom List mode --
the one to Include (the list of Customers), the other to Exclude.
Please find the attached as an example (using SuperStore, of course).
Customer_List_v10.0_YF.twbx 1.2 MB
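The include/exclude pair of custom-list quick filters behaves like an IN and a NOT IN clause over the same pasted list. A minimal sketch in Python with sqlite3 (the table and names are made up for illustration):

```python
import sqlite3

# Toy customer table; names are invented for this sketch.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (name TEXT)")
con.executemany("INSERT INTO customers VALUES (?)",
                [("Alice",), ("Bob",), ("Carol",), ("Dave",)])

include = ["Alice", "Carol"]  # the list pasted into the "include" filter
exclude = ["Alice", "Carol"]  # the same list pasted into the "exclude" filter

ph = ",".join("?" * len(include))
# Sheet 1: only the cohort (the "include" custom list).
cohort = con.execute(
    f"SELECT name FROM customers WHERE name IN ({ph}) ORDER BY name",
    include).fetchall()
# Sheet 2: everyone else (the "exclude" custom list).
rest = con.execute(
    f"SELECT name FROM customers WHERE name NOT IN ({ph}) ORDER BY name",
    exclude).fetchall()
print(cohort)  # [('Alice',), ('Carol',)]
print(rest)    # [('Bob',), ('Dave',)]
```

Side by side on a dashboard, the two sheets partition the customer base into "in the cohort" and "not in the cohort" without anyone needing to create a Set or Group.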
Thanks so much, Yuri, for the example workbook and the detail. This makes complete sense, and I see Gerardo echoed this concept as well. A few questions:
1. Is there a limit to how many customer numbers you can paste into the "Multiple Values (Custom List) Quick Filter"? Can you paste literally hundreds or thousands?
2. Do we know if adding this filter to the data source actually puts this in the "where" clause of the SQL being sent to the SQL server? Or is the SQL server being asked for all 1 million customers and then Tableau filters the results from there? I'm hoping it's not the latter as that would seem royally inefficient.
3. Lastly, regarding Lei's latest post about the table and the "left outer join": it's common that if someone gave us a list of 100 customer numbers, we would want to return exactly 100 rows in the final report. Said another way, if customers have been purged from the current database, that information is important to see side by side, and a left outer join is the best way to show it. Ultimately it sounds like for this type of use case, the best solution is either to:
1. Do the left outer join in Tableau (or Maestro) and accept that asking SQL for all 1 million customers is part of the requirement for this.
2. Or do something in the data engineering/ETL layer outside the user's scope to get those records populated to a direct SQL table (like Lei was saying) using some other tool.
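The "return exactly as many rows as the given list" requirement from option 1 can be sketched in Python with sqlite3 (table and column names are invented for illustration): a left outer join from the given list keeps purged customers visible as NULL rows.

```python
import sqlite3

# Hypothetical data: customer 103 was purged from the database.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customer (customer_number INTEGER, name TEXT);
    CREATE TABLE given_list (customer_number INTEGER);
    INSERT INTO customer VALUES (101, 'Acme'), (102, 'Globex');
    INSERT INTO given_list VALUES (101), (102), (103);
""")

# The given list is on the left, so all 3 requested numbers come back;
# the purged customer surfaces with a NULL name instead of vanishing.
rows = con.execute("""
    SELECT g.customer_number, c.name
    FROM given_list g
    LEFT JOIN customer c ON c.customer_number = g.customer_number
    ORDER BY g.customer_number
""").fetchall()
print(rows)  # [(101, 'Acme'), (102, 'Globex'), (103, None)]
```

An inner join would silently drop row 103, which is exactly the information the report is supposed to expose.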
I'll take a crack at your first one and two:
1. So I just tried 10,000 CustomerNames as a data source filter. No problems here.
2. These are the queries that were generated:
SELECT TOP 0 [Customer].[CustomerName] AS [CustomerName]
FROM [dbo].[Customer] [Customer]

CREATE INDEX [_tidx_#Tableau_3_3_Filter_1a] ON [#Tableau_3_3_Filter] ([CustomerName])

SELECT COLUMN_NAME, COLLATION_NAME, TABLE_NAME
WHERE TABLE_NAME LIKE '#Tableau_3_3_Filter%'

SELECT [Customer].[CustomerName] AS [CustomerName]
FROM [dbo].[Customer] [Customer]
INNER JOIN [#Tableau_3_3_Filter] [Filter_1] ON [Customer].[CustomerName] = [Filter_1].[CustomerName]
GROUP BY [Customer].[CustomerName] ORDER BY 1 ASC

SELECT [Customer].[CustomerName] AS [CustomerName]
FROM [dbo].[Customer] [Customer]
INNER JOIN [#Tableau_9_4_Filter] [Filter_1] ON [Customer].[CustomerName] = [Filter_1].[CustomerName]
GROUP BY [Customer].[CustomerName]
So with a custom value list, Tableau just shoves everything into a temp table and then queries that. I hope that helps!
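The pattern in those captured queries can be sketched end-to-end in Python with sqlite3 standing in for SQL Server (all names and data are made up): load the pasted list into an indexed temp table, then INNER JOIN to it instead of generating one huge IN (...) clause.

```python
import sqlite3

# A toy "customer" table of 10,000 rows, mimicking the test above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (customer_name TEXT)")
con.executemany("INSERT INTO customer VALUES (?)",
                [(f"cust{i}",) for i in range(10000)])

pasted = [f"cust{i}" for i in (3, 42, 9999)]  # the pasted custom list

# Mirror what the captured queries show: dump the pasted values into a
# temp table, index it, then join, rather than building a giant IN list.
con.execute("CREATE TEMP TABLE filter_list (customer_name TEXT)")
con.executemany("INSERT INTO filter_list VALUES (?)",
                [(p,) for p in pasted])
con.execute("CREATE INDEX temp.idx_filter ON filter_list (customer_name)")

rows = con.execute("""
    SELECT c.customer_name
    FROM customer c
    INNER JOIN filter_list f ON c.customer_name = f.customer_name
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()
print(rows)  # [('cust3',), ('cust42',), ('cust9999',)]
```

The join against a small indexed temp table is what keeps the query fast even when the pasted list runs to thousands of values.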
This is outstanding, thank you.
1. Sorry if this is a dumb question, but how do you view all of this SQL? Did you do this in Tableau or through something else like SQL Profiler or something analyzing the SQL commands sent to the server?
2. Interestingly, I would think that if you had a flat file with customer numbers and you joined it in the Data Source to a SQL query, it would do much the same thing with a temp table, so it would be nice and fast like this? Or is this more of a feature request? It would be awesome if Tableau recognized that you are trying to join external data to SQL data, since clearly the best way to do that is to put the external data into a temp table, index that temp table, and do a SQL join to the actual SQL table. That appears to be what you are showing above in the custom value list example. So I feel like we are close!!!
Glad you found that useful.
There are a couple of ways you can view the SQL generated by Tableau. Here are the supported ways:
I usually find that the Performance Recorder doesn't capture the full SQL statement. It's usually a truncated version, especially when you're doing lots of calculations. Parsing the log files is what I used to do. What's nice is that Tableau released a couple of tools to help do just that:
Using SQL Profiler is also a valid method. If that is what you are comfortable using, then by all means go that route. I personally did it all in Tableau, the unsupported way. In Tableau Desktop 10.2 there is a hidden menu, DEBUG, that you can enable by passing a parameter to tableau.exe:
In the debug menu, under Query Failure, there is an option to prompt for each query, so you can view each query individually as it gets sent off. This is my preferred method. Unfortunately, shortly after Tamas released this info, they removed that option in version 10.2.1 (Tamas found a way to enable the debug button in later versions, but he wasn't allowed to disclose how). So I keep a copy of 10.2 around just for this purpose. As of now, most queries from version 10.2 to 10.5 haven't changed much. I suspect this method will become invalid in the future since Tableau is always improving the SQL it generates. If you happen to use this method or the performance recording, you'll want to remember to clear the Tableau Desktop cache.
2. When joining a flat file to a DB, a shadow extract is generated in what is called a federated join. (If I'm using the wrong terminology, please correct me!)
So the SQL that is created is actually VizQL, Tableau's own special sauce. It is similar to the programming language LISP.
I didn't capture all of it since it won't make much sense unless you study it for a bit. Here are the important parts:
(table [TableauTemp].[Clipboard_20180405T131631#txt]) -- I used copy-paste for the 10000 names, so it gets added as a clipboard data source
([_CustomerName_LEN_RUS_S2_VWIN_0] [CustomerName]) -- the joins
([_CustomerName_LEN_RUS_S2_VWIN_0] [CustomerName])
([_Last Name_LEN_RUS_S2_VWIN_0] [CustomerName])
create table [Temp].[#Tableau_12_.............]
create column [Temp].[#Tableau_12............]
INSERT INTO [Temp].[#Tableau_12............]
So to answer your question: yes, joining a flat file to a DB does create a temp table.
Outstanding. So this is interesting...Maybe this was because I was running a previous version or something, but now when I run some tests, I'm getting some good results. Here's what I'm testing:
- SQL query against a table of hundreds of thousands of records with a data source filter yields 60K records and takes only a second or two which is outstanding.
- If I create an Excel file with 16K different customer numbers and do a simple inner join in Tableau, it returns the matching records in only about 4 seconds. So unlike my previous tests a few weeks ago, this is working great.
I am wondering, though: the inner join takes about 4 seconds. If I do a left outer join between SQL and the Excel file, it is still very quick, only 8 seconds. However, if I do a right outer join where the right side is the Excel file, that takes absolutely forever (I cancelled after a minute). And it was even this slow when the Excel file had only 5 customer IDs in it.
In the end, it's not a huge deal; most of the time the idea is to only return results where there is a match, so an inner join is fine. But there are times where we need to report on the exact 5 customer IDs they gave us, match or not, and that's why the right outer join would be useful. Any ideas why it is THAT much slower with a right join (Excel on the right) vs. a left join? Both are outer joins...
I'm not sure. I'm seeing similar results whether the Excel file is on the left with a left outer join or on the right with a right outer join. The same queries are being produced. Maybe someone can enlighten us both?
Either way, why don't you build an extract? The initial creation of the extract might take a bit of time for a large data set (Hyper made this a lot faster), but after that everything will be blazing fast.
Let's not worry too much about the right outer join; I tried it with a different type of SQL query and join and it was totally fine on left, right, or inner. A bit odd, but no worries.
As for the extract, this is interesting...I have a data source that has SQL data now joined to an external Excel file which takes about 4 seconds to generate the results for 60,000 records of which about 1,000 are a match. All is good when doing a live connection.
However, when I do the extract, it really does take a long time to build, and what's worse, it appears to bring in 100% of the SQL records, not just what I have in the filter. So whereas the live connection was a temp-table-style join from my 60,000 SQL records to 1,000 Excel records, this is now the full 1 million records, prior to the filter and prior to the joins. At this point, I feel like Tableau is making its own SQL database with 100% of what is in the actual SQL database. Is that really what is intended, do we think? It's really weird: why does the "Extract Data" dialog present the filters and yet seem to ignore them, pulling 100% of the records from SQL?
I have to ask you what "really does take a long time to build it" means so I can have some perspective. I'm going to make the assumption that anything >30 seconds is a long time?
I just tested an Excel file, 10k records, with an inner join and a left join to a SQL DB, and created the extract. Both only brought down 10k records. The left join took a bit more time than the inner join (1:37 vs. 2:15 minutes). This is on Tableau 10.2; it will definitely be faster in 10.5.
I also tried a data source filter. Now here is where I think things went amiss. If you add a data source filter, you'll have the following:
Now if you hit the Extract radio button right after you add the data source filter, you'll get the following:
Click on Edit and the filter dialog box will pop up; just hit OK and it'll add the data source filter to the extract.
The verbiage from the previous screen shot should change to:
That should only extract the records you are requesting. If that isn't the case, I'd be curious what is going on, because that isn't expected behavior. Also, the speed at which an extract is created depends on several factors: network speed, DB response time, RAM, CPU, and disk write speed. The beefier the machine creating the extract (RAM, number of cores, disk write speed), the quicker the extract is created. A rough example: my laptop takes about twice as long as my DEV server (122 seconds versus 66 seconds) to create an extract of ~1.8 million records.
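The difference the extract filter makes can be sketched in Python with sqlite3 (a rough stand-in for an extract; all names and data are invented): materializing with the WHERE applied stores only the filtered rows, while materializing first and filtering later copies everything.

```python
import sqlite3

# Toy source table of 1,000 rows standing in for the SQL Server database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE source (customer_id INTEGER)")
con.executemany("INSERT INTO source VALUES (?)", [(i,) for i in range(1000)])

# Extract WITH the data source filter applied before materializing:
con.execute("CREATE TABLE extract_filtered AS "
            "SELECT * FROM source WHERE customer_id < 60")

# Extract WITHOUT the filter (the behavior Eric observed): every row copied.
con.execute("CREATE TABLE extract_full AS SELECT * FROM source")

n_filtered = con.execute("SELECT COUNT(*) FROM extract_filtered").fetchone()[0]
n_full = con.execute("SELECT COUNT(*) FROM extract_full").fetchone()[0]
print(n_filtered, n_full)  # 60 1000
```

Scaled up to a million-customer table, that gap between 60 rows and 1,000 rows is the difference between an extract that builds in seconds and one that churns for a very long time.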