Hey Jonathan -
Can you provide a sample Excel workbook (or text files) with small number of rows for each "table"? Then show us what the final result should look like in your mind once these rows are joined...
Based on what you've described it sounds like the JOIN is set up wrong - what you're trying to do is a core competency of the product - so you should be able to do this lickety-split unless there's something weird going on
The table I posted illustrates the problem. Given a table with two customers and a table with three orders, the expected Sum(Customers) is two - if the app calculates metrics separately for each table. If the "multiple tables" data source is effectively performing a SQL-style join, whereby fields are populated for each matching record, the total will be three.
Attached. In this sample, I've created a third customer with no orders, but as you can see, Tableau does not include it. This seems to confirm that Tableau is performing a naive join of the tables.
Based on the type of join you create (Inner, Left, Right) you might get 2 or 3 rows back. You're using in inner join, which returns a (correct) result of two rows.
I believe you may have overlooked the "Join Type" drop-down box when adding your tables. It allows you to tell Tableau what sort of join to use. The default is an inner join, which you refer to as a "Naive" join:
Result (correct for an inner join):
It sounds like you simply need a LEFT join:
...which gets you (correct for an left join):
The issue is not whether Tableau performs a semi-join or a full join, the issue is that it's joining and then calculating on the resulting flattened data, thereby over-counting. Switching to a left join would eliminate under-counting, but does not solve the problem.
The results in your last screen shot are only correct for the count of orders. If you're summing customers, as I am, it's not: by definition, each customer is one customer.
So if I understand correctly, you are relying taking SUMs of a "counter" field ("Customers" in table Customers and "Orders" in table orders) in order to understand how many customers and/or orders you have? That approach isn't necessary in Tableau.
- Use a LEFT join - you must do so in order to see customers with no orders
- Take Distinct Counts of [CustomerID] and [OrdersID]
- Same as #1, but create distinct calculated fields which do a COUNTD() against each field:
- Don't join at all. Bring in each table as distinct data source.
- Count using your "Sum technique"
- Use Tableau Data Blending if you need to relate the two data sources now and then.
Hope this helps!
Yes, though this is just a simple example. The real data has far more, and more complicated, metrics.
- This works for IDs, but not for more complex metrics. For example, customer lifetime revenue may legitimately be the same for two customers, so a sum-distinct would result in undercount.
- Same problem.
- That was actually my first approach, and it's beautifully simple. However, this gave me out-of-memory issues with real data sets (millions of records), and per Tableau support, blending is limited to 300 K unique values in the join fields. So no go.
It seems that Tableau is very focused on examining one thing at a time, or at least one grain at a time. It's frustrating because the blending mechanic uses exactly the per-table calculations I'm used to in QlikView; is there really no way to use the same with joins?
1 of 1 people found this helpful
Have you investigated Tableau's RAW_SQL functions? They allow you to fire queries directly at the server vs. using Tableau to figure things out.
You could use variations of RAW_SQL to get your counts "in a join-less vacuum", and then join which ever way seems to make the most sense (Counting issues aside).
...Note that these functions are only available when you're directly connected to the data source since the data source in question will be doing the heavy lifting.
Right now I have set up a bunch of TDEs for people to read from, but granting rights for direct SQL access is an option if it's necessary. It wouldn't be my first choice, for security and performance reasons.
Interesting. That looks very powerful, I'm definitely going to have to keep that in mind for my own use. I'm concerned about the less-technical people who will be using Tableau, though. I can't ask them to write SQL code for this kind of simple problem!
Thanks for the suggestions, Russell. It sounds like there are some workarounds which could be fine for analysts, but I don't know that they'll be acceptable for line-of-business users. That's our main goal with Tableau: put data and tools in the hands of people who are comfortable with Excel but don't have the time or interest to learn the intricacies of QlikView, our current analysis product.