Your question is a good one for the next Open Tableau Table Calculation World Championship in November 2016 in Austin TX. To filter out amateurs like me, every challenger must first solve your question to qualify for the championship.
Man - yet another thing I did not know existed: the championship
I'm still getting surprised by how vibrant the Tableau community is!
Here are two solutions, both based on an earlier post of mine at Counting Pairwise Similar Votes in Tableau | Drawing with Numbers.
I haven't been able to find an unmodified single-source solution to this. The reason is that densification adds marks, so calcs like SIZE() no longer work (they count the densified marks), and regular aggregates like COUNT() and MIN() don't work because they only operate on non-aggregated data.
The join solution uses a self-join on Y to effectively pad out the data, then an index calc to trigger domain completion and ZN(SUM([Number of Records])) to get the results.
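Outside of Tableau, the same self-join logic can be sketched in plain Python (the data here is hypothetical, just to show the shape of the technique): join the records to themselves on Y, count joined rows per (X, X copy) pair, and fill missing pairs with 0, which is the role ZN(SUM([Number of Records])) plays after domain completion.

```python
from itertools import product
from collections import Counter

# Hypothetical mini data set: each record is an (X, Y) pair
records = [("A", 1), ("A", 2), ("B", 2), ("C", 1)]

# Self-join on Y: one joined row for every pair of records sharing the same Y
joined = [(x1, x2) for (x1, y1) in records for (x2, y2) in records if y1 == y2]
counts = Counter(joined)

# Pad out every (X, X copy) combination; missing pairs become 0,
# analogous to ZN(SUM([Number of Records])) in the padded view
xs = sorted({x for x, _ in records})
result = {(a, b): counts.get((a, b), 0) for a, b in product(xs, xs)}
```

Here `result[("A", "B")]` is the number of Y values that A and B share, and pairs with nothing in common get an explicit 0 instead of disappearing from the view.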
The blend solution uses an INDEX() table calculation with a compute using on X (copy) & Y to trigger domain completion (this is a simplified setup compared to what I'd done in the post, since I've learned more about densification since then), then a self-blend on X & Y and a table calc with the formula IF FIRST()=0 THEN ZN(WINDOW_SUM(SUM([Number of Records]))) END with a compute using on Y to count the number of records.
Workbook is attached, let me know if you have any questions!
common UIDs_9.3 jtd.twbx 33.9 KB
Thank you Jonathan and Yuri for your answers.
I'm now more convinced that it is probably not possible with an unchanged source and table calculations only. But what still seems vague to me is this:
- Is densification taking place in my setup (24 marks)?
- Why is the densification complete across all the cells here (50 marks)?
- My understanding is the following: the underlying data consists of 10 rows, and real data lies on the diagonal only. The first addressing is partitioned by [Y], so it pads every potential combination of [X] & [X copy] per Y; hence marks are displayed only in cells where the Y exists for both [X] and [X copy]. The second does not partition at all, so it densifies to all potential combinations of X & (X,Y) pairs, even if such an X/Y pair or triple does not exist in the data.
@Yuriy - thanks for the catch!
You didn't specify the addressing of the table calculations, so I can't say exactly what's going on for your #1 and #2. However, I can replicate those views with various table calculation settings and describe those results; if yours are different, let me know and I'll explain.
The starting point is that there are 10 records in the data with 10 unique combinations of X & Y, so when we have X and Y in the view we get 10 marks.
Given the pill layout, a compute using for the Index calc on a) X, b) X (copy), c) Table (Across), or d) Table (Down) will all result in the same densification out to 24 marks. No matter which of these the Index calc is set to, it's addressing on one of the two X dimensions and partitioning on the combination of Y and the other X dimension. Here's an example:
There are 24 marks because Tableau is effectively padding out all combinations of X & X (copy) for each Y; this results in there being values of X and X (copy) padded in that don't exist in the underlying data.
In the second view I can get those results with 50 marks by setting the Compute using to an Advanced compute using on one of the X dimensions *and* Y, so it partitions on the remaining X dimension:
So in this case as you surmised Tableau is padding out all combinations of X & X (copy) & Y, leading to 50 marks.
In both cases the underlying trigger for densification is in two parts:
1) There are discrete dimensions on Rows & Columns.
2) There's a table calculation whose addressing includes some, but not none or all, of the dimensions on Rows and Columns.
Joe Mako and I call this "crosstab domain completion" because of the 1st requirement. If we rearrange the pills such that we don't have opposing discrete dimensions on Rows & Columns then this domain completion goes away.
You're right. I think it was the first time I came across such a dimension arrangement where data densification is used, and it was a little bit surprising to me.
I am not sure if it has always been like this, but I wonder whether it is necessary that all the table calculations share the same domain. If they were independent, I could use the addressing that results in 50 marks for displaying, while addressing my calculation to 24 marks, where SIZE() is what I want. But the situation now is that once domain completion is triggered, regardless of the dependency order of nested calculations or whether they are separate, the domain is densified to the highest level before any of the table calculations are evaluated.
Or maybe I am talking nonsense now... (it is Saturday night)
As Tableau currently exists all densification is completed before data blending which is completed before table calculations are computed, so yes, all table calculations share the same domain.
While enabling multiple sets of domains of densification would be one route to get to your desired result, what I've asked Tableau to do is to a) first make what it's doing for densification more obvious and b) enable us to do the densification further upstream in the data source so that way we can use Tableau's already built-in downstream functionality. For example if I could tell Tableau to pad out every combination of X and X (copy) as a record-level value then filling it with 0s could be done using Tableau's Format->(measure)->Special Values->Text functionality or just using COUNT([Y]).
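A rough sketch of what that record-level padding could look like, done outside Tableau in Python (hypothetical data; the point is only that every (X, X copy) combination exists as a row, so zeros fall out of a simple count rather than needing a special-value format):

```python
from itertools import product

# Hypothetical (X, Y) records
records = [("A", 1), ("A", 2), ("B", 2), ("C", 1)]
xs = sorted({x for x, _ in records})

# Record-level padding: one row per (X, X copy) combination; the measure
# is simply a count of matching Y values, naturally 0 where none match
padded = []
for x1, x2 in product(xs, xs):
    matching_ys = [y1 for (xa, y1) in records if xa == x1
                   for (xb, y2) in records if xb == x2 and y1 == y2]
    padded.append((x1, x2, len(matching_ys)))
```

Because the padding happens at the row level, downstream aggregation sees a dense grid and the "fill with 0" problem disappears entirely.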
For example if I could tell Tableau to pad out every combination of X and X (copy) as a record-level value
Would it mean "add some rows to the datasource on the fly"?
To be explicitly clear, I think of this feature request as “add rows to the Tableau data source on the fly” and not “add rows to the underlying data source on the fly."
I was wrong - it can be done
The data source is unchanged, though the method is different. Densification is full (125 marks), and the count of common [Y] is looked up, as a simple SIZE() cannot be used anymore.
Common UIDs ZN 9.3.twbx 10.0 KB
I do not like to do that, but I had to mark my last answer correct as in fact it fulfills the goal. The valuable input from Yuri and Jonathan is very much appreciated!
What is interesting about this solution is that in order to have data densification done to all potential combinations of the dimension values (5x5x5), you have to use two properly addressed table calculations (here, 2x INDEX()).
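To make the 125-mark arithmetic concrete, here is a small Python sketch (with made-up X and Y domains of 5 values each, not the actual workbook data) of the fully densified grid and the common-Y count that each (X, X copy) cell ends up looking up:

```python
from itertools import product

xs = ["A", "B", "C", "D", "E"]   # 5 hypothetical X values
ys = [1, 2, 3, 4, 5]             # 5 hypothetical Y values
records = {("A", 1), ("B", 1), ("B", 2), ("C", 2), ("D", 3)}  # sparse real data

# Full densification: every X x X(copy) x Y combination exists as a mark
domain = list(product(xs, xs, ys))           # 5 x 5 x 5 = 125 marks

# The count of common [Y] per (X, X copy) cell, gathered across the Y axis
common = {(x1, x2): sum((x1, y) in records and (x2, y) in records for y in ys)
          for x1, x2 in product(xs, xs)}
```

The dictionary comprehension plays the role of the lookup across the densified Y axis: every one of the 25 (X, X copy) cells gets a value, including cells where neither X exists in the real data.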
Lukasz, you've come up with the right answer,
so please don't hesitate about this :-)
Your solution is a viable alternative to the common (self-join) approach
of counting bins when starting a market basket analysis.
Self-join performance depends greatly on the underlying DBMS of choice,
and it is not an option with extract-based data sources (as of now, though I hope
Tamas Foldi could tell us more).