My first thought, not to be rude, is why bother? Do you have data that will really be revealed better using one? Do you need the accuracy provided by plotting it, or would PowerPoint suffice? I've only ever seen Venn diagrams used to indicate sets, not where the amount of overlap signifies anything else.
If you really needed one on a Tableau dashboard, I would consider building a URL in Tableau and sending it out to Google Charts:
I understand the objections to Venn diagrams. They aren't great at helping you actually compare the size of the overlaps. However, they ARE good for helping you understand which numbers actually go with which portion of the overlap, especially when there are more than two things being overlapped. Without a Venn, you are left with a bunch of complicated and unintuitive labels: A only, B only, C only, A&B, A&C, B&C, A&B&C! And that's only a 3 way overlap. We are more concerned with quickly identifying which things overlap what than we are with figuring out whether A&B may overlap ever so slightly more than B&C, and so on.
The Google Charts thing is a good suggestion. However, our data must remain inside our firewalls.
Not wedded to Venn diagrams though. That is simply what we have used in the past. If anyone has a suggestion about a built-in Tableau chart type that would represent the overlapping intuitively, that would be fantastic.
How is your data structured?
I'm thinking a table calculation to derive your complicated and unintuitive labels (!) and then putting those labels on the colour shelf and playing with all the chart types to see what jumps out of the page at you most effectively.
Post some simple sample data and we can have a little contest to see who can be most creative. Joe's been very quiet lately so some of the rest of us might stand a chance. ;-)
Bring it Richard...
Awesome. Well, the schema we are creating has to be able to support arbitrary levels of overlaps (2 way, 3 way ... n way). So I have come up with a schema that allows us to store this in a generic way using an 'overlap index' stored as a bitmap, where the status of each bit represents a particular dimension value. In the simplest example of a 2 way overlap, you'd have a dimension table with A having an overlap index of 10 and B having 01. Then the fact table has values for 10, 01, and 11, each with a corresponding measure value containing the number exclusive to that element of the overlap. This can be translated to each portion of the overlap like so:
A exclusive = 10
B exclusive = 01
A&B = 11
Total for A = 10 + 11
Total for B = 01 + 11
Total = 10 + 01 + 11
If you have a db that can do bitwise operations, this then allows you to join the fact table against the dimension table in such a way that it allows you to do the aggregations shown above for overlaps containing arbitrary numbers of elements.
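For anyone who wants the arithmetic spelled out, here is a minimal Python sketch of the same idea outside the database. The names `exclusive` and `total_for` are made up for this illustration, and the counts are placeholders rather than real data; the only thing taken from the post is the rule that a segment belongs to an element when the element's bit is set in the segment's overlap index.

```python
# Exclusive counts keyed by overlap index (bitmask).
# These numbers are placeholders invented for this sketch, not real data.
exclusive = {0b10: 5, 0b01: 7, 0b11: 3}

A, B = 0b10, 0b01  # one bit per dimension value, as in the dimension table


def total_for(mask, counts):
    """Sum every segment whose index has all the bits in `mask` set."""
    return sum(n for index, n in counts.items() if (index & mask) == mask)


total_a = total_for(A, exclusive)      # 10 + 11
total_b = total_for(B, exclusive)      # 01 + 11
both = total_for(A | B, exclusive)     # 11
grand_total = sum(exclusive.values())  # 10 + 01 + 11

print(total_a, total_b, both, grand_total)
```

The same `(index & mask) == mask` test works unchanged for 3 way or n way overlaps, since adding an element only adds a bit.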
Here is some made up demo data:
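create table toms.overlap_index (
overlap_name varchar, element_name varchar, -- column names assumed; the original listing was cut off
overlap_index bit varying
);
create table toms.overlap_metrics (
period_type varchar, period_start varchar, overlap_name varchar, -- column names assumed
overlap_index bit varying,
metric_value integer -- column name assumed
);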
insert into toms.overlap_index values
('Car Manufacturers Overlap (t123456)', 'Mazda', B'1000'),
('Car Manufacturers Overlap (t123456)', 'Honda', B'0100'),
('Car Manufacturers Overlap (t123456)', 'Toyota', B'0010'),
('Car Manufacturers Overlap (t123456)', 'Mitsubishi', B'0001');
insert into toms.overlap_metrics values
('Weekly', '20100101', 'Car Manufacturers Overlap (t123456)', B'1000', 182),
('Weekly', '20100101', 'Car Manufacturers Overlap (t123456)', B'0100', 3416),
('Weekly', '20100101', 'Car Manufacturers Overlap (t123456)', B'0010', 9228),
('Weekly', '20100101', 'Car Manufacturers Overlap (t123456)', B'0001', 9358),
('Weekly', '20100101', 'Car Manufacturers Overlap (t123456)', B'1100', 8506),
('Weekly', '20100101', 'Car Manufacturers Overlap (t123456)', B'1010', 8472),
('Weekly', '20100101', 'Car Manufacturers Overlap (t123456)', B'1001', 5345),
('Weekly', '20100101', 'Car Manufacturers Overlap (t123456)', B'0110', 6460),
('Weekly', '20100101', 'Car Manufacturers Overlap (t123456)', B'0101', 5305),
('Weekly', '20100101', 'Car Manufacturers Overlap (t123456)', B'0011', 7082),
('Weekly', '20100101', 'Car Manufacturers Overlap (t123456)', B'1110', 1233),
('Weekly', '20100101', 'Car Manufacturers Overlap (t123456)', B'1011', 1732),
('Weekly', '20100101', 'Car Manufacturers Overlap (t123456)', B'0111', 2966),
('Weekly', '20100101', 'Car Manufacturers Overlap (t123456)', B'1111', 4588);
I've attached a demo workbook that includes the data above as extracts. It also contains some sql showing how you can use the data above to pull out the 2 way and 3 way overlaps. If you have changes to the schema that would make things simpler in Tableau, I'm all ears, with the caveat that it has to support arbitrary numbers of overlap elements.
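This is not the SQL from the workbook, just a rough Python sketch of one way to pull the 2 way and 3 way overlaps out of the demo data. It assumes "overlap" means everything that falls in all of the chosen makes, including segments shared with further makes; the names `MAKES`, `EXCLUSIVE` and `overlaps` are invented for the illustration.

```python
from itertools import combinations

# Bit assigned to each make, copied from the overlap_index rows.
MAKES = {"Mazda": 0b1000, "Honda": 0b0100, "Toyota": 0b0010, "Mitsubishi": 0b0001}

# Exclusive counts per overlap index, copied from the overlap_metrics rows.
EXCLUSIVE = {
    0b1000: 182, 0b0100: 3416, 0b0010: 9228, 0b0001: 9358,
    0b1100: 8506, 0b1010: 8472, 0b1001: 5345, 0b0110: 6460,
    0b0101: 5305, 0b0011: 7082, 0b1110: 1233, 0b1011: 1732,
    0b0111: 2966, 0b1111: 4588,
}


def overlaps(k):
    """Map each k-make combination to the count falling in all k makes."""
    result = {}
    for names in combinations(MAKES, k):
        mask = 0
        for name in names:
            mask |= MAKES[name]  # build the combined bitmask for this combo
        result[" & ".join(names)] = sum(
            n for index, n in EXCLUSIVE.items() if (index & mask) == mask
        )
    return result


two_way = overlaps(2)
three_way = overlaps(3)
for label, count in {**two_way, **three_way}.items():
    print(f"{label}: {count}")
```

Because the labels are built from the bits, nothing here is specific to four makes; a fifth make would just add another entry to `MAKES`.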
I am thinking perhaps a 'tornado' chart style thing could work well, at least in the case of the 2 way overlap. Anyway, let the challenge begin!
overlap_demo.twbx 127.5 KB
This looks like being a bit of fun.
I had a quick go at distributing some blobs semi-randomly, as attached. I feel sure that there will be a cunning way either to organise the blob placement or perhaps to join the blobs together (or even both) based on common set membership, using a dual axis chart, or some such. At the moment there's no meaning to the placement so it really doesn't convey even as much meaning as your raw data. I thought I'd share what I'd got as a starter - it can only get better from here.
overlap_demo_rl1.twbx 132.9 KB
Wow, I have no clue what you have even done here, but it looks promising!
Here is my first manual pass at an option for a final display. I could see all sorts of options for interactivity to change sort orders and filtering.
The way it would read is for example:
There are 30,058 Mazda and 43,815 non-Mazda. There are 14,327 Mazda-Honda combos, 15,731 Mazda-non-Honda combos, 18,147 non-Mazda-Hondas, and 25,668 neither Mazda nor Honda. etc...
I felt that the negative space could be as important as the positive space. A parameter could turn the white-space labeling on and off, and a parameter to change the order of the sections would be useful. Additionally, changing the order of the white-space colored areas may be another option to consider.
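The with/without split behind that reading can be recomputed from the demo rows. The following is a small Python sketch rather than anything taken from the workbook; `count` and `split` are hypothetical names, and the data is copied from the insert statements earlier in the thread.

```python
MAZDA, HONDA = 0b1000, 0b0100  # bits from the overlap_index rows

# Exclusive counts per overlap index, copied from the overlap_metrics rows.
EXCLUSIVE = {
    0b1000: 182, 0b0100: 3416, 0b0010: 9228, 0b0001: 9358,
    0b1100: 8506, 0b1010: 8472, 0b1001: 5345, 0b0110: 6460,
    0b0101: 5305, 0b0011: 7082, 0b1110: 1233, 0b1011: 1732,
    0b0111: 2966, 0b1111: 4588,
}


def count(with_bits, without_bits=0):
    """Total for segments that have every bit in with_bits and none in without_bits."""
    return sum(
        n for index, n in EXCLUSIVE.items()
        if (index & with_bits) == with_bits and (index & without_bits) == 0
    )


split = {
    "Mazda": count(MAZDA),
    "non-Mazda": count(0, MAZDA),
    "Mazda, Honda": count(MAZDA | HONDA),
    "Mazda, non-Honda": count(MAZDA, HONDA),
    "non-Mazda, Honda": count(HONDA, MAZDA),
    "neither Mazda nor Honda": count(0, MAZDA | HONDA),
}
for label, n in split.items():
    print(f"{label}: {n:,}")
```

The two halves of each split add back up to the level above, which makes a handy check on the labels.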
I like the potential this has to scale to n-sections.
What do you think?
alt_to_venn.twbx 21.0 KB
After walking away and looking at it for a bit, I like that it is fairly easy to see the outliers, that there are no Mazda-Honda-Mitsubishi (1101) combos, and that there is visibly very little Mazda-only.
I think this can be done with a template-of-sorts data source, and a data blend with your data as it is structured currently, with the relationship on your flag (0101 like values) field. That way you can adapt to other n-way data records without having to recreate a separate workbook for each. This would require some trial and error and testing to be sure, but I think it would work out well.
Wow Joe, that is pretty amazing. Gonna take me a while to get my head around it...
Joe, how did you create the datasource that you are using?
What I did was to organise the marks on concentric circles - with the mark for the set with all four makes in the middle, then the sets of three in the first circle, then sets of 2 and finally the individual marks with no overlaps. I was trying to come up with a way to distribute or link the marks to show overlaps in some way - or at least space the marks out evenly on each circle, but I ran out of steam. Not sure it's really going anywhere...
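As a rough illustration of that layout (not how the workbook was actually built), the Python sketch below puts each overlap index on a ring chosen by how many bits it has set and spaces the marks evenly around the ring. The function name `ring_layout` and the choice of radius are assumptions made for this example, and it generates positions for every possible index, including ones with no rows in the demo data.

```python
import math

N_MAKES = 4  # Mazda, Honda, Toyota, Mitsubishi


def ring_layout(n_bits):
    """Return {overlap_index: (x, y)} with one ring per overlap size."""
    positions = {}
    for size in range(n_bits, 0, -1):
        # All indexes with exactly `size` bits set share a ring.
        ring = [i for i in range(1, 2 ** n_bits) if bin(i).count("1") == size]
        radius = n_bits - size  # the all-makes set sits at radius 0
        for k, index in enumerate(ring):
            angle = 2 * math.pi * k / len(ring)  # even spacing on the ring
            positions[index] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


layout = ring_layout(N_MAKES)
for index, (x, y) in sorted(layout.items()):
    print(f"{index:04b}: x={x:+.2f}, y={y:+.2f}")
```

The x and y values could be fed to Tableau as two measures on a scatter plot; deciding the order of marks within a ring so that related sets sit near each other is the part this sketch leaves open.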
I had a go at Venn diagrams myself (due to popular demand). You can find it on our blog here:
It's only a 2 way at the moment but if I get some time I'll update it to a 3 way and add a building guide.