4 Replies Latest reply on Jan 16, 2014 5:52 AM by juliette.mccormack

    Venn alternative using custom SQL


      Hi everyone


      I have 4 datasets (A&E, acute patient care, outpatients and community) and I have Patient IDs for each.  I would like to see which patients are in which datasets.  I have been using the method outlined by Jonathan Drummey in http://community.tableau.com/ideas/1582.  However, I'm struggling with getting the custom SQL code to work (mainly because I don't understand it).


      I would be very grateful if someone had the time to annotate the code to explain what each bit does so that I might be able to update it to fit my needs.  For example, I know that I need to have 4 headings (Jonathan had 3) so I need to add lines to the code to accommodate this.


      Many thanks.


      Best wishes



        • 1. Re: Venn alternative using custom SQL

          OK, I can tell no-one's keen to help me out on this so I'm trying another method.  This is one by Joe Mako and comes from http://community.tableau.com/thread/110430


          I have created the calculated field 'Service Type'.  But as you can see, I have a few 'Nulls' and I'm not sure why as I think I have all the different options listed in my Case statement.


          The second problem is that if a patient goes into A&E twice, will the SUM(Case) then make it 2000 and that won't be recognised?


          Any help gratefully received on either the SQL or Case methods.


          Many thanks.



          • 2. Re: Venn alternative using custom SQL

            I can work around the patient coming in twice or more by connecting to another dataset which aggregates the data on patient id and service type so I only get one record per combination.  On the real dataset (rather than the mockup in the previously attached workbook), this then works and produces this:

            Service Type Venn.jpg

            This says that, within my timeframe, there were approx 60k patients who only received A&E care, were 10k who had A&E and APC (inpatient) care, and 3k who had A&E, APC and CIDS (community) care, etc, etc.

            Service Type Venn Diagram.jpg(I had to export the tableau data to excel to get the exact totals for each bar)


            Even though this seems to work, if anyone has any ideas about the SQL or Case method issues highlighed in previous post, I would still be interested.


            Best wishes



            • 3. Re: Venn alternative using custom SQL
              Jonathan Drummey

              Hi Juliette,


              I can explain both of the workbooks you were looking at.


              The Custom SQL is an overcomplicated mess of unpivoting the data. Looking at it now, I wonder what I was thinking at the time! The solution is limited because it's using JET SQL, there are probably ways around this using more advanced data sources, or by doing transformation in an ETL tool. Anyways, the presumption is that the raw data is sorted in ascending order by Service Line. The first query up to the first UNION ALL gets patients who have 3 service lines. The next query (up to the next UNION ALL) uses that same SQL as part of getting patients with two service lines, and finally the third query (through the end) uses the code from that second query as part of getting patients with only one service line. Looking on it now, it's more complicated than it needs to be, if I redid this particular code I'd probably have three (or 4 in your case) subqueries to get the sets of patients with 1, 2, 3, or 4 service lines, and then have simpler queries attached to those subqueries.


              The solution Joe came up with a form of bitmap index, only using decimals. Each Category in the data is assigned to a number, and that number has a unique decimal place. In this case, there are three categories, so 100 for A, 10 for B, 1 for C. Then in the view the set of assignments is summed for each ID, so someone with A and C would get 100 + 1 = 101. That's the input number to the CASE statement, which then has a CASE 101 THEN "A_C".


              There are advantages and drawbacks to each solution. The nice part about the Custom SQL solution is that other than needing the sort it's completely dynamic and doesn't have to know the values are in the underlying data. Also, it ends up with a full-on dimension that can be used for addressing and partitioning of table calculations, as a filter action, etc. The downside is that all those queries can lead to low performance, and for JET sources with more than a few records almost certainly require an extract.


              The bitmap solution should be quite fast, and it's a lot easier to write a long CASE statement than to debug SQL code, especially JET SQL. However, it does require defining all the categories in the calculated field. The result is a discrete aggregate measure rather than a dimension, so that has limitations in how it can be used for further aggregations, for example we can't use discrete aggregates for filter actions or data blending.


              A third option would be to perform the bitmap computation in the SQL, leading to a full-on dimension for use in Tableau. Were I to set this up again, and if the range of values were relatively fixed, I think that's probably how I'd do it.


              Does this help?



              1 of 1 people found this helpful
              • 4. Re: Venn alternative using custom SQL

                Hi Jonathan


                Thanks for your response.  I would at some point like to explore option 3 but I'm on the beginning of my SQL learning journey so what you're suggesting is rather daunting at the moment.  As my mission was accomplished using the CASE method, I shall shelve it until I feel capable of tackling it!


                Many thanks.