8 Replies Latest reply on May 6, 2014 1:33 PM by Caleb Smith

    Creating a calculated set from blended data

    Caleb Smith

      I'm attempting to chart institutional performance in clinical trial enrollment. 'Performance' here is just the percentage of active clinical trials that have met their enrollment targets. To get this data I have to use two distinct data sources: Source 1 gives me some basic information about the trial (ID, predicted enrollment, approval date, etc.) and Source 2 gives me the number of subjects enrolled for each ID.


      To find the percentage of active clinical trials that have met their enrollment targets I have to make a calculation that uses [Predicted Enrollment] from Source 1 and [Current Enrollment] from Source 2 to determine whether or not a given study has met or exceeded it's target/prediction. If it has met its target then I give it a 1, if it hasn't then I give it a 0. The equation looks like this:


      IF [Enrollment Data].[Count(Enrollment)] >= ATTR([Predicted Enrollment])

      THEN 1

      ELSE 0



      I then make a static set of everything that gets a 1 and use that in a table calc to get the percent of whole, which gives me the performance measure I'm looking for.


      This all works just fine. Now however, I need to generate the set with a formula rather than doing it by hand, because the data changes every night and I can't manually update the set every morning. It needs to happen automatically. The problem: either because the calculation above is using data from two different data sources, or because the values it's using are aggregates, Tableau won't let me use the calculated field in my set formula. I need help to figure out a way to get around this. I'm completely stuck right now.


      I'm attaching a sample workbook that has everything I've talked about above, including the static set I'm using right now. Can anyone figure out how to make a calculated set where membership is determined by something like the formula above?

        • 1. Re: Creating a calculated set from blended data
          Caleb Smith

          Jonathan Drummey you've done a lot of cool stuff with sets on drawingwithnumbers...have you ever run into something like this (and figured out a way around it)?

          • 2. Re: Creating a calculated set from blended data
            Jonathan Drummey

            Hi Caleb,


            The reason why your calculation won't work to create a Set is that it's depending on a data blend, however computed Sets can only be created from a single data source. On a quick look, I can think of two solutions:


            1) Instead of using a Tableau data blend, do the work necessary to join the two data sets prior to Tableau. Even if they are completely different data sources, you could use ODBC or a database-specific technology to link tables and/or worksheets to have a federated data source. From that, you could build either a Multiple Tables connection or Custom SQL, or a custom view for Tableau. Then in a single data source you could use a computed Tableau Set via a conditional formula or possibly even a row-level calculation to create a dimension. I don't know your data sources or the level of granularity of your raw data, so I can't give you more specifics here.


            2) The calculation requires a computation at the level of ID to compare the enrollment to the prediction. In the data blend case, that means the computation has to occur as an aggregate calculation, and the ID would have to be in the view, and then to do aggregation over the IDs requires table calculations. I can come up with a table calculation solution for that, however it won't necessarily be pretty and could run into performance issues if your # of IDs is getting into the millions or higher.


            Though solution #1 is more work outside of Tableau, it makes life inside Tableau much easier so personally I'd try that route first. If you'd like me to help with #2, I'll have some time over the weekend to do so.




            PS: Please use my other Jonathan Drummey account when pinging me, the Jonathan Drummey SMMC account is the one I use for tech support emails.

            1 of 1 people found this helpful
            • 3. Re: Creating a calculated set from blended data
              Shawn Wallwork

              Hi folks. When pinging Jonathan note:


              (at)jonathandrummey (the preferred email)

              (at)jonathan.drummey (don't use)


              So no dot between the names.


              Jonathan, the SMMC doesn't show up until after we click add reply, so it's not easy to know which one Jive has offered up. Tracy Rodgers or Patrick Van Der Hyde might be able to delete that extra unwanted address.





              • 4. Re: Creating a calculated set from blended data
                Jonathan Drummey

                Ok, that explains why I've just started getting those SMMC emails, thanks, Shawn!


                Jive used to show names when using the @trick, now it just shows these usernames (that we don't really see anywhere else). Maybe there's a configuration somewhere that could be set? Tracy Rodgers?

                • 5. Re: Creating a calculated set from blended data
                  Tracy Rodgers

                  I'm looking into to it as we speak (or write? )!

                  • 6. Re: Creating a calculated set from blended data
                    Caleb Smith

                    Hi Jonathan,


                    Thanks for the response! Unfortunately, of the two solutions I think #2 is the one I'll have to use. Fortunately, I don't expect to ever deal with more than a maximum of 40k-50k rows of data. Definitely not in the millions, so performance may not be an issue.


                    The reason I don't think #1 will work is because Source 1 is an Oracle DB (from a data warehouse) and Source 2 is a CSV flat file that's being overwritten every night. It would be great if both sources were available in the data warehouse...that'd definitely make life easier...but as it is I don't think I'll be able to federate them. That leaves #2.


                    I'm not sure I fully understand the approach from your description but I'll spend the next day trying to figure it out. Any help you can provide with the calcs would be much appreciated!


                    Thanks again!


                    • 7. Re: Creating a calculated set from blended data
                      Caleb Smith

                      Well I haven't yet made any progress with the calc method. I'm going to play around with creating a federated source with Access though. I think if I can set up a job to import the CSV to one table in the Access DB and use the Oracle data in another table, then I can write a query to join them and just use the joined data in Tableau. I don't have much experience with Access though so this'll be a good learning experience.


                      Jonathan, I'd still appreciate any help you could offer with the calc method in case my Access scheme doesn't work out (and even if it does work out, I'd still like to learn how to do the calc method for future reference!).




                      • 8. Re: Creating a calculated set from blended data
                        Caleb Smith

                        I wanted to follow-up on this post with the solution in case anyone with the same issue happens across this thread in the future.


                        Jonathan originally recommended that I create a federated data source prior to bringing the data into Tableau. The federation process would then do the heavy lifting for me. This is ultimately what I did. I created a federated Database in MS Access by creating a linked table to the Oracle source through ODBC and then creating another linked table to the CSV source. With a fairly query I was able to create a third table blending the two sources. I brought this third table into Tableau.


                        It turned out to be a lot easier than trying to tweak the various calculations and if I had it to do over again I would have taken Jonathan's federated data suggestion first and run with it, rather than spending so much time looking for an "easier" calculation method.