7 Replies Latest reply on Dec 20, 2016 1:53 PM by Joe Oppelt

    How to extract two columns of measures from complex database for scatter plot?

    Peter Hasselbacher

      I want to make a scatterplot to see if there is correlation between the numbers of prescriptions written for two different drugs by many prescribers.  Specifically, for Medicare beneficiaries in Kentucky, do prescribers who write many prescriptions for hydrocodone also write many prescriptions for gabapentin.  I wish to plot these two measures against each other and create a regression line and statistics.

       

      The single database is large, >500,000 rows.  Each individual prescriber has a unique identifier number.  There is a separate line in the Excel table for every single drug prescribed by an individual prescriber including the number of times that drug is prescribed.  Thus, a single prescriber number will have as many rows of data as the number of different drugs prescribed.  There are several thousand different drugs but few physicians prescribe more than several dozen different ones.

       

      My difficulty is extracting two columns of measures for the two drugs I wish to plot against each other. Specifically, in the case above, how do I extract the number of prescriptions for both hydrocodone and gabapentin for each prescriber in a structure such that I can plot a scatter graph? I have fiddled with filters, sets, and calculated fields to little end.  I am a relative newcomer to Tableau.  An example of the data I am working with can be seen in this public Tableau visualization.

      https://public.tableau.com/views/GabapentinPrescribinginKYMedicare2014/OtherOpioids-Select?:embed=y&:display_count=yes

       

       

      Below is an example of the structure of the relevant database (Limited & greatly simplified data.  For data structure only). There are 5 different prescribers, each prescribing both the target drugs as well as others.  How do I separate the wheat from the chaff?

      Peter

       

       

       

      Prescriber ID

      GENERIC NAME

      Number of Prescriptions

      1003000902

      SITAGLIPTIN PHOS/METFORMIN HCL

      14

      1003000902

      NAPROXEN

      24

      1003000902

      CELECOXIB

      11

      1003000902

      GABAPENTIN

      18

      1003000902

      EZETIMIBE

      16

      1003000902

      HYDROCODONE

      26

      1003000902

      BUPROPION HCL

      17

      1003000902

      WARFARIN SODIUM

      16

      1003008095

      SULFAMETHOXAZOLE/TRIMETHOPRIM

      12

      1003008095

      ESOMEPRAZOLE MAGNESIUM

      27

      1003008095

      GABAPENTIN

      17

      1003008095

      HYDROCODONE

      54

      1003008095

      FAMOTIDINE

      23

      1003008095

      PANTOPRAZOLE SODIUM

      227

      1003008095

      DULOXETINE HCL

      16

      1003008095

      LEVOTHYROXINE SODIUM

      17

      1003008095

      AZITHROMYCIN

      16

      1003010539

      AZITHROMYCIN

      12

      1003010539

      HYDROCODONE

      65

      1003010539

      GABAPENTIN

      11

      1003010539

      OXYCODONE HCL/ACETAMINOPHEN

      21

      1003010539

      PREDNISONE

      32

      1003010539

      CIPROFLOXACIN HCL

      18

      1003012428

      HYDROCODONE

      51

      1003012428

      GABAPENTIN

      28

      1003012428

      CLOTRIMAZOLE

      11

        • 1. Re: How to extract two columns of measures from complex database for scatter plot?
          Joe Oppelt

          Start off with a calc that captures prescriptions written for hydrocodone

           

          IF [Generic Name] = "hydrocodone" then [Number of prescriptions] END

           

          In  fact, you would be even better to give the user a parameter to select the generic name.  Make the sheet more user-flexible.


          This will give you all the rows where the first prescription was written.

           

          Then do a calc

           

          { FIXED [Prescriber ID] : SUM( [ whatever you called that calc]) }

           

          Now you can use that calc to get the top prescribers for [First Generic Parameter].

           

          DO the same for [Second Generic Parameter].

           

          Now you have two calcs that you can use to compare heavy hitters for whatever two prescriptions the user wants to compare.

           

          (Note:  Assumes Tableau version 9.0 or higher.)

          • 2. Re: How to extract two columns of measures from complex database for scatter plot?
            Peter Hasselbacher

            Thanks for swift reply. I am obviously weak at calculated fields.  I got “Calculation Valid” indications, but attempts to view data returns only “nulls”.  Is there something obvious I am missing or did wrong?  Here are descriptions for one of the pair of drugs.

             

            Paired Calculated fields for one drug.  Returns only nulls.

             

            Gabapentin Rx's

             

            Role:

            Continuous Measure

            Type:

            Calculated Field

            Default aggregation:

            Sum

            Status:

            Valid

            Formula

             

            if[Generic Name]="gabapentin" then END

             

            Domain

             

            The single value Null.

             

             

            and

            Calculation1

             

            Role:

            Continuous Measure

            Type:

            Calculated Field

            Default aggregation:

            Sum

            Status:

            Valid

            Formula

             

            {fixed[NPI]: SUM()}

             

            Domain

             

            The single value Null.

             

            Peter

            • 3. Re: How to extract two columns of measures from complex database for scatter plot?
              Joe Oppelt

              I'll need a sample workbook to do anything further on this.

               

              If proprietary data is an issue, look at the video in this link:

               

              Anonymize your Tableau Package Data for Sharing

               

              We don't need all your data.  Just enough to have some rows where "Prescription 1" is prescribed, some where "Prescription 2" is prescribed, and some where "Prescription 3,4,5" are prescribed.  Maybe 20 rows in all.  Bogus prescription amounts, and bogus prescriber IDs.

              • 4. Re: How to extract two columns of measures from complex database for scatter plot?
                Peter Hasselbacher

                Joe.

                You are kind to help. Attached is a packaged workbook showing just what I wanted!  I overlooked that the IF function =“Generic Name” is case sensitive. (First 400 lines of paired data by prescriber.  Not every prescriber prescribed both drugs. The data are in the public domain.)

                 

                Can you comment briefly what the {curly braces} do to the calculation?

                Thanks so much.  This gets me started in the right direction.

                Peter

                • 5. Re: How to extract two columns of measures from complex database for scatter plot?
                  Joe Oppelt

                  (Note to self:  Version 10.0 workbook)

                   

                  In the attached I added a data source filter to cut down the number of NPI so that we can better visualize what's going on behind the scenes.

                   

                  In Sheet1 I just dumped out all the measures so I could see what is going on.  The first thing I noticed is that the two "prescriber" calcs set the value for all rows of a given NPI.  I only want to see one value per NPI.  (I could even see that better on sheet 2 when I displayed the measures for each NPI.)  So I changed those two calcs to add in the [Generic Name] to the calc.

                   

                  The { } denote LOD calcs.  These were added in V9.0.  It helps us do a lot of what Table Calcs do, but for some people it works more intuitively.

                   

                  In the case for this example the FIXED LOD say this, for each combo of [NPI] and [Generic Name], do the logic after the colon.  And do this as the data is brought in from the data source, before any filters on the sheet are activated.  What we're doing in there is capturing the claims count for each NPI that has a gaba claim.  You can see how that is working on Sheet 2.  (For all other rows in the data, the value of that calc is null.)  Ditto for the Hydro calc.

                   

                  I actually could have combined the pair of calcs for Gaba into one:

                   

                  {FIXED[NPI], [Generic Name] :SUM( IF [Generic Name]="GABAPENTIN" THEN[Total Claim Count]END )}


                  Having two calcs lets you see the logical progression of what's going on better.

                   

                  (Note also that I created two parameters.  I populated it with the values in [Generic Name].  Instead of hard-coding the value in the calc, you could insert the parameter there.  then you could let the user select the two prescription names, and compare any two prescriptions like this!)

                   

                  On Sheet 4 I created a table calc to rank the Gaba prescribers.  Largest to smallest.  I put that calc on the rows shelf, changed it from "continuous" to "Discrete" and moved it to the front of the list on "Rows".  This is how you sort a table calc.  (Tableau isn't so flexible on sorting table calcs, but this is the method.)  Notice that it put all the NULLS at the top.  So I made a copy of this sheet to sheet 5.

                   

                  I added the table calc to the filters shelf.  I selected for NULL, and chose EXCLUDE.  Now the NULLS don't display.  I also added the Hydro Rx field to the sheet.  Now you can see how the top Gaga guys also prescribed Hydro.  (Just to show something you can do with this.)


                  On sheet 6 I right clicked on [Top Gaba] and unchecked "show header".  Now the rank value doesn't show, but your sorting remains in place.

                   

                  This is just some stuff you can do with the idea I suggested.

                  1 of 1 people found this helpful
                  • 6. Re: How to extract two columns of measures from complex database for scatter plot?
                    Peter Hasselbacher

                    Thank you so much Joe!  I will plough through this later today. I had hoped I would be able to ignore level of detail issues for a long time, but I see not so!

                    Peter

                    • 7. Re: How to extract two columns of measures from complex database for scatter plot?
                      Joe Oppelt

                      Either LOD or Table Calcs -- you can get that kind of stuff either way.  Each method has its complications.  Each has its charms.

                       

                      Don't hesitate to ask questions in this thread.  I'll get a ping in email if you do.