8 Replies Latest reply on Oct 18, 2013 8:13 AM by Caleb Smith

    "IF" statement returning too many results

    Caleb Smith

      I'm trying to create a calculated field that returns a count/sum of the rows that meet its criteria. The reason I'm doing this with a calculated field rather than just using filters is because I need to use slightly different logic for each of 4 tables, later summing the results together.

       

      Anyway, I started off with the following calculation:

      SUM(IIF([STATUS] = "Staff Contingency Review"

      OR [STATUS] = "Staff Review"

      OR [STATUS] = "Reviewer Contingency Review"

      OR [STATUS] = "Committee Decision"

      OR [STATUS] = "Review Decision",

      IF DATEDIFF('day',[RECEIPT_DT],TODAY()) > 7 AND DATEDIFF('day',[RECEIPT_DT],TODAY()) < 15 THEN 1 ELSE 0 END,0))

       

      This returned 3,013 results. Now, If I just dropped the row IDs for this table directly onto the row shelf, I get around 127 rows total, nowhere near 3k. So I figured there is something wrong with my calculation. To troubleshoot it, I figured I'd build it up one layer at a time until I found the problem. So I did the following:

       

      SUM(IF [STATUS] = "Staff Contingency Review" THEN 1 ELSE 0 END)

       

      This gave me 12 results. There should only be 2 results. If I just do this with a filter, then I get the correct 2 results. I decided to graph the row ID and the calculation value so I could see which rows were being summed in order to figure out if there was some other value I should be considering. When I did this it looked like some IDs were being counted multiple times (some only one time, some 3 or 4). So then I wondered if it was the SUM that was messing things up.

       

      COUNTD(IF [STATUS] = "Staff Contingency Review" THEN 1 ELSE 0 END)

       

      This gave me 2 results. Bingo, I thought. Let's add another layer:

       

      COUNTD(IF [STATUS] = "Staff Contingency Review" THEN 1 ELSE 0 END)+

      COUNTD(IF [STATUS] = "Staff Review" THEN 1 ELSE 0 END)


      This gave me 4 result. The correct number should be 7. OK, so COUNTD didn't work, let's take the aggregation out of things and just use IF statements, then making the pill a SUM pill rather than an AGG.


      IF [STATUS] = "Staff Contingency Review" THEN 1 ELSE 0 END+

      [STATUS] = "Staff Review" THEN 1 ELSE 0 END


      This gave me 20-something results. The correct number should be 7. Again, some rows were being counted multiple times.


      Now I've got myself tied in knots. I have no idea what I'm missing here. Why is the basic "SUM" giving me so many results? While if I simply drop the row ID onto the row shelf and then filter by status I get the correct numbers. I can't figure it out. Help!

        • 1. Re: "IF" statement returning too many results
          Dan Huff

          COUNTD(IF [STATUS] = "Staff Contingency Review" THEN 1 ELSE 0 END)+

          COUNTD(IF [STATUS] = "Staff Review" THEN 1 ELSE 0 END)


          This one makes sense to return 4 since we will get a 1 and a 0 for both IF statements thus giving us 2 + 2 = 4.


          Your other calculation, however, is a bit more interesting . Is there any chance you can post a workbook with the issue? Given the items like "Staff Review" I suspect this is not possible.


          From what you have posted above, I have made the following assumptions. Please correct any that are wrong:


          • You have joined multiple tables together in your data connection
          • The STATUS and RECEIPT_DT fields are from different data sources
          • When you added Row ID to the view, the calculation was not present anymore

           

          If my assumptions are correct, it may be that you have a join that is causing duplicate records.

           

          Dan

          1 of 1 people found this helpful
          • 2. Re: "IF" statement returning too many results
            Caleb Smith

            Thanks, Dan.

             

            Regarding your assumptions:

             

            1. Yes, multiple tables are joined via LEFT join.

            2. Both STATUS and RECEIPT_DT are from the same table in the same data source.

            3. Not sure what you mean here.

             

            In order to help clarify, I'm attaching two screenshots. The first shows The results of the simple IF/THEN statement. The second one .shows the results if I just drop the dimensions onto the row shelf and then filter them (this is the correct number of results). The third shows what happens when I drop the dimensions onto the row shelf, filter by status, and include the results of my simple calculation (each of the two records is counted 6 times in this example, rather than just being counted once).

             

            Results of Calculation.PNG.png

            Results of Filtering.PNG.png

            calculation by ID.PNG.png

            • 3. Re: "IF" statement returning too many results
              Matt Lutton

              Caleb:

               

              After creating joins in Tableau, its always a good idea to pull out data into a crosstab view or preview the results of the join and see if the data is being duplicated.  This often happens when fields have a one to many relationship. If that's the case, you know there's an issue.

               

              Here's some more info on how to deal with it, although I admittedly didn't read through your entire thread so I'm not sure what's happening exactly: Removing Duplicated Data after Joining Tables | Tableau Software

               

              I like using data blending, instead of joins, in many of these scenarios.

              • 4. Re: "IF" statement returning too many results
                Dan Huff

                Put Number of Records in the second screenshot. I bet you will get 6 for each HUM_RSR_SRC_ID

                1 of 1 people found this helpful
                • 5. Re: "IF" statement returning too many results
                  Caleb Smith

                  Son of a gun. You're right. I'm getting 6 records for each ID. So the joins are definitely resulting in duplicate data. Any idea of how to remedy this? I looked at Matthew's link above but I don't think simply calculating the AVG will work in this case -- it can tell me the AVG for each ID (1) but it doesn't tell me the the sum of all the distinct IDs, which is what I'm going for.

                  • 6. Re: "IF" statement returning too many results
                    Matt Lutton

                    You can probably get the results using data blending.  If you post your workbook with the data sources connected as separate sources, we can take a look at getting the solution you want.  Again, I haven't studied your thread, but typically, data blending can eliminate this issue.

                    • 7. Re: Re: "IF" statement returning too many results
                      Matt Lutton

                      Here's a simple example of workbook using data blending to fix duplicate records (attached)

                      • 8. Re: "IF" statement returning too many results
                        Caleb Smith

                        Ah! I think I got it. The following gives me what I'm looking for:

                         

                        COUNTD(IF [HUM_RSR_SCR_STATUS] = "Validate Committee Decision" THEN [HUM_RSR_SCR_ID] END)

                         

                        I can then expand that paradigm to the original problem like this:

                         

                        COUNTD(IIF([HUM_RSR_SCR_STATUS] = "Core Committee Staff Contingency Review"

                        OR [HUM_RSR_SCR_STATUS] = "Core Committee Staff Review"

                        OR [HUM_RSR_SCR_STATUS] = "Validate Committee Decision"

                        OR [HUM_RSR_SCR_STATUS] = "Validate Expedited Decision",

                        [HUM_RSR_SCR_ID],NULL))

                         

                        I've been too hung up on using SUM. Even when I tried COUNTD before, I was still treating it like a SUM. For instance, of course COUNTD(IF [STATUS] = "Staff Contingency Review" THEN 1 ELSE 0 END) was going to give me bad results because I was assigning the row the number 1 if it met my criterion, and 0 if it didn't so for each COUNTD I would get 2 results (0 and 1).