13 Replies Latest reply on Mar 25, 2017 4:40 AM by Glenn Kuly

    Could we, and how to, use table calculation instead of a simple agg measure to calculate the upper/lower bounds in Control Chart?

    co lam

      Hello,

      I would like to create a control chart for my job assignment.  The data set is number of students who passed/failed/Did not complete in each course. From these raw counts, I can find out the percentage of each group in each course.  The percentage of the passed group is the completion rate. The objective of my job assignment is to create a KPI to identify courses with extremely high/low completion rates.

      1.png

      The problem I have now is I don’t know how to use this table calculation to calculate the upper/lower limits (the threshold of high/low completion rates).  If it’s not a table calculation but a simple agg measure like AVG(GPA), then I can apply the following calculation to find out the upper/lower limits like this one I did:

              2.png

       

      Could someone guide me through how to use the table calculation (completion rate %) to calculate the upper/lower limits?

      I also tried to drag the table calculation (completion rate %) into Measure Values area and turned it into a measure field.  But when I applied the formula / calculation on it, the result is not right at all. 

      I know that the completion rate is based on completion types. So I chose Results are computed along Table (completion types).  But still didn't work, the line is 1, which is totally wrong. 

       

       

      For now, I relied on the Analytics tool / Distribution Band /Average/Standard Deviation.  But I cannot use the SD to create parameter and then a KPI. (To identify courses with high and low completion rate).

        

      3.png

      Since the data contains student info, I can't share the workbook.

       

      Sincerely,

      CO

        • 1. Re: Could we, and how to, use table calculation instead of a simple agg measure to calculate the upper/lower bounds in Control Chart?
          Joe Oppelt

          Post a workbook with sample data.

           

          You'll get the table calc to work if you do the right setting in the table calc settings, but the right setting depends on what is on your sheet.

          • 2. Re: Could we, and how to, use table calculation instead of a simple agg measure to calculate the upper/lower bounds in Control Chart?
            co lam

            Hi Joe,

            As suggested, I have created a fake data and workbook to show my problems.

            Could you please take a look at it and show me my mistakes and the right

            way to do it?

            Thanks.

            On

             

             

            On Fri, Mar 17, 2017 at 2:59 PM, Joe Oppelt <tableaucommunity@tableau.com>

            • 3. Re: Could we, and how to, use table calculation instead of a simple agg measure to calculate the upper/lower bounds in Control Chart?
              Joe Oppelt

              Let's take this one step at a time.

               

              Understand that I am not sure of the math that you are doing in there, so you'll need to guide me through the logic while I guide you through the mechanics.

               

              If I understand correctly, you are thinking of generating an upper bound reference line, and a lower bound reference line.  Do you want one line overall?  Or one line per [Completion type]?  It will change how we manipulate the table calc settings.

               

              Is your [Calculation1] calc achieving correct values the way you uploaded the example?  I am working on the assumption that it is.

               

              I think (but I'm not sure) that you need your standard deviation portion of your calculation to operate in the same way you want Calculation1.  So I separated out that one piece of the [Lower...] and [Upper...] calcs into its own calc. [Calculation1] is running along a specific dimension ([Completion types]) but you need the [... bound] calcs to run along the course names (or, more generically, TABLE(across) on this sheet.)


              Do "Edit Table Calc" on [Lower bound] in the Measure Values panel.  Near the top of the edit dialog box you will see a section for "Nested Calculations".  By default it lists the named calc at the top of the puldown list, but all the table calcs that are involved in this calc are also listed.  You can set one calc with one setting, and another with a different setting.  In this case I have [Lower bound] set as Table(across), but [Calculation1] and [Window StDev as running along the [Completion Types] Dimension.  This results in a separate line for each Completion type.

               

              We still have 30 nulls because there are 6 marks per course, and 5 courses that have only 100% values are not calc-ing lower and upper bounds.  6 x 5 is 30.  I'm pretty sure I'm still not directing the use of one or more of the table calcs appropriately for what should be happening here.  But the first step in what you need to do with all this is to understand that you can tell a nested table calc to operate one way for part of the operation and another way for some interior table calc that is a component of the overall calc.

               

              Another thing we can look at once we get the calc directions straight is the way you are aiming to depict this viz.  What I am guessing is that once you have this worked out, you'll make it dual axis.  We also have the concept of "Reference Lines".  You can add reference lines to an axis (in this case the [Calculation1] axis.  I just want to put that out there.  Dual axis is also fine, and, in fact, I think it gives you more flexibility in formatting and coloring.

               

              Anyway, let's build off of what I have attached here.

              • 4. Re: Could we, and how to, use table calculation instead of a simple agg measure to calculate the upper/lower bounds in Control Chart?
                co lam

                Hi Joe,

                Sorry for my late response.     Here’re the logic and objective. Hope it makes sense to you.

                In each course, we will have 3 student groups (i.e. pass, fail, or withdraw from the course) Based on the total number of students in the course, we can find out the % of each group, i.e. the 3 completion types and rates.

                 

                The chart I would like to create will show % of the 3 completion types across different courses within a “course level” i.e. lower level (100 -200 Level).     

                However, users will look at the completion types one at a time.  I will hide the other 2 types in the Color card. As such, the % remains the same.  If I filter the other 2 types, the % will be recalculated and it won’t be correct.

                The objective of my job assignment is to create a pair of thresholds to identify courses have exceptionally high / low successful rate (or fail / withdrawal rate).  To that end, I need 2 boundaries (upper/lower) for each completion type because the boundaries should be different for each type. i.e. 1 pair for the successful students gp, 1 pair for the fail, and the other pair for the withdrawal.  You have created 2 boundaries in the chart so it’s good.

                Based on the particular boundaries, I can create a logical calculation field (above or below the thresholds) as a KPI.

                 

                My [Calculation1] comes from Step 1 sheet [SUM(Number of Records].   

                If you put “Course” in, you will see the % distribution of the 3 Completion types of each course.  These % are correct. 

                The control chart is to convert these % into shapes.  Once I turned the table (Step 1) into chart (Step 2), [SUM(Number of Records] went to the Rows.  I dragged it into Measures and turned it into a Measure field.

                 

                I think each bound should have the same value (std dev) across the courses.  Let’s say there are 7 courses in Level_1.  The % of students passed each course are: 100, 60, 50, 40, 70, 66, 45.  Then the SD will be 18.68.   Mean will 61.57.   Accordingly, the 2 boundaries will be 42.89 and 80.26 respectively.  Hence, the lines should be straight.   Same principle will be applied to the fail type and withdrawal type.   Also, since the std dev is based on the 7 numbers, the lines should be across all 7 courses.  In other words, the lines should not be broken at the courses with 100% and should not have NULL. 

                2017-03-21_16-34-24.png

                 

                I will look into your calculations and try to understand it..    For your helps, it's one step closer now.   thank you very much.

                 

                Cheers,

                On

                • 5. Re: Could we, and how to, use table calculation instead of a simple agg measure to calculate the upper/lower bounds in Control Chart?
                  Glenn Kuly

                  Are you looking at each completion type in isolation?

                   

                  The following calculated field will produce the chart you just posted:

                   

                  {Fixed [Course]:(SUM(IF [Completion Types] = "Successful" THEN 1 ELSE 0 END))} / {Fixed [Course]: SUM([Number of Records])}

                   

                  The following result is based on successful completions for the first ten courses on your list. The reference lines are based on 1 std dev, but I believe a parameter could be developed allowing you to display 2 or 3 std dev's, if that's what you want.

                   

                  4.png

                   

                  ** edit: I overcomplicated the calculated field above, a simple "SUM(IF [Completion Types] = "Successful" THEN 1 ELSE 0 END) /SUM([Number of Records])" does the same thing.

                   

                  -- Glenn

                  • 6. Re: Could we, and how to, use table calculation instead of a simple agg measure to calculate the upper/lower bounds in Control Chart?
                    Glenn Kuly

                    I've developed the worksheet a little further, with completion type and number of standard deviations selected by parameters.

                     

                    The upper and lower standard deviation reference lines are done separately.

                     

                    8.png

                    -- Glenn

                    • 7. Re: Could we, and how to, use table calculation instead of a simple agg measure to calculate the upper/lower bounds in Control Chart?
                      co lam

                      Hi Glenn,

                      I was looking at your solution Worksheet B this morning and going to respond.  But then found out you have kindly created a more sophisticated and better version, Worksheet C.  Thank you so much for your help.  I haven’t looked into Worksheet C yet, but I certainly will do it later today.

                       

                      If you don’t mind, may I ask you a few questions about Worksheet B?

                      • Could you please let me understand why the 2 StdDev boundaries will be calculated and shown on the chart automatically in worksheet 4?  I looked at the Successful Percentage calculation, I didn’t see anything related to StdDev, nor did you create addition field for that.

                       

                      • Could you please explain the “fixed” function?  The help said it’s for a specific dimension.  Still not quite get it.

                       

                      • Is it possible to exclude courses with 0% (courses without record on a particular type, e.g. no student fail) from the calculation? Because the StdDev boundaries will be affected substantially.  For instance, when I changed the “successful” to “Fail” in the “Successful Percentage” field, the lower bound was below -10%.  But once I highlighted and excluded courses with 0%, the 2 bounds changed accordingly, which seemed to be more accurate to me. 

                       

                      In Worksheet B, I have to change the type in the “Successful Percentage” field so that the chart and the boundaries will change accordingly.  I was wondering how to create additional fields for the other 2 (fail and withdrawal).  But then I noticed you had created a parameter for completion type.  So this is no longer a question to me.

                       

                      Thanks so much for your kindly help. 

                       

                      Cheers,

                      On

                      • 8. Re: Could we, and how to, use table calculation instead of a simple agg measure to calculate the upper/lower bounds in Control Chart?
                        Glenn Kuly

                        Hi On,

                         

                        1) the standard deviation bands in the previous workbook were set using Analytics -> Distribution Band -> Value: Standard Deviation (-1, 1) -> Formatting: line, no fill.

                         

                        2) The 'Fixed' function is outside the scope of this problem, and there are people on this forum far better equipped than I am to explain it.

                         

                        3) Yes, you can do this, but at the cost of losing the data points in the viz. I've done this on worksheet 8 of the attached workbook, for "Successful" only. So you will see only nine data points, even though 10 courses are selected in the filter (ARCH100, with zero passes, disappears from the view). The recalculated field Percentage Successful is:

                         

                            IF(SUM(IF [Completion Types] = "Successful" THEN 1 ELSE 0 END)
                              /SUM([Number of Records]) > 0)

                         

                            THEN SUM(IF [Completion Types] = "Successful" THEN 1 ELSE 0 END)
                              /SUM([Number of Records])

                         

                            END

                         

                        In other words, if the result is > 0, then include the result, otherwise, not. I'm sure someone can come up with a more elegant solution, especially in terms of handling the null value that results from excluded courses.

                         

                        -- Glenn

                        • 9. Re: Could we, and how to, use table calculation instead of a simple agg measure to calculate the upper/lower bounds in Control Chart?
                          co lam

                          Hi Glenn,

                          Sorry to bother you again. 

                           

                          I looked at Sheet 8 in Version D, the result changed dramatically from Version C.  With nine 'successful" data points, the lower limit was -0.26.

                          In Version C, if I pointed & selected ARCH100, then right click to choose "exclude",  the lower limit was 32%.  If I included this course, the lower limit would be 22%.

                           

                          Do you have any idea why the results differ so much when ARCH 100 is excluded in both versions?

                           

                          Cheers,

                          On

                          • 10. Re: Could we, and how to, use table calculation instead of a simple agg measure to calculate the upper/lower bounds in Control Chart?
                            Glenn Kuly

                            Hi On,

                             

                            I'm seeing the lower limit as -0.26 when the standard deviations parameter is set to 3, but 0.32 when the parameter is set to 1, which is correct (i double checked in Excel) based for a calculation based on population numbers.

                             

                            I reread your posts above, and i can see that you weren't actually looking for a parameter to alter the number of standard deviations that the reference line can show, but i threw it in just to see how it would look.

                             

                            HTH.

                             

                            -- Glenn

                            • 11. Re: Could we, and how to, use table calculation instead of a simple agg measure to calculate the upper/lower bounds in Control Chart?
                              Glenn Kuly

                              Hi On,

                               

                              To clarify a little further, looking at 2 and 3 standard deviations away from the mean is a way of identifying outliers (remember the 68-95-99.7 rule).

                               

                              So, just a thought, but for courses where every student either withdraws or fails, why would you want to remove them from the analysis?

                               

                              -- Glenn

                              • 12. Re: Could we, and how to, use table calculation instead of a simple agg measure to calculate the upper/lower bounds in Control Chart?
                                co lam

                                Hi Glenn,

                                Sorry for my late reply. Tried to replicate all the calculation fields in the real data for the whole day.  Not as smooth as I thought.   Almost there.  Haven't been able to make the 2 boundaries show up in the real data.

                                      

                                Regarding your question, we will look at the 3 completion types one by one in this work assignment.  The first one will be success rates across the courses.  Say there are 100 courses with success rates, we would like to know which courses have the exceptionally high / low success rates within this group.  If a course without any student passed, it will not be included in this group.  So, the 2 boundaries/SD should be based exclusively on these 100 courses that have success rates.   If a course without any success rate, then it will be included in the other group(s), say the fail and/or the withdrawal groups. 

                                 

                                To some extent, it’s like the boxplot.  If we put the 3 types on the x-axis, and the rate on the y-axis.  Each box will have its own Q1 & Q3, the whisker will extend to the max/min.  If a course has fail and withdrawal rates only, it will be included in the fail and withdrawal boxes but not in the success type box.

                                 

                                In short, we are not ignoring the fail and withdrawal courses.  They are important to us.  In fact, those 2 rates will help us to identify which courses have issues and will need further investigation.   : )

                                 

                                Thanks so much for your kindly help.  Greatly appreciated. 

                                 

                                Cheers,

                                On