3 Replies Latest reply on Apr 4, 2017 8:18 AM by P R

    Identify and Tag rows based on Top N% of dataset

    P R

      Hi all,

       

      I have an example data-set of students with following columns: Name, Subject, Attendance and marks.

       

      I need to tag all students into two types (Regular or Irregular) based on attendance. So for any subject, the top 80% (based on running total) falls in Regular and the bottom 20% fall in Irregular. I would then need two seperate graphs

      1. Total Attendance based on type

      2. Total Marks based on type

       

      Any ideas how I can approach the solution ?

        • 1. Re: Identify and Tag rows based on Top N% of dataset
          Madhura Dighe

          Hi PR,

          See if this is what you want;

          Attendance.png

           

          Let me know if this helps

           

          Cheers! MD

          • 2. Re: Identify and Tag rows based on Top N% of dataset
            Akash mishra

            Hi PR,

             

            You can try this table calculation for marks - RANK_PERCENTILE(SUM([Marks])/TOTAL(SUM([Marks])),'asc')

            and for attendance - RANK_PERCENTILE(SUM([Attendance])/TOTAL(SUM([Attendance])),'asc')

            this will give the percentile based on the total marks and attendence of the of the students

            then create a calculated field using the same and drop it to color shelf to see top 80% students.

            it wil be like this -

            IF RANK_PERCENTILE(SUM([Attendance])/TOTAL(SUM([Attendance])),'asc') >= 0.8 THEN 1

            ELSE 2

            END

            by using this u can create two buckets first for students above 80% and another with less than 80%

             

            i ve created this sample data set for ur reference.

            Name,Subject,Attendance,marks

            a,english,70,60

            b,english,60,56

            c,english,50,46

            d,english,54,73

            e,english,62,74

            f,english,40,7

            g,english,76,90

            e,maths,70,60

            a,maths,60,56

            d,maths,50,46

            b,maths,40,71

            c,maths,76,90

            f,maths,45,60

            g,maths,78,56

            b,science,70,60

            c,science,60,56

            a,science,50,46

            e,science,40,71

            d,science,76,90

            f,science,44,54

            g,science,89,73

             

            1 of 1 people found this helpful
            • 3. Re: Identify and Tag rows based on Top N% of dataset
              P R

              Thank you Akash and Madhura, appreciate the help but the level of detail should be at a higher level not lowest level (student)

               

              Sorry I should explained better. Although your solutions will work when I drop the name on columns or row shelf, what I am looking for is just two bars at overall level.  'X' number of  students fell into red bucket and 'Y' number of members fell into green bucket. I am not looking to break it at an individual level because this is just an example dataset and my actual data contains more than hundred thousand entries and I want to look at highest level and then break it into subjects. But just two bars. Hope this makes it clear.

               

              Also , I want to look at running total percentage because percentile is not working. Something like this

              if RUNNING_SUM(SUM([Attendance])) / TOTAL(SUM([Attendance]))

              <= 0.80 THEN 'Green' else 'Red' END

               

              I might need to use LOD and fix it at Student level but I am unable to combine table and LOD expressions.