3 Replies Latest reply on Apr 4, 2017 8:18 AM by P R

# Identify and Tag rows based on Top N% of dataset

Hi all,

I have an example data-set of students with following columns: Name, Subject, Attendance and marks.

I need to tag all students into two types (Regular or Irregular) based on attendance. So for any subject, the top 80% (based on running total) falls in Regular and the bottom 20% fall in Irregular. I would then need two seperate graphs

1. Total Attendance based on type

2. Total Marks based on type

Any ideas how I can approach the solution ?

• ###### 2. Re: Identify and Tag rows based on Top N% of dataset

Hi PR,

You can try this table calculation for marks - RANK_PERCENTILE(SUM([Marks])/TOTAL(SUM([Marks])),'asc')

and for attendance - RANK_PERCENTILE(SUM([Attendance])/TOTAL(SUM([Attendance])),'asc')

this will give the percentile based on the total marks and attendence of the of the students

then create a calculated field using the same and drop it to color shelf to see top 80% students.

it wil be like this -

IF RANK_PERCENTILE(SUM([Attendance])/TOTAL(SUM([Attendance])),'asc') >= 0.8 THEN 1

ELSE 2

END

by using this u can create two buckets first for students above 80% and another with less than 80%

i ve created this sample data set for ur reference.

Name,Subject,Attendance,marks

a,english,70,60

b,english,60,56

c,english,50,46

d,english,54,73

e,english,62,74

f,english,40,7

g,english,76,90

e,maths,70,60

a,maths,60,56

d,maths,50,46

b,maths,40,71

c,maths,76,90

f,maths,45,60

g,maths,78,56

b,science,70,60

c,science,60,56

a,science,50,46

e,science,40,71

d,science,76,90

f,science,44,54

g,science,89,73  1 of 1 people found this helpful
• ###### 3. Re: Identify and Tag rows based on Top N% of dataset

Thank you Akash and Madhura, appreciate the help but the level of detail should be at a higher level not lowest level (student)

Sorry I should explained better. Although your solutions will work when I drop the name on columns or row shelf, what I am looking for is just two bars at overall level.  'X' number of  students fell into red bucket and 'Y' number of members fell into green bucket. I am not looking to break it at an individual level because this is just an example dataset and my actual data contains more than hundred thousand entries and I want to look at highest level and then break it into subjects. But just two bars. Hope this makes it clear.

Also , I want to look at running total percentage because percentile is not working. Something like this

if RUNNING_SUM(SUM([Attendance])) / TOTAL(SUM([Attendance]))

<= 0.80 THEN 'Green' else 'Red' END

I might need to use LOD and fix it at Student level but I am unable to combine table and LOD expressions.