5 Replies Latest reply on Feb 14, 2020 3:48 PM by Zhouyi Zhang

# Data Sampling in Tableau

I have two datasets (Data A and Data B) which I am joining in Tableau. After joining the datasets I need to sample the data for each cond (cond1 and cond2 below) based on following rules and tag sample in a calulcated field named as RandomSample.

For Cond1

if Date > Date 1 and cond1 = Y then randomly sample 40% of the population.

For Cond2

if Date > Date 1 and cond2 = Y then randomly sample 10% of the population.

Now, If you will look at the final dataset in the end under expected result. It has Random sample column. For cond1, B,C,G,H all have Y in Cond1  and date > date 1. So we do a 40% sample of 4 which is 1.6. Round it to 2. So we tag b and g as Y in Random sample column.

For cond2, D and E have Y in cond2 and Date > Date 1. Here we do a 10% sample of 2 which is 1. So we tag e as Y in random sample.

My question is how do I create this sampling calulcated field RandomSample in tableau? I have attached  a test workbook below. I know how to do this in Oracle but I don't have both datasets in Oracle so sampling has to be done in tableau after data has been joined.

Thank you.

 Data A Name Cond1 Cond2 Date a Y 2/13/2020 b Y 2/14/2020 c Y 2/13/2020 d Y 2/16/2020 e Y 2/17/2020 f 2/18/2020 g Y 2/19/2020 h Y 2/20/2020

 Data B Name Date1 a 2/12/2020 b 2/13/2020 c 2/14/2020 d 2/15/2020 e 2/16/2020 f 2/17/2020 g 2/18/2020 h 2/19/2020

Expected Result

 Final Dataset Name Cond1 Cond2 Date Date1 RandomSample a Y 2/11/2020 2/12/2020 b Y 2/14/2020 2/13/2020 c Y 2/13/2020 2/14/2020 d Y 2/16/2020 2/15/2020 e Y 2/17/2020 2/16/2020 Y f 2/18/2020 2/17/2020 g Y 2/19/2020 2/18/2020 Y h Y 2/20/2020 2/19/2020
• ###### 1. Re: Data Sampling in Tableau

Zhouyi Zhang , Shinichiro Murakami  Can you please review my above problem

• ###### 2. Re: Data Sampling in Tableau

Hi, Jatin

Is it something like this? If yes, a sample workbook attached.

Hope this helps

ZZ

• ###### 3. Re: Data Sampling in Tableau

Hello Zhouyi Zhang

Thank you so much for your response. I am not sure whats the use of Rank_Unique on Name. Also for sampling we don't have to consider Name at all. Below is the logic for sampling.

For Cond1

if Date > Date 1 and cond1 = Y then randomly sample 40% of the population. Now If you look at the final table below. B,G,H meets the condition (Date > Date 1 and cond1 = Y). So, we randomly sample 40% out of 3 (b,g,h) which comes out to be 1.2. After rounding, it comes out to be 1. So we tag either of b,g,h, as Y. I have tagged g as Y.

For Cond2

if Date > Date 1 and cond2 = Y then randomly sample 10% of the population. If you look at the final table below. C and E meets the condition (Date > Date 1 and Con2 = Y). So, we randomly sample 10% out of 2 (C, E) which comes out to be 1. So, we tag either of C and E as Y. I have tagged E as Y below.

 Name Cond1 Cond2 Date Date1 Random Sample a Y 2/11/2020 2/12/2020 b Y 2/14/2020 2/13/2020 c Y Y 2/13/2020 2/14/2020 d 2/16/2020 2/15/2020 e Y 2/17/2020 2/16/2020 Y f 2/18/2020 2/17/2020 g Y 2/19/2020 2/18/2020 Y h Y 2/20/2020 2/19/2020
• ###### 4. Re: Data Sampling in Tableau

Can you please review my problem above. To simplify above problem, I have created a category for each conditional group in my data (see below). Now my data looks something like below with each category column.

Now I want to  randomly select 40% for category A (count = 6). 0.4 * 6 = 2.4 so after rounding it, select 2 A's randomly and tag them Y/True.

Similarly I want to randomly select 20 % for category  B (Count 4) so 0.2 *4 = 0.8, rounding it to 1. I have selected one Random B below and tagged it as Y/True.

Similarly I want to randomly select 20 % for category  C (Count 10) so 0.2 * 10= 2. I have selected two Random C below and tagged it as Y/True.

Can you help me to create this logic in Tableau?

 Category RandomSample A Y A A A Y A A B B Y B B C C C Y C Y C C C C C C
• ###### 5. Re: Data Sampling in Tableau

Hi, Jatin

In my sample workbook, I use "ceilling" to round up the decimal point, but if you want 1.2 to be 1, then you can use "Floor" to round down. don't worry about the usage of name, it just the way to get a rank value (e.g. from 1 to N), and then compare this value to the 40% or 10% of the population. Tableau doesn't have radom function, so this is the only way I could think of to flag the rows according to your condtions. it is actually not a radom, as everytime it returns the same row as flaged.

Hope this helps

ZZ