5 Replies Latest reply on Feb 14, 2020 3:48 PM by Zhouyi Zhang

    Data Sampling in Tableau

    Jatin Garg

      I have two datasets (Data A and Data B) which I am joining in Tableau. After joining the datasets I need to sample the data for each cond (cond1 and cond2 below) based on following rules and tag sample in a calulcated field named as RandomSample.

       

      For Cond1

      if Date > Date 1 and cond1 = Y then randomly sample 40% of the population.

      For Cond2

      if Date > Date 1 and cond2 = Y then randomly sample 10% of the population.

       

      Now, If you will look at the final dataset in the end under expected result. It has Random sample column. For cond1, B,C,G,H all have Y in Cond1  and date > date 1. So we do a 40% sample of 4 which is 1.6. Round it to 2. So we tag b and g as Y in Random sample column.

      For cond2, D and E have Y in cond2 and Date > Date 1. Here we do a 10% sample of 2 which is 1. So we tag e as Y in random sample.

      My question is how do I create this sampling calulcated field RandomSample in tableau? I have attached  a test workbook below. I know how to do this in Oracle but I don't have both datasets in Oracle so sampling has to be done in tableau after data has been joined.

      Thank you.

       

      Data A
      NameCond1Cond2Date
      aY2/13/2020
      bY2/14/2020
      cY2/13/2020
      dY2/16/2020
      eY2/17/2020
      f2/18/2020
      gY2/19/2020
      hY2/20/2020

       


                                         

      Data B  
      NameDate1
      a2/12/2020
      b2/13/2020
      c2/14/2020
      d2/15/2020
      e2/16/2020
      f2/17/2020
      g2/18/2020
      h2/19/2020

       

      Expected Result

                                                                                                                                    

      Final Dataset
      NameCond1Cond2DateDate1RandomSample
      aY2/11/20202/12/2020
      bY2/14/20202/13/2020
      cY2/13/20202/14/2020
      dY2/16/20202/15/2020
      eY2/17/2020

      2/16/2020

      Y
      f2/18/20202/17/2020
      gY2/19/20202/18/2020Y
      hY2/20/20202/19/2020
        • 1. Re: Data Sampling in Tableau
          Jatin Garg

          Zhouyi Zhang , Shinichiro Murakami  Can you please review my above problem

          • 2. Re: Data Sampling in Tableau
            Zhouyi Zhang

            Hi, Jatin

             

            Is it something like this?

             

             

            If yes, a sample workbook attached.

            Hope this helps

             

            ZZ

            • 3. Re: Data Sampling in Tableau
              Jatin Garg

              Hello Zhouyi Zhang

               

              Thank you so much for your response. I am not sure whats the use of Rank_Unique on Name. Also for sampling we don't have to consider Name at all. Below is the logic for sampling.

               

              For Cond1

              if Date > Date 1 and cond1 = Y then randomly sample 40% of the population. Now If you look at the final table below. B,G,H meets the condition (Date > Date 1 and cond1 = Y). So, we randomly sample 40% out of 3 (b,g,h) which comes out to be 1.2. After rounding, it comes out to be 1. So we tag either of b,g,h, as Y. I have tagged g as Y.

              For Cond2

              if Date > Date 1 and cond2 = Y then randomly sample 10% of the population. If you look at the final table below. C and E meets the condition (Date > Date 1 and Con2 = Y). So, we randomly sample 10% out of 2 (C, E) which comes out to be 1. So, we tag either of C and E as Y. I have tagged E as Y below.

               

               

                 

              NameCond1Cond2DateDate1Random Sample
              aY2/11/2020 2/12/2020
              bY 2/14/2020 2/13/2020
              cYY2/13/2020 2/14/2020
              d 2/16/20202/15/2020
              eY 2/17/20202/16/2020 Y
              f 2/18/20202/17/2020
              gY 2/19/2020 2/18/2020Y
              hY 2/20/2020 2/19/2020
              • 4. Re: Data Sampling in Tableau
                Jatin Garg

                Hello Joshua Milligan. Zhouyi Zhang

                 

                Can you please review my problem above. To simplify above problem, I have created a category for each conditional group in my data (see below). Now my data looks something like below with each category column.

                Now I want to  randomly select 40% for category A (count = 6). 0.4 * 6 = 2.4 so after rounding it, select 2 A's randomly and tag them Y/True.

                 

                Similarly I want to randomly select 20 % for category  B (Count 4) so 0.2 *4 = 0.8, rounding it to 1. I have selected one Random B below and tagged it as Y/True.

                 

                Similarly I want to randomly select 20 % for category  C (Count 10) so 0.2 * 10= 2. I have selected two Random C below and tagged it as Y/True.

                 

                Can you help me to create this logic in Tableau?

                 

                  

                CategoryRandomSample
                AY
                A
                A
                AY
                A
                A
                B
                BY
                B
                B
                C
                C
                CY
                CY
                C
                C
                C
                C
                C
                C
                • 5. Re: Data Sampling in Tableau
                  Zhouyi Zhang

                  Hi, Jatin

                   

                  In my sample workbook, I use "ceilling" to round up the decimal point, but if you want 1.2 to be 1, then you can use "Floor" to round down.

                   

                  don't worry about the usage of name, it just the way to get a rank value (e.g. from 1 to N), and then compare this value to the 40% or 10% of the population. Tableau doesn't have radom function, so this is the only way I could think of to flag the rows according to your condtions. it is actually not a radom, as everytime it returns the same row as flaged.

                   

                  Hope this helps

                   

                  ZZ