3 Replies Latest reply on Feb 13, 2018 5:41 AM by Yuriy Fal

    Output of R script: aggregate and use as dimension and/or measure

    Ele Begé

      Hello,

       

      I am using the functions "SCRIPT_REAL" and "SCRIPT_STR" to train a random forest using R, and predict some values (binary classification problem).

      I would like to use the output of my script to perform other calculations, for which I need to aggregate my data. However, when I create a calculated field and aggregate the data, for example using AVG(), Tableau reports that the data is already aggregated.

      Apart from this, my measures are disaggregated in the view, and if I aggregate the measures (Analysis > Aggregate Measures, ticked), the "SCRIPT_" functions throw an error: Error in randomForest.default(m, y, ...) : Need at least two classes to do classification.
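
      One plausible cause of this error, sketched in Python rather than R or Tableau: once measures are aggregated, the script receives one value per mark instead of one per row, so the response vector can collapse to a single class. The rows, the MAX aggregation and the helper name classes_seen are all made up for illustration; the actual workbook may aggregate differently.

```python
# Hypothetical rows: (subject, hobby, uses_app). Illustrative data only.
rows = [
    ("s1", "Music", 1),
    ("s2", "Music", 0),
    ("s3", "Sport", 1),
    ("s4", "Sport", 1),
]


def classes_seen(marks):
    """Return the distinct response values a classifier would receive."""
    return sorted({y for _, y in marks})


# Disaggregated view: one mark per row, so both classes reach the script.
disaggregated = [(subject, y) for subject, _, y in rows]

# Aggregated view with no dimension on the shelves (assumed here):
# all rows collapse into one mark, aggregated with MAX for the sketch.
aggregated = [("all", max(y for _, _, y in rows))]

print(classes_seen(disaggregated))
print(classes_seen(aggregated))
```

      With only one distinct value left, a classifier has nothing to separate, which matches the wording of the randomForest message.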

       

      Is there any way I could use R scripts outputs normally as measures or dimensions? Or is there any way to avoid having such problems with aggregations and implement, for example, LOD expressions on my output?

       

      I have attached an example Tableau workbook created under Tableau 10.5.

      In the calculated field (measure) "CountUsersbyHobby" I tried to count the number of Subjects that are predicted to use a Mobile App in each Hobby category (out of 4 possible ones). However, I am unable to do so, because Tableau treats the output of R as already aggregated.

      I have two R scripts implementing the same random forest: one returns its output as a boolean ("randforest_bool") and the other returns the predicted probability as a percentage ("randforest_prob").

       

      Any help or advice would be greatly appreciated. Thank you!

        • 1. Re: Output of R script: aggregate and use as dimension and/or measure
          Yuriy Fal

          Hi Ele,

           

          Not that I'm an expert in R programming

          (ditto in solving classification problems :-)

           

          But I'm confident enough that integrating Tableau

          with external services (R or Python) can be complicated.

           

          The complications come mainly from the "late integration".

          Tableau treats every SCRIPT_...() function as a Table Calculation,

          so it is evaluated only after the data has been aggregated for the view.

          This becomes a problem when there is a need to re-use the result:

          one has to write another Table Calc that refers to this one.

          Even a "simple" aggregation (binning) can become a nightmare.
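
          To make the re-use point concrete, here is a minimal Python sketch (not Tableau code). It assumes the script output already exists as one value per mark, and shows that an average over it has to be a second pass over those marks, similar in spirit to WINDOW_AVG(). The data and the name window_avg are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-mark output of a SCRIPT_REAL call:
# (hobby, subject, predicted probability). Illustrative data only.
script_output = [
    ("Music", "s1", 0.75),
    ("Music", "s2", 0.25),
    ("Sport", "s3", 0.5),
    ("Sport", "s4", 1.0),
]


def window_avg(marks):
    """Second pass over the marks, partitioned by hobby."""
    by_hobby = defaultdict(list)
    for hobby, _, prob in marks:
        by_hobby[hobby].append(prob)
    averages = {hobby: sum(v) / len(v) for hobby, v in by_hobby.items()}
    # Every mark in a partition gets the same window value back.
    return [(hobby, subject, averages[hobby]) for hobby, subject, _ in marks]


for row in window_avg(script_output):
    print(row)
```

          The average is computed from the marks, not from the underlying rows, which is why a plain AVG() or an LOD expression cannot reach the script output.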

           

          Frankly, in your particular case it is possible to "count Subjects" in bins --

          to calculate bin sizes for the [randforest_bool] values (both True and False).

          This could be done via creative usage of RANK() functions (Table Calculations).
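
          The underlying idea can be sketched in Python (not the actual formulas from the attached workbook, which may differ): number the marks inside each bin, then take the highest number as the bin size. That is roughly what a unique rank followed by a window maximum would give. The data and the names rank_within_partition and bin_sizes are made up for illustration.

```python
from collections import defaultdict

# Hypothetical marks: (subject, hobby, predicted_bool). Illustrative data only.
marks = [
    ("s1", "Music", True),
    ("s2", "Music", False),
    ("s3", "Music", True),
    ("s4", "Sport", True),
]


def rank_within_partition(marks):
    """Number the marks 1, 2, 3... inside each (hobby, prediction) bin."""
    counters = defaultdict(int)
    ranked = []
    for subject, hobby, pred in marks:
        counters[(hobby, pred)] += 1
        ranked.append((subject, hobby, pred, counters[(hobby, pred)]))
    return ranked


def bin_sizes(ranked):
    """Bin size is the highest rank reached in each bin."""
    sizes = defaultdict(int)
    for _, hobby, pred, rank in ranked:
        sizes[(hobby, pred)] = max(sizes[(hobby, pred)], rank)
    return dict(sizes)


sizes = bin_sizes(rank_within_partition(marks))
print(sizes)
```

          The order in which marks are visited decides which rank each one gets, so the field order chosen for the Table Calc matters in the same way.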

           

          Please find the attached as an example.

           

          The Sheet 7 view mirrors the Sheet 7 bool one

          (the results are the same, but please check the field order

          in the Addressing window for every Table Calc on both:

          a different order gives different counts in the bins).

           

          Hope this helps your understanding.

           

          Yours,

          Yuri

          • 2. Re: Output of R script: aggregate and use as dimension and/or measure
            Ele Begé

            Hi Yuri,

             

            Thank you for your suggestion! I am new to Tableau and I still need to better understand Table Calculations, but it is useful to see how you did it. I believe this approach will help me perform some other calculations with the R-script output that I thought would not be possible!

             

            Best,

            Ele