1 Reply Latest reply on Nov 18, 2016 6:38 PM by Bora Beran

    Clustering (via R) in Tableau : Error in sample.int(m, k) : cannot take a sample larger than the population when 'replace = FALSE'

    Samuel Christel

      I've been exploring adding value to some of my dashboards by using higher-level statistical methods (clustering, time series, regression, etc.). I've had success with most of the methods I've tried so far, but I'm hitting a snag with k-means clustering. I get the error "Error in sample.int(m, k) : cannot take a sample larger than the population when 'replace = FALSE'" every time I create the clustering calculation and drag it to the Rows shelf.

      I've scoured the forums and found a number of posts about a similar error message, but none of the proposed solutions work for me. Attached is my current workbook (Tableau 10.0), in which I'm trying to implement a clustering algorithm. The algorithm is implemented in the calculated (numeric) field "clusters." The workbook uses the "Abalone" dataset, which can be downloaded from the UCI Machine Learning Repository (Abalone Data Set) or found in the "PivotalR" package on CRAN. For those interested, metadata is available from the UCI repository and in R.

      Any input on troubleshooting this error would be greatly appreciated.

        • 1. Re: Clustering (via R) in Tableau : Error in sample.int(m, k) : cannot take a sample larger than the population when 'replace = FALSE'
          Bora Beran

          Hi Samuel,

          Tableau sends R as many data points as there are marks in the view. Your view contained only the calculation itself, which produces a one-row table, so Tableau sent a single row of data and asked for 5 clusters; hence the error message. Below is a screenshot of the working version. I added Id to the view, which gives me multiple points (I am assuming one for each abalone in this case), and then set the table calculation to Compute Using > Id.

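          You also need to update your code to return mod$cluster instead of mod$clusters. The object returned by kmeans() stores the cluster assignments in the component named cluster, so mod$clusters is NULL.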

          [Screenshot of the working version: Screen Shot 2016-11-18 at 6.31.35 PM.png]
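          Why one row fails: when kmeans() in R is given a number of centers k, it picks its starting centers by drawing k row indices with sample.int(m, k), where m is the number of rows, and that draw is without replacement. The Python sketch below only imitates that idea with the standard library's random.sample, which likewise refuses to draw more items than it has. The helper name pick_initial_centers and the data values are hypothetical and purely illustrative; this is not R's or Tableau's code.

```python
import random

def pick_initial_centers(rows, k, seed=None):
    """Choose k distinct rows as starting centers (sampling WITHOUT replacement).

    Hypothetical helper: it mimics the idea behind R's kmeans() default start,
    which draws k row indices with sample.int(m, k). It is not R's source code.
    """
    rng = random.Random(seed)
    # random.sample raises ValueError when k > len(rows), the same situation
    # that makes sample.int(m, k) fail when replace = FALSE.
    return rng.sample(rows, k)

# Illustrative values only, not taken from the Abalone data.
one_row = [(0.5, 0.4)]                                 # view holding only the calculation
six_rows = [(0.5, 0.4), (0.3, 0.2), (0.6, 0.5),
            (0.4, 0.3), (0.2, 0.2), (0.45, 0.35)]      # view with one mark per Id

try:
    pick_initial_centers(one_row, k=5)
except ValueError as err:
    print("1 row, k = 5 ->", err)

centers = pick_initial_centers(six_rows, k=5, seed=1)
print("6 rows, k = 5 ->", len(centers), "centers")
```

          Adding Id to the view is what turns the first case into the second: R then receives one row per mark, enough to draw 5 distinct starting centers.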

          Tableau sends a separate request to R for each partition of the data. So if you'd like to send all your data at once, you have to set all your dimensions as addressing and leave nothing in partitioning.

          Before Tableau 10, you do this through the Advanced... option under Compute Using in the Table Calculation dialog. In Tableau 10 and later, it means checking the boxes next to all dimensions in the Table Calculation dialog.
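          A rough way to picture partitioning versus addressing: rows are grouped by the partitioning dimensions (the unchecked ones), and each group becomes its own request. The Python sketch below is a hypothetical model of that grouping, not Tableau's implementation; send_to_r and run_table_calc are made-up names, send_to_r merely counts the rows it would receive, and the Id/Sex rows are illustrative.

```python
from collections import defaultdict

def send_to_r(rows):
    """Hypothetical stand-in for one request to R: reports how many rows it got."""
    return len(rows)

def run_table_calc(rows, partition_by):
    """Group rows by the partitioning dimensions and send one request per group.

    partition_by lists the dimensions left unchecked in the Table Calculation
    dialog; every checked dimension is addressing and stays within a request.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[d] for d in partition_by)
        groups[key].append(row)
    return {key: send_to_r(group) for key, group in groups.items()}

# Illustrative rows with an Id and a Sex dimension (M, F, I).
rows = [{"Id": i, "Sex": s} for i, s in enumerate("MMFIIF", start=1)]

# Id addressing, Sex partitioning: three requests of 2 rows, each too small for k = 5
print(run_table_calc(rows, partition_by=["Sex"]))   # {('M',): 2, ('F',): 2, ('I',): 2}

# Every dimension addressing, nothing partitioning: one request with all 6 rows
print(run_table_calc(rows, partition_by=[]))        # {(): 6}
```

          The design point is the one in the reply: only the second setup hands R the whole data set in a single call, which is what k-means needs here.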

          I hope this helps.

          Also, is there any reason why you're doing k-means in R? Did you try the built-in clustering?

          ~ Bora