I'm trying to use TDAmapper (an R package) from Tableau, because all my data is already loaded onto Tableau Server and I would like to use the algorithm to identify useful groups within my dataset.
In principle this should be possible, since (a) Tableau can pass calculations to R via Rserve, which I understand the basics of, and (b) it is possible to do k-means clustering from Tableau using the SCRIPT_INT function.
Unfortunately, while I can follow the basic usage of TDAmapper in R (see below for reference), I don't know how to call TDAmapper from Tableau. Can somebody tell me how to do this? I tried the following and it didn't work:
MAX([Arising Credit Item Cost]),[Sales Discount %],[Sales Margin %]
For convenience, let me briefly discuss the general pieces of this puzzle:
1) How does one pass from Tableau to R?
This is done with one of Tableau's four SCRIPT functions (SCRIPT_BOOL, SCRIPT_INT, SCRIPT_REAL and SCRIPT_STR). The most relevant here is SCRIPT_INT, which is used when the computation is expected to return an integer result. The general form of the command is:
SCRIPT_INT("R code", Tableau fields being passed in)
For example, to find the correlation between the variables "Profit" and "Discount", we use the SCRIPT_REAL function (used when the computation is expected to return a real, non-integer value), which is written as:
SCRIPT_REAL("cor(.arg1, .arg2)", SUM([Discount]), SUM([Profit]))
where SUM([Discount]) is .arg1 and SUM([Profit]) is .arg2 in the R script.
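To see what the R side of such a correlation call does, the script body can be run in a plain R session with stand-in vectors. This is only a sketch: the numbers below are made up, and `.arg1` and `.arg2` are assigned by hand to play the role of the vectors Tableau would send through Rserve.

```r
# Hypothetical stand-ins for the vectors Tableau would send through Rserve.
.arg1 <- c(0.00, 0.10, 0.20, 0.30, 0.40)  # plays the role of SUM([Discount])
.arg2 <- c(120, 95, 60, 20, -15)          # plays the role of SUM([Profit])

# The script body; its last value is what would be handed back to Tableau.
result <- cor(.arg1, .arg2)

# The result should be a single value or have one value per input row.
stopifnot(length(result) == 1 || length(result) == length(.arg1))
print(result)
```

Because `cor()` returns a single number, every row in the Tableau partition would show the same value.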
2) How does one do k-means clustering in Tableau by passing through R?
The code is as follows:
SCRIPT_INT('
## Sets the seed (arguments may arrive as vectors, so take the first element)
set.seed( .arg8[1] )
## Studentizes the variables
age <- ( .arg1 - mean(.arg1) ) / sd(.arg1)
edu <- ( .arg2 - mean(.arg2) ) / sd(.arg2)
gen <- ( .arg3 - mean(.arg3) ) / sd(.arg3)
car <- ( .arg4 - mean(.arg4) ) / sd(.arg4)
chi <- ( .arg5 - mean(.arg5) ) / sd(.arg5)
inc <- ( .arg6 - mean(.arg6) ) / sd(.arg6)
dat <- cbind(age, edu, gen, car, chi, inc)
num <- .arg7[1]
## Creates the clusters and returns one cluster number per row
kmeans(dat, num)$cluster
',
MAX( [Age] ), MAX( [Education ID] ), MAX( [Gender ID] ),
MAX( [Number of Cars] ), MAX( [Number of Children] ), MAX( [Yearly Income] ),
[Number of Clusters], [Seed] )
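The R portion of the k-means script can be tried outside Tableau before wiring it into a calculated field. The sketch below assumes random stand-in data for the six measures, and assumes the cluster count and seed arrive as vectors repeated once per row (hence the `[1]`). It uses `scale()`, which performs the same centring and scaling as `(x - mean(x)) / sd(x)` on each column.

```r
n <- 60
set.seed(42)                            # only to make the stand-in data reproducible
raw <- matrix(rnorm(n * 6), ncol = 6)   # six hypothetical measures, one column each
.arg7 <- rep(3L, n)                     # plays the role of [Number of Clusters]
.arg8 <- rep(123L, n)                   # plays the role of [Seed]

set.seed(.arg8[1])
dat <- scale(raw)                       # column-wise (x - mean(x)) / sd(x)
clusters <- kmeans(dat, centers = .arg7[1])$cluster

# One integer label per input row, which is the shape SCRIPT_INT needs.
stopifnot(length(clusters) == n)
print(table(clusters))
```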
3) How does one use TDAmapper in R?
TDAmapper is an R package implementing the Mapper algorithm, which gives us a specific way of identifying similar members of a dataset. The prototypical example is applying TDAmapper to identify similar types of diabetic patients in a dataset (made available as "chemdiab").
The general idea is that we have a dataset of points, and we define a function (known as the "filter function") that assigns a value to each point. We then cover the range of these values with a finite number of overlapping intervals, so the algorithm also needs us to specify: (a) the number of intervals we use; (b) the percentage overlap between these intervals.
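To make the two cover parameters concrete, here is a small base-R sketch of one way to build equal-length overlapping intervals over the range of the filter values. The helper name `make_cover` is hypothetical, and this is an illustration of the idea rather than a copy of what TDAmapper does internally.

```r
# Build num_intervals equal-length intervals that cover the filter range,
# with adjacent intervals sharing percent_overlap percent of their length.
make_cover <- function(filter_values, num_intervals, percent_overlap) {
  lo <- min(filter_values)
  hi <- max(filter_values)
  p  <- percent_overlap / 100
  # n intervals of length len, each shifted by len * (1 - p), must span hi - lo.
  len  <- (hi - lo) / (num_intervals - (num_intervals - 1) * p)
  step <- len * (1 - p)
  starts <- lo + step * (seq_len(num_intervals) - 1)
  data.frame(start = starts, end = starts + len)
}

cover <- make_cover(c(0, 2.5, 7, 10), num_intervals = 4, percent_overlap = 50)
print(cover)
```

With filter values ranging from 0 to 10, four intervals and 50% overlap, this gives intervals of length 4 starting at 0, 2, 4 and 6. Points whose filter value falls in an overlap belong to two intervals, which is what later links the groups together.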
filter.kde <- kde(normdiab[, 1:5], H = diag(1, nrow = 5), eval.points = normdiab[, 1:5])$estimate
## filter.kde holds the values of our filter function (kde() here is the one from the ks package).
## In this case, each data point is assigned its kernel density estimate.
dist_object = normdiab.dist, filter_values = filter.kde,
## Here, mapper() takes as input the distance matrix of the data points, the filter values, the number of intervals and the percentage overlap.
## There is one more parameter (num_bins_when_clustering), which affects the clustering step used inside the Mapper algorithm and can be any positive integer.
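One obstacle to calling this from SCRIPT_INT is that Tableau expects one integer per row, whereas Mapper groups rows into vertices that can overlap. A possible bridging step, sketched below in base R, is to label each row with the first vertex that contains it. This assumes the mapper result exposes a `points_in_vertex` list of row indices (a made-up list stands in for it here), and the helper `first_vertex` is hypothetical. Note that this labelling discards the overlap information.

```r
# Hypothetical stand-in for the points_in_vertex element of a mapper result:
# one vector of row indices per vertex; a row may appear in several vertices.
points_in_vertex <- list(c(1, 2, 3), c(3, 4), c(5, 6, 4))
n_rows <- 6

# Label each row with the first vertex that contains it (NA if none does).
first_vertex <- function(points_in_vertex, n_rows) {
  out <- rep(NA_integer_, n_rows)
  for (v in seq_along(points_in_vertex)) {
    idx <- points_in_vertex[[v]]
    unassigned <- idx[is.na(out[idx])]  # rows not yet claimed by an earlier vertex
    out[unassigned] <- v
  }
  out
}

groups <- first_vertex(points_in_vertex, n_rows)
print(groups)
```

In a Tableau script the last expression would be the `groups` vector itself, so that its length matches the number of rows passed in.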