You'll need quotes around mvoutlier.
Jim, thank you for your effort and time. However, I tried your solution and it did not work for me; it gave me the same error. Furthermore, in the example they do it without the quotes around mvoutlier (which I have managed to successfully replicate). So I am guessing this is not the solution?
If you managed to get it to work like that, could you apply that to my .twbx and upload it please?
Sorry, you're right, you shouldn't need to quote the library name.
Your Tableau workbook works on my PC without any changes -- I'm using Tableau 8.1.3 and R 3.0.2.
I have multiple versions of R installed and, though I'm not positive, there may have been some funkiness with R 2.15, where I got an error and then reran it with quotes and it worked.
You might also try testing the commands directly in R. Select all of the data points in Tableau > View Data. Cut-and-paste to Excel. Then copy the avg. temp values (without the header).
In R type x <- scan() and paste the values and press enter. Now you should be able to run sign2(cbind(x))$wfinal01
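The same check can be scripted so it is repeatable. This is only a minimal sketch: the numbers below are made up for illustration, and scan(text = ...) stands in for pasting into an interactive scan(). The sign2 call runs only if mvoutlier is installed, and any error is captured as text instead of stopping the script.

```r
# Values as they would be pasted from Tableau's View Data.
# These numbers are invented for illustration.
pasted <- "11.2 9.8 10.5 12.1 10.9 11.7 9.4 10.2"

# scan(text = ...) parses the string the same way an interactive
# scan() parses pasted input.
x <- scan(text = pasted, quiet = TRUE)

# Basic sanity checks before handing the data to sign2().
length(x)
all(is.finite(x))

# Run the same expression Tableau sends, but only if mvoutlier is
# installed. Capture any error message rather than halting.
if (requireNamespace("mvoutlier", quietly = TRUE)) {
  result <- tryCatch(
    mvoutlier::sign2(cbind(x))$wfinal01,
    error = function(e) conditionMessage(e)
  )
  print(result)
}
```

If the error shows up here too, the problem is on the R side rather than in the Tableau integration.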
EDIT: I rebooted and tried everything again using your original workbook: no problems with either R 3.0.2 or R 2.15.3.
Thank you so much Jim!
I tried your suggestion in R directly. If I copy all the temperatures (2804 inputs), it returns the "Error in svd(xs) : infinite or missing values in 'x'" error. However, I then copied 26 values from the same dataset, ran y <- scan() and pasted them in to create y, and tried the same sign2(cbind(y))$wfinal01 call on y. This time it worked: it returned "1" 26 times.
This is making me believe the problem is in R, and in particular my R installation, as you don't seem to have a problem with the same dataset. It seems my R can't cope with the large dataset. I am using R (64-bit) version 3.0.2 (2013-09-25) and Tableau 8.1.3.
Would you happen to know how I could resolve this? Should I at this point just remove anything R related and try to reinstall it?
EDIT: I just removed R and RStudio, reinstalled R (same version), and reinstalled all my packages. And now it works! Jim Wahl, thank you so much for your help!
OK. Now I can replicate your error.
In the workbook you attached above, you were including only the month of January at the YEAR() level. This works.
Removing the filter also works. In both cases there are 234 marks / years of data.
Filtering to include only February or any other month generates the error. It's not obvious to me what is causing the error, but I'm not at all familiar with sign2.
It's also a bit strange (and, I think, unrelated) that January is the only month with data before 1900 (?) and the average temps seem to be significantly higher (??).
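One way to narrow this down is to inspect the input for values that could turn into NaN before svd() is reached. The sketch below is a guess at likely culprits, not something confirmed in this thread: it assumes sign2 centers each column by its median and then divides each row by its length, so a row sitting exactly on the medians would give 0/0. The helper check_sign2_input is hypothetical and not part of mvoutlier, and the temperatures are made up.

```r
# Hypothetical helper (not part of mvoutlier): report inputs that
# might break sign2(), under the assumptions stated above.
check_sign2_input <- function(x) {
  x <- as.matrix(x)
  med <- apply(x, 2, median, na.rm = TRUE)
  centered <- sweep(x, 2, med)  # subtract each column's median
  list(
    # Rows containing NA, NaN or Inf.
    non_finite = which(!is.finite(rowSums(x))),
    # Columns whose median absolute deviation is zero.
    zero_mad = which(apply(x, 2, mad, na.rm = TRUE) == 0),
    # Rows that coincide exactly with the column medians.
    at_median = which(rowSums(centered^2) == 0)
  )
}

# Invented temperatures: three values tie with the median, one is missing.
temps <- c(10.1, 10.4, 10.4, 10.4, 10.9, 11.3, 9.7, NA)
check_sign2_input(temps)
```

Running this on the February values that fail and the January values that work would show whether the two subsets differ on any of these three checks.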
It sounds like a bug in mvoutlier. I have seen people complain about that error before using it through R, even without Tableau. Another option for detecting outliers is the pracma package which has a findpeaks function and a Hampel filter.
It may or may not work for your use case because of the method it uses to identify outliers. You should take a look at the documentation to get a sense of how it looks for outliers and why something is or isn't classified as one.
For example, the Hampel filter looks at the time series one fixed-size window at a time, so ordering matters. Say your sales are gradually increasing to a huge number: that huge number may not be classified as an outlier, because the filter compares each value with the values in its vicinity, not with the entirety of the dataset.
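To make the windowing behavior concrete, here is a minimal base-R sketch of a Hampel-style identifier. It is not pracma's own code: hampel_flags is a hypothetical helper, and the k and t0 arguments are only meant to mirror the half-window and threshold idea. The sales series is made up.

```r
# Hypothetical helper sketching a Hampel-style identifier.
hampel_flags <- function(x, k = 3, t0 = 3) {
  n <- length(x)
  flagged <- logical(n)
  # The first and last k points lack a full window and stay unflagged.
  if (n < 2 * k + 1) return(flagged)
  for (i in (k + 1):(n - k)) {
    window <- x[(i - k):(i + k)]
    center <- median(window)
    # 1.4826 scales the MAD to estimate the standard deviation
    # for normally distributed data.
    spread <- 1.4826 * median(abs(window - center))
    flagged[i] <- abs(x[i] - center) > t0 * spread
  }
  flagged
}

# A steady ramp up to a large value, with one spike in the middle.
sales <- seq(10, 1000, length.out = 50)
sales[25] <- sales[25] + 500

which(hampel_flags(sales))  # 25: only the spike is flagged
```

The values near 1000 at the end of the ramp are far above the overall median, yet none is flagged, because each one is in line with its immediate neighbors.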
Here are some examples to try.
IF SCRIPT_REAL("library(pracma); a <- rep(1, length(.arg1)); a[findpeaks(.arg1,threshold=quantile(.arg1,.95),sortstr=FALSE)[,2]]=0;a;", avg([FlowCFS])) == 0 THEN "Outlier" ELSE "OK" END
IF SCRIPT_REAL("set.seed(8421);library(pracma); a <- rep(1, length(.arg1)); a[hampel(.arg1,k=3, t0=3)$ind]=0;a;", avg([FlowCFS])) == 0 THEN "Outlier" ELSE "OK" END
The error occurs because the mvoutlier package (the batch of R code that supports the mvoutlier functions) hasn't been installed on your Rserve instance. Here's one tutorial on how to download and install R packages: Quick-R: R Packages
I'm not sure, I've had very limited experience with R installs. Have you tried running your R code on some sample data inside the R console, and is that code finding the mvoutlier library?
Appreciate reading up on your posts. I'm trying to identify outliers within a scatterplot using two student-level variables (time spent in course and exam score). When using your script:
IF SCRIPT_REAL("library(mvoutlier);sign2(cbind(.arg1))$wfinal01", AVG([Flow CFS])) == 0 THEN "Outlier" ELSE "OK" END
I get the following error message: "Error in sign2(cbind(.arg1, .arg2)) : More than 50% equal values in one or more variables!"
It appears that if you use a categorical variable instead of a student-level variable, the script works fine (e.g., switch out exam score for instructor name).
Any suggestions on how to identify outliers using two student-level variables?
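Judging by its wording, the message points at a variable in which more than half the values are identical, which would make that variable's median absolute deviation (MAD) zero. A quick way to see which variable is affected is to compute the MAD per column in the R console. The data and column names below are invented for illustration.

```r
# Made-up student-level data, for illustration only.
minutes_in_course <- c(120, 95, 240, 180, 60, 150, 210, 75)
exam_score <- c(100, 100, 100, 100, 100, 85, 70, 100)

scores <- cbind(minutes_in_course, exam_score)

# mad() is zero when more than half the values equal the median.
apply(scores, 2, mad)

# Share of rows tied with the most common value in each column.
apply(scores, 2, function(v) max(table(v)) / length(v))
```

Here exam_score has a MAD of zero because six of the eight students scored 100. If that matches the real data, a less heavily tied measure (for example, a raw score instead of a capped or rounded one) may avoid the error.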