This Wiki is written for both Tableau users who are new to Python and Python users who are new to Tableau. It explains how to install Python and integrate it with Tableau, and then goes deeper into how TabPy works and how you can use it for your own analytics.
Tableau talks to R, Python, and MATLAB using a set of SCRIPT_*() functions: SCRIPT_BOOL(), SCRIPT_INT(), SCRIPT_REAL(), and SCRIPT_STR(). These functions work as table calculations, so you must understand table calculations, including how dimensions and measures affect the view. Here’s a set of links to help you get going:
The first step is to ensure you have Python installed on the machine.
After this, we recommend reviewing the latest install instructions on the TabPy GitHub site, which provide the commands required to install TabPy.
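For recent TabPy releases, the commands are typically just the following (a hedged sketch; check the GitHub instructions for the version you are installing):

pip install tabpy
tabpy

Older releases used the startup.sh/startup.bat scripts referenced later in this wiki.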
You can also find detailed install steps in our Tableau Community Tableau Python Integration Step by Step Guide.
These links are helpful for getting an overview of Python and Tableau. They will also show you how to install TabPy and integrate Python with Tableau, and you will find some example use cases with downloadable sample workbooks.
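As a minimal sketch of the calling pattern (assuming a TabPy connection has already been configured in Tableau): the script is passed as a string, followed by the aggregated arguments, which arrive in Python as _arg1, _arg2, and so on:

SCRIPT_BOOL("return [v is not None and v > 0 for v in _arg1]", SUM([Profit]))

This returns True for rows where SUM([Profit]) is positive, and guards against Nulls, which arrive in Python as None (more on that below).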
Given that the SCRIPT_*() functions work as table calculations, they require aggregate measures or Tableau parameters to work properly. Aggregate measures include MIN(), MAX(), ATTR(), SUM(), MEDIAN(), and any table calculations or Python measures. If you want to use a specific non-aggregated dimension, it needs to be wrapped in an aggregate function, such as:
SCRIPT_STR('return _arg1',ATTR([someTableauDimension]))
ATTR() is a special Tableau aggregate that does the following: IF MIN([Dimension]) = MAX([Dimension]) THEN [Dimension] ELSE * (a special version of Null) END. It’s really useful when you’re building visualizations and aren’t sure of the level of detail of the data being sent. It can be significantly slower than MIN() or MAX() on large data sets, so once you’re confident your results are accurate, you can switch to one of those functions for performance.
For other aggregates, you need to be aware of the dimensions in the view that will affect the level of aggregation. For example, if you are passing SUM([Sales]) to Python, the value(s) of SUM([Sales]) will likely be different depending on whether you have the Year or Month in the view. In addition, if you are passing the results of separate table calculation measures to Python, then these are nested table calculations and like other nested table calculations you need to be mindful of the addressing and partitioning of the nested table calcs.
There’s a longer section about arguments further down; this was just an overview.
For accurate results from the Python functions, sometimes those functions need to be called with every row in the underlying data. There are two solutions to this:
IF FIRST()==0 THEN SCRIPT_REAL('insert Python script here') END
If we then need to perform additional aggregations on that data, we can do so with table calculations with the appropriate “Compute Using” settings that take into account the increased level of detail in the view.
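As a hedged sketch: if [Python Score] is a hypothetical SCRIPT_REAL calculation computed along a row-level dimension such as a Row ID, a second table calculation with the same Compute Using can roll the row-level results back up:

WINDOW_AVG([Python Score])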
For a given Python script, there will be 1+X connections to Python to execute the script: there is always one main connection for the detail rows, and X is the number of separate grand total and subtotal computations in the view, since grand totals and subtotals are computed separately at a higher level (more notes on that further down). For example, if there is a Row Grand Total, then there will be 2 connections to Python: 1 for the detail and 1 for the Row Grand Total. If there are Row and Column Grand Totals, then there will be 4 connections to Python: 1 for the detail, 1 for the Row Grand Total, 1 for the Column Grand Total, and 1 more for the combined Row & Column Grand Total.
Within the main connection for the detail rows, the Python script will be executed N times, where N is the number of partitions based on the addressing and partitioning of the Python script. Within the connections for subtotals and grand totals, the Python script is executed once for each connection.
Using Python with Tableau Grand Totals & Subtotals really highlights how grand totals and subtotals are performed as separate computations at a higher level of detail. To put it another way: if the Python script is “expecting” a certain level of detail in the arguments passed to it, and that level of detail is not present in the Grand Total computation, then the Python script will generate different results.
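As a hedged illustration of how that level of detail matters, consider a sketch that computes a median over the partition:

SCRIPT_REAL("vals = sorted(v for v in _arg1 if v is not None)
return float(vals[len(vals) // 2])", SUM([Sales]))

With Month in the view, the detail connection receives a 12-element vector and returns a median of the monthly sums; the Grand Total connection receives a single-element vector containing the overall total, so the “median” it returns is simply that total.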
So when we want to do custom Grand Totals, we have to be aware of the customizing Grand Total tricks for table calcs to use with our Python scripts. For more information, start with http://drawingwithnumbers.artisart.org/customizing-grand-totals-part-1/ and read more posts in http://drawingwithnumbers.artisart.org/category/grand-totals-2/. Also note that the two-pass custom Grand Totals are not currently available for Python scripts (just like they are not available for other table calculations).
The set of addresses in each Tableau table calc partition (aka rows in the partition) creates the vector for each argument passed to Python; for example, _arg1 in SCRIPT_REAL('return sum(_arg1)', SUM([Sales])), which effectively does a WINDOW_SUM(SUM([Sales])). The arguments passed to Python are always vectors. The values of the arguments are the values of each aggregate measure or Tableau parameter used for the arguments, and the length of the vector is equal to the number of rows in the partition. 'return len(_arg1)' in Python is the equivalent of the Tableau SIZE() function for a given Compute Using.
The sorting of the vectors of arguments sent to Python is based on the sort used for the addressing & partitioning. If an Advanced Compute Using with a custom sort is used, that sort also orders what is sent to Python and what comes back. In other words, the Python code works like we’d expect it to.
Here’s an example of putting the above several paragraphs together: if you are using disaggregated measures, have 1,000 rows of data in the view, and use a Python script with a Compute Using of Cell, the Python script will be executed 1,000 times within one connection, each time given vectors with a length of 1. Depending on your calculation, this can be significantly slower than using a different addressing with, say, one partition, and calling Python once with a vector of 1,000 elements. If your calculation runs considerably slower in Tableau than at the Python command line, the most likely culprit is the addressing settings.
Null values and * (from ATTR) in Tableau are passed to Python with a value of None. If the Python script is not prepared for None values, it could return undesired results.
Conversely, None values in Python that are returned to Tableau become Null values in Tableau.
In terms of data structure, Python receives each argument as a Python list.
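For example, a minimal sketch that guards against incoming Nulls by treating None as zero (whether that substitution is appropriate depends on your analysis):

SCRIPT_REAL("return [v if v is not None else 0.0 for v in _arg1]", SUM([Sales]))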
We also need to be aware of densification (domain completion, domain padding, mark type filling) effects. When Tableau is densifying the data in the view, that can increase the number of partitions and/or the number of rows within each partition. There are a couple of straightforward ways to identify densification:
When unplanned, unexpected, or unwanted densification occurs, it can cause the Python script to return undesirable results. For example, a densified address that would ordinarily be treated as a Null in Tableau becomes None in Python, and the Python script must be ready for that.
Going into all the other details of densification is beyond the scope of this wiki. Turning off unwanted densification is a matter of moving most or all dimensions off of Rows and/or Columns to the Level of Detail Shelf, and where necessary for layout using aggregate measures on Rows and Columns.
Here’s an example from the Tableau Forums:
At a very high level, Tableau does the following:
This has a few notable effects:
Though we can use a whole bunch of arguments (measures & Tableau parameters) for a given call to a SCRIPT_*() function, what gets returned from Python is a Tableau measure that is the result of the Python script, with one result per address (aka row in the partition). This leads to a couple of points:
A related point: a single SCRIPT_*() call returns only one measure, so if you need multiple values back from one Python call, the workaround is to concatenate the output of the Python script into a single value and then use multiple measures in Tableau to parse that out (see the sketch below).
If the Python code returns a single value (rather than a vector), that value will be returned for every row in the partition. If the Python code returns a matrix of values, Tableau won’t know what to do with it; we have to return either a single value or a vector of single values, so the workaround is to parse the matrix appropriately.
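For example, a hedged sketch, where result_matrix is a hypothetical n-by-k list of lists produced by some model call inside the script; returning just the first column yields a valid vector:

return [row[0] for row in result_matrix]  # hypothetical: keep only column 0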
In terms of data structure, Tableau expects vectors to be returned as Python lists.
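Here is a minimal sketch of the concatenation workaround mentioned above; the delimiter and the field name [Python Result] are illustrative:

SCRIPT_STR("return ['%s|%s' % (a, b) for a, b in zip(_arg1, _arg2)]", SUM([Sales]), SUM([Profit]))

Back in Tableau, FLOAT(SPLIT([Python Result], '|', 1)) and FLOAT(SPLIT([Python Result], '|', 2)) would then recover the two values as separate measures, where [Python Result] is the calculation above.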
The Python code needs to handle the vector(s) of the argument(s) that we’re sending to it, so, for example, if a Python function only looks at a single value of 1 argument then one can wrap the Python function like this:
SCRIPT_REAL("import math
return map(math.log,_arg1)",SUM([Sales]))
or use a list comprehension
SCRIPT_REAL("return [i * 2 for i in _arg1]",SUM([Sales]))
The SCRIPT_*() functions require at least one _arg# argument, and that argument must be a regular aggregate, a table calc, or a parameter. If you don’t provide any arguments, Tableau will show an error in the Edit Calculation dialog.
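For example, a hedged sketch that passes a hypothetical numeric parameter named [Scale Factor]; like measures, parameters arrive as a vector, with the parameter’s value repeated for each row in the partition:

SCRIPT_REAL("return [v * _arg2[0] for v in _arg1]", SUM([Sales]), [Scale Factor])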
As a preliminary troubleshooting step, run the code in a platform that provides a Python interpreter, such as PyCharm, Jupyter, Zeppelin, or the command line, and verify that it returns the expected value.
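Since Tableau passes each argument as a plain Python list (possibly containing None), a quick local test can simulate that input; the values here are made up:

_arg1 = [100.0, None, 250.0]  # simulated Tableau argument vector
print(sum(v for v in _arg1 if v is not None))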
Another step is to create a cross-tab with all of the dimensions used in the view - all the dimensions from Rows, Columns, Pages, and the Marks Card - on Rows, and all of the arguments being sent to Python either as discrete values on Rows and/or in a Measure Names/Measure Values cross-tab. This will give you a perspective on what aggregations Tableau is performing. For example, if you are using ATTR([myDimension]) to pass a constant string value, you might see that ATTR() is actually passing * (effectively Null) because there is more than one value of myDimension at the given level of detail of your view.
There are several possible ways to view and/or log what is being sent from Tableau to Python and back again for debugging purposes:
SCRIPT_REAL("print(_arg1)
return sum(_arg1)",SUM([Sales]))
When we start TabPy by running the startup script, it runs in a console window and the process is terminated when that window is closed. However, it is possible to run TabPy as a background process or daemon.
On Linux/Mac you can do this by starting it with:
nohup ./startup.sh &
On Windows, you can create a shortcut in your Startup folder with a target like this:
C:\Windows\System32\wscript.exe C:\Users\Administrator\invis.vbs "C:\Users\Administrator\Anaconda\envs\Tableau-Python-Server\Lib\site-packages\tabpy_server\startup.bat" %*
Here, invis.vbs is a small VBScript that runs the batch file passed as its first argument with the console window hidden:
Set WshShell = WScript.CreateObject("WScript.Shell")
WshShell.Run """" & WScript.Arguments(0) & """", 0, False
We can publish Python functions to a TabPy server and call them by name from Tableau instead of embedding Python code in Tableau. This makes code management much easier, since we don’t have to worry about which code was embedded in which workbook and can instead rely on TabPy as the one version of the truth.
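A minimal sketch of deploying and then calling such a function, assuming TabPy is running at http://localhost:9004/ (the function name is illustrative, and the client import path varies across TabPy versions; older Anaconda-based installs used import tabpy_client):

from tabpy.tabpy_tools.client import Client

client = Client('http://localhost:9004/')

def add_vectors(x, y):
    # arguments arrive as Python lists; return an element-wise sum
    return [a + b for a, b in zip(x, y)]

# publish under the name 'add_vectors'; override replaces any existing version
client.deploy('add_vectors', add_vectors, 'Element-wise sum of two vectors', override=True)

In Tableau, the deployed endpoint can then be called by name:

SCRIPT_REAL("return tabpy.query('add_vectors', _arg1, _arg2)['response']", SUM([Sales]), SUM([Profit]))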
Publishing functions this way also allows us to train supervised machine learning models on historical data (which could be done completely outside Tableau, using the Python editor of your choice) and apply them to new data, e.g. to predict churn in Tableau.
The second example provided in the linked post below uses a gradient boosting model trained on historical breast cancer cases to make predictions on new test-result data that the user enters. The example also includes all the Python code necessary to train, test, and deploy the model to TabPy.
How to build advanced analytics applications with TabPy
Check out the example resources based on the blog about leveraging Tableau and Python for advanced analytics.