Kaplan-Meier Survival Test: complete solution (without R)
Olivier CATHERIN May 18, 2015 1:41 AM
In this article, we explore a complete solution for drawing exact Kaplan-Meier curves in Tableau. The methodology was developed and tested by Julien Henry from our analytics team. The workbook and sample dataset are attached.
The Kaplan-Meier estimator with Tableau Software
In survival analysis, here the modelling of patient lifespan within treatment groups, we need to estimate survival functions, which give the probability that a patient is still alive at any given time.
Here is an example: in a study of patients with leukaemia, 23 patients were randomly assigned to two groups, and each group was given a different treatment. We want a survival curve for each group; that is, we would like to know the probability of survival at any given moment after the beginning of the treatment.
Below is the dataset we will use. The group field identifies the treatment group (Group 1 or Group 2). The censor field takes the value 1 if the patient died from leukaemia, and 0 if the observation is censored, that is, if the patient survived, died from any other cause or was released from the study. The first row of data can thus be read as follows: a patient from treatment group 1 died from leukaemia (censor=1) after 9 units of time (usually months).
time  censor  group 
9  1  Group 1 
13  1  Group 1 
13  0  Group 1 
18  1  Group 1 
23  1  Group 1 
28  0  Group 1 
31  1  Group 1 
34  1  Group 1 
45  0  Group 1 
48  1  Group 1 
161  0  Group 1 
5  1  Group 2 
5  1  Group 2 
8  1  Group 2 
8  1  Group 2 
12  1  Group 2 
16  0  Group 2 
23  1  Group 2 
27  1  Group 2 
30  1  Group 2 
33  1  Group 2 
43  1  Group 2 
45  1  Group 2 
This data is randomly right-censored, so the Kaplan-Meier estimator of the survival function is used. This is how it is built in Tableau:
In this example, we shall use the following notation:
t_i : the i-th distinct observed time
d_i : number of deaths due to leukaemia at time t_i
n_i : number of patients still at risk (alive and not yet censored) just before t_i
S(t): the survival function
We have: $S(t) = \prod_{t_i \le t} (1 - d_i / n_i)$
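To make the formula concrete, here is a minimal Python sketch that applies it to Group 1 of the table above. The function name kaplan_meier and the variable names are ours, chosen for illustration; this is not how Tableau computes the curve, only a direct reading of the product formula.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t_i, S(t_i))] for each distinct time with at least one death.

    times  : follow-up time of each patient
    events : 1 if the patient died from leukaemia at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    survival = 1.0
    curve = []
    for t in sorted(deaths):
        n_i = sum(1 for u in times if u >= t)  # patients still at risk just before t
        d_i = deaths[t]                        # deaths observed at t
        survival *= 1 - d_i / n_i              # one factor of the cumulative product
        curve.append((t, survival))
    return curve

# Group 1, copied from the table above (time and censor columns)
g1_time = [9, 13, 13, 18, 23, 28, 31, 34, 45, 48, 161]
g1_censor = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0]

for t, s in kaplan_meier(g1_time, g1_censor):
    print(f"t={t:>3}  S(t)={s:.4f}")
```

Note how the patient censored at t=13 still counts in n_i at t=13 (10 patients at risk) but no longer at t=18 (8 patients at risk).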
1 - A first Kaplan-Meier curve within Tableau
Please read Aaron Sheldon’s solution using this link, whose formula we use in Tableau.
d_i := SUM([Censor])
n_i := SUM([Number of records]) + TOTAL(SUM([Number of records])) - RUNNING_SUM(SUM([Number of records]))
KaplanMeier Curve := PREVIOUS_VALUE(1) * (1 - [d_i] / [n_i])
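To see why this expression for n_i counts the patients at risk, here is a minimal Python sketch that mimics the three table calculations on Group 1. The helper at_risk and the list names are ours, for illustration only; Tableau evaluates the same quantities along the Time axis.

```python
from itertools import accumulate

def at_risk(records):
    """records[k] = number of rows (patients) at the k-th distinct time, sorted by time."""
    total = sum(records)             # TOTAL(SUM([Number of records]))
    running = accumulate(records)    # RUNNING_SUM(SUM([Number of records]))
    # rows at this time + rows at all later times = patients still at risk
    return [rec + total - run for rec, run in zip(records, running)]

# Group 1, one entry per distinct time
times   = [9, 13, 18, 23, 28, 31, 34, 45, 48, 161]
records = [1,  2,  1,  1,  1,  1,  1,  1,  1,   1]
deaths  = [1,  1,  1,  1,  0,  1,  1,  0,  1,   0]   # SUM([Censor]) per time

survival = 1.0                       # PREVIOUS_VALUE(1) seeds the product with 1
for t, n_i, d_i in zip(times, at_risk(records), deaths):
    survival *= 1 - d_i / n_i
    print(t, n_i, d_i, round(survival, 4))
```

TOTAL minus RUNNING_SUM leaves the rows strictly after the current time, and adding back the rows at the current time gives everyone who has not yet died or been censored.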
Connect to your data in Tableau and build the following calculated field, which combines the definitions above into a single expression:
KaplanMeier Dots :=
PREVIOUS_VALUE(1) *
(1 - SUM([Censor]) /
(SUM([Number of Records]) +
TOTAL(SUM([Number of Records])) -
RUNNING_SUM(SUM([Number of Records]))
))
Do not forget to convert Time to a continuous dimension. The result looks like this:
You can drop Group onto Color to build the Kaplan-Meier estimator for each of the two groups.
This is a good start, but the curve should start at time 0 with a value of 1. Moreover, the Kaplan-Meier estimator is a step function: it should look like a staircase made only of horizontal and vertical lines, not a curve. It would also be useful to see a mark at each event (death or censoring) in time.
2 - Improved Kaplan-Meier curve
In this section, we improve the viz to build a staircase graph and add the censoring events. We also add all the details specific to Kaplan-Meier curves.
First, save your Tableau worksheet (safety first!).
Reopen your Excel workbook and create a column called “link”, which is a constant field as you can see below. Then add two new rows, one for each group, with a time value of 0 (if your dataset doesn’t already have rows for this time value). These additional rows make it possible to draw a curve that starts at 100% at t=0, even when no observation occurs at that time point.
time  censor  group  link 
0  (blank)  Group 1  (blank) 
0  (blank)  Group 2  (blank) 
9  1  Group 1  link 
13  1  Group 1  link 
13  0  Group 1  link 
18  1  Group 1  link 
23  1  Group 1  link 
28  0  Group 1  link 
31  1  Group 1  link 
34  1  Group 1  link 
45  0  Group 1  link 
48  1  Group 1  link 
161  0  Group 1  link 
5  1  Group 2  link 
5  1  Group 2  link 
8  1  Group 2  link 
8  1  Group 2  link 
12  1  Group 2  link 
16  0  Group 2  link 
23  1  Group 2  link 
27  1  Group 2  link 
30  1  Group 2  link 
33  1  Group 2  link 
43  1  Group 2  link 
45  1  Group 2  link 
Then add a new worksheet to the same Excel workbook, name it “blending”, and enter these three rows:
link  set 
link  1 
link  2 
Save it, reopen the Tableau workbook and go to Data Source (in the bottom-left corner). Drag and drop “blending” next to leukaemia, click on the join and choose Left. A left join does not duplicate the additional time-0 rows we just added, which keeps the calculations accurate.
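The effect of this left join can be sketched in a few lines of Python. The miniature tables and the left_join helper below are hypothetical stand-ins for the two Excel sheets, and the sketch assumes that an empty link never matches a row of “blending”.

```python
# Hypothetical miniature of the "leukaemia" sheet: one added time-0 row, two patient rows
patients = [
    {"time": 0, "censor": None, "group": "Group 1", "link": None},
    {"time": 9, "censor": 1, "group": "Group 1", "link": "link"},
    {"time": 13, "censor": 1, "group": "Group 1", "link": "link"},
]
# The "blending" sheet
blending = [{"link": "link", "set": 1}, {"link": "link", "set": 2}]

def left_join(left, right, key):
    """Keep every left row; copy it once per matching right row, or once with set=None."""
    joined = []
    for row in left:
        # Assumption: an empty key (None) matches nothing
        matches = [r for r in right if row[key] is not None and r[key] == row[key]]
        if matches:
            joined.extend({**row, **m} for m in matches)
        else:
            joined.append({**row, "set": None})
    return joined

joined = left_join(patients, blending, "link")
for row in joined:
    print(row["time"], row["group"], row["set"])
```

Each patient row appears twice (Set 1 and Set 2), which provides the two points per time value needed for a staircase, while the time-0 row appears once with an empty Set.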
Come back to your sheet and duplicate the KaplanMeier dots field to create this new one:
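KaplanMeier curve :=
// time-0 rows have no match in "blending", so Set is null: start the curve at 1
IF ISNULL(ATTR([Set])) THEN 1
// Set = 2: value at the current time (bottom of the step)
ELSEIF ATTR([Set])=2 THEN [KaplanMeier dots]
// Set = 1 with no previous value: the curve is still at 1
ELSEIF ATTR([Time])!=0 AND ISNULL(LOOKUP([KaplanMeier dots],-1)) THEN 1
// Set = 1: value at the previous time (top of the step)
ELSE LOOKUP([KaplanMeier dots],-1)
END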
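The idea behind this field, as we read it, is that each time value contributes two points to the path: one that carries the previous survival value (Set 1) and one that takes the current value (Set 2). Here is a minimal Python sketch of that construction; the function name staircase and the sample values are ours, for illustration.

```python
def staircase(dots):
    """dots: [(t, S(t))] sorted by time. Returns the ordered path points of the step plot."""
    points = [(0, 1.0)]                # the curve starts at 100% at t=0
    previous = 1.0
    for t, s in dots:
        points.append((t, previous))   # Set 1: horizontal segment ends at the previous value
        points.append((t, s))          # Set 2: vertical drop to the current value
        previous = s
    return points

# First three Kaplan-Meier values of Group 1
dots = [(9, 10 / 11), (13, 9 / 11), (18, 63 / 88)]
for point in staircase(dots):
    print(point)
```

Connecting these points in order draws only horizontal and vertical segments, which is the staircase shape we want.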
Then create one last calculated field:
Index := INDEX(), computed using Time (set it through Default Table Calculation).
Replace KaplanMeier dots with KaplanMeier curve on the Rows shelf, then drop Index onto Path and add Set (as a dimension) to the Marks card. You should now see the staircase as below:
Now, here are the instructions in order to visualize events:
Add KaplanMeier dots to Rows as a shape (see the first screenshot below), then right-click on KaplanMeier dots and select “Dual Axis”. Remove the Measure Names field from the left shelf (see the second screenshot below), then right-click on the right axis and select “Synchronize Axis”.
Here is a visualization of the Kaplan-Meier curves for the two groups:
Now that you know how to build a Kaplan-Meier estimator curve, you should be able to add any confidence band you like.
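As one possible starting point, here is a minimal Python sketch of a plain pointwise band based on Greenwood's variance formula, Var[S(t)] ≈ S(t)² · Σ d_i / (n_i (n_i − d_i)). This is our own illustration and is not part of the attached workbook; the function name greenwood_band and the choice z=1.96 (roughly a 95% band) are assumptions.

```python
import math

def greenwood_band(steps, z=1.96):
    """steps: [(t_i, d_i, n_i)] at death times, sorted by time.

    Returns [(t_i, S, lower, upper)] with the band S ± z * se, clipped to [0, 1].
    """
    survival, acc, band = 1.0, 0.0, []
    for t, d, n in steps:
        survival *= 1 - d / n
        if n > d:                       # when everyone at risk dies, S is 0 and so is se
            acc += d / (n * (n - d))    # running Greenwood sum
        se = survival * math.sqrt(acc)
        band.append((t, survival, max(0.0, survival - z * se), min(1.0, survival + z * se)))
    return band

# Group 1: (time, deaths, patients at risk) at each death time
g1_steps = [(9, 1, 11), (13, 1, 10), (18, 1, 8), (23, 1, 7), (31, 1, 5), (34, 1, 4), (48, 1, 2)]
for t, s, lo, hi in greenwood_band(g1_steps):
    print(f"t={t:>3}  S={s:.3f}  [{lo:.3f}, {hi:.3f}]")
```

The lower and upper values could then be plotted as two extra step lines around the curve.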
3 - To verify results using R:
Our results agree with the R package “survival”. When in doubt, open R and execute:
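library(survival)
data(leukemia)
# Survival object: follow-up time and event indicator
surv <- Surv(leukemia$time, leukemia$status)
# Kaplan-Meier estimate for all patients pooled
surv.data <- survfit(surv ~ 1, type = "kaplan-meier", conf.type = "none")
# Kaplan-Meier estimate per treatment group
surv.data.group <- survfit(surv ~ leukemia$x, type = "kaplan-meier", conf.type = "none")
summary(surv.data)
summary(surv.data.group)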
For any questions, a member of our team would be happy to assist!

leukaemia.xlsx 9.7 KB

KaplanMeierleukeamia.twbx 21.7 KB