8 Replies Latest reply on Jun 11, 2018 8:42 AM by Christos Giannoulis

# Kaplan Meier Survival Test complete solution (without R)

In this article, we walk through a complete solution for drawing exact Kaplan-Meier curves in Tableau. This methodology was developed and tested by Julien Henry from our analytics team. The workbook and sample dataset are attached.

The Kaplan-Meier estimator with Tableau Software

In survival analysis (here, modelling patient lifespan within treatment groups), we need to estimate survival functions, which give the probability that a patient survives beyond any given time.

Here is an example: a study conducted on patients with leukaemia. 23 patients were randomly assigned to two groups, and each group was given a different treatment. We want a survival curve for each group, that is, the probability of survival at any given moment after the start of treatment.

Below is the dataset we will use. The group field identifies the treatment group (Group 1 or Group 2), and the censor field takes the value 1 if the patient died from leukaemia, and 0 if the patient survived, died from any other cause or was released from the study. The first row can thus be read as follows: a patient in treatment group 1 who died from leukaemia (censor = 1) after 9 units of time (usually months).

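| time | censor | group |
|-----:|-------:|-------|
| 9 | 1 | Group 1 |
| 13 | 1 | Group 1 |
| 13 | 0 | Group 1 |
| 18 | 1 | Group 1 |
| 23 | 1 | Group 1 |
| 28 | 0 | Group 1 |
| 31 | 1 | Group 1 |
| 34 | 1 | Group 1 |
| 45 | 0 | Group 1 |
| 48 | 1 | Group 1 |
| 161 | 0 | Group 1 |
| 5 | 1 | Group 2 |
| 5 | 1 | Group 2 |
| 8 | 1 | Group 2 |
| 8 | 1 | Group 2 |
| 12 | 1 | Group 2 |
| 16 | 0 | Group 2 |
| 23 | 1 | Group 2 |
| 27 | 1 | Group 2 |
| 30 | 1 | Group 2 |
| 33 | 1 | Group 2 |
| 43 | 1 | Group 2 |
| 45 | 1 | Group 2 |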

These data are randomly right-censored (for some patients we only know that they survived at least until their last follow-up time), so the Kaplan-Meier estimator of the survival function is used. Here is how to build it in Tableau.

In this example, we use the following notation:

$d_i$: number of deaths due to leukaemia at time $t_i$

$n_i$: number of patients at risk just before $t_i$, i.e. still alive and not yet censored

$S(t)$: the survival function

The Kaplan-Meier estimator is then:

$$\hat{S}(t) = \prod_{i:\, t_i \le t} \left(1 - \frac{d_i}{n_i}\right)$$
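As a worked check of this product, here are the first three death times of Group 1 in the table above. Only deaths change the estimate; the patient censored at t = 13 simply leaves the risk set, which is why only 8 patients remain at risk at t = 18:

$$
\begin{aligned}
\hat{S}(9) &= 1 - \tfrac{1}{11} = \tfrac{10}{11} \approx 0.909\\
\hat{S}(13) &= \tfrac{10}{11}\left(1 - \tfrac{1}{10}\right) = \tfrac{9}{11} \approx 0.818\\
\hat{S}(18) &= \tfrac{9}{11}\left(1 - \tfrac{1}{8}\right) = \tfrac{63}{88} \approx 0.716
\end{aligned}
$$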

1- A first Kaplan-Meier curve within Tableau

d_i := SUM([Censor])

n_i := SUM([Number of Records]) + TOTAL(SUM([Number of Records])) -
RUNNING_SUM(SUM([Number of Records]))

Kaplan-Meier Dots := PREVIOUS_VALUE(1) * (1 - [d_i] / [n_i])
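Why does this n_i count the patients at risk? Assume one mark per distinct time, sorted in ascending order within each group, and write $N$ for the number of rows in the group, $r_i$ for the number of rows at time $t_i$ and $R_i = r_1 + \dots + r_i$ for the running total, with $R_0 = 0$ (these three symbols are introduced here for the derivation only). SUM, TOTAL and RUNNING_SUM then combine into the number of rows whose time is at least $t_i$, and PREVIOUS_VALUE(1) turns the product into a recurrence starting from 1:

$$
\begin{aligned}
n_i &= r_i + N - R_i = N - R_{i-1},\\
\hat{S}(t_i) &= \hat{S}(t_{i-1})\left(1 - \frac{d_i}{n_i}\right), \qquad \hat{S}(t_0) = 1.
\end{aligned}
$$

At times where only censorings occur, $d_i = 0$ and the estimate is carried forward unchanged.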

Connect Tableau to your data and create the equivalent calculated field, with d_i and n_i written out in full:

Kaplan-Meier Dots :=

PREVIOUS_VALUE(1)*
(1 - SUM([Censor]) /
(SUM([Number of Records]) +
TOTAL(SUM([Number of Records])) -
RUNNING_SUM(SUM([Number of Records]))
))
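To check what these table calculations compute, here is a minimal Python sketch that mimics them on the leukaemia table above. It assumes one mark per distinct Time value, addressed in ascending order and partitioned by Group (i.e. computed using Time); the function name `kaplan_meier_dots` is hypothetical, and Python simply stands in for Tableau's table-calculation engine.

```python
from collections import Counter

# The 23 rows of the leukaemia table above: (time, censor, group)
ROWS = [
    (9, 1, "Group 1"), (13, 1, "Group 1"), (13, 0, "Group 1"), (18, 1, "Group 1"),
    (23, 1, "Group 1"), (28, 0, "Group 1"), (31, 1, "Group 1"), (34, 1, "Group 1"),
    (45, 0, "Group 1"), (48, 1, "Group 1"), (161, 0, "Group 1"),
    (5, 1, "Group 2"), (5, 1, "Group 2"), (8, 1, "Group 2"), (8, 1, "Group 2"),
    (12, 1, "Group 2"), (16, 0, "Group 2"), (23, 1, "Group 2"), (27, 1, "Group 2"),
    (30, 1, "Group 2"), (33, 1, "Group 2"), (43, 1, "Group 2"), (45, 1, "Group 2"),
]


def kaplan_meier_dots(rows, group):
    """Mimic the 'Kaplan-Meier Dots' table calculation for one Group partition.

    Returns a list of (time, n_i, survival) tuples, one per distinct time (mark).
    """
    part = [(t, c) for t, c, g in rows if g == group]
    records = Counter(t for t, _ in part)           # SUM([Number of Records]) per mark
    deaths = Counter(t for t, c in part if c == 1)  # SUM([Censor]) per mark (0 if missing)
    total = len(part)                               # TOTAL(SUM([Number of Records]))
    running = 0                                     # RUNNING_SUM(SUM([Number of Records]))
    survival = 1.0                                  # PREVIOUS_VALUE(1) on the first mark
    result = []
    for t in sorted(records):                       # marks addressed in ascending Time
        running += records[t]
        n_i = records[t] + total - running          # patients at risk just before t
        survival *= 1 - deaths[t] / n_i             # factor is 1 at censor-only times
        result.append((t, n_i, survival))
    return result


if __name__ == "__main__":
    for group in ("Group 1", "Group 2"):
        for t, n_i, s in kaplan_meier_dots(ROWS, group):
            print(group, t, n_i, round(s, 3))
```

For Group 1, for example, the estimate at t = 48 comes out as 81/440 ≈ 0.184, and the censor-only times (28, 45 and 161) leave it unchanged because d_i = 0 there.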

Do not forget to convert Time to a continuous dimension. The result looks like this:

[Screenshot: first Kaplan-Meier curve]

You can drop Group onto Color to build the Kaplan-Meier estimator for the two groups.

This is a good start, but the curve should start at t = 0 with a value of 1. Moreover, the Kaplan-Meier estimator is a step function, so it should look like a staircase made only of horizontal and vertical lines, not a sloped curve. It would also be useful to show a mark at each event (death or censoring).

2- Improved Kaplan-Meier curve

In this section, we further improve the viz to build a staircase graph and add the censoring events. We will also add the details specific to Kaplan-Meier curves.

First, save your Tableau workbook (safety first!).

Re-open your Excel workbook and create a column called “link”, which holds a constant value:

[Screenshot: Excel sheet with the “link” column]

Also create two new rows, one for each group, with a time value of 0 (if your dataset doesn’t already have rows at this time value). These additional rows make it possible to draw a curve starting at 100% at t = 0 when the study data do not start at that time point.

Then add a new worksheet to the same Excel workbook, name it “blending”, and create these three lines:

[Screenshot: “blending” sheet]

Save the file, re-open the Tableau workbook and go to the Data Source tab (in the bottom-left corner). Drag and drop “blending” next to leukaemia, click on the join and choose Left. This will not duplicate the additional rows we just added, which keeps the calculations accurate. Come back to your sheet and duplicate the Kaplan-Meier dots field to create this new one:

Kaplan-Meier curve:=
IF ISNULL(ATTR([Set])) THEN 1
ELSEIF ATTR([Set])=2 THEN [Kaplan-Meier dots]
ELSEIF ATTR([Time])!=0 AND ISNULL(LOOKUP([Kaplan-Meier dots],-1)) THEN 1
ELSE LOOKUP([Kaplan-Meier dots],-1)
END
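The logic of this field can be sketched outside Tableau: each time point contributes two vertices, so that joining them in time order draws only horizontal and vertical segments. The minimal Python sketch below assumes, as the formula suggests, that Set = 1 carries the level before the event (the previous dot, or 1 for the first mark) and Set = 2 the level after it; the function name `staircase` is hypothetical and this is an illustration of the idea, not the Tableau implementation itself.

```python
def staircase(points, start=1.0):
    """Turn (time, survival) points into the vertices of a step curve.

    Each time t contributes two vertices, mirroring the two Set rows:
    Set 1 -> (t, survival just before t), Set 2 -> (t, survival at t).
    """
    vertices = [(0, start)]   # t = 0 anchor, like the added time-0 rows
    previous = start          # like the "THEN 1" branch for the first mark
    for t, s in sorted(points):
        vertices.append((t, previous))  # end of the horizontal segment (Set 1)
        vertices.append((t, s))         # bottom of the vertical drop (Set 2)
        previous = s
    return vertices
```

For example, `staircase([(9, 0.5), (13, 0.25)])` returns `[(0, 1.0), (9, 1.0), (9, 0.5), (13, 0.5), (13, 0.25)]`; drawing a line through these vertices in order gives the staircase, which is what the Index-on-Path step below achieves in Tableau.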

Then create one last calculated field:

Index := INDEX(), computed using Time (click on Default Table Calculation to set this). Replace Kaplan-Meier dots with Kaplan-Meier curve on the Rows shelf, then drop Index onto Path and add Set (as a dimension) to the Marks card. You should now see the staircase:

[Screenshot: Kaplan-Meier staircase]

Now, here is how to display the events:

Add Kaplan-Meier dots to Rows and set its mark type to Shape, then right-click on Kaplan-Meier dots and select “Dual Axis”.

[Screenshot: Kaplan-Meier dots added as shapes]

Remove the Measure Names field from the Marks card on the left, then right-click on the right axis and select “Synchronize Axis”.

[Screenshot: removing Measure Names]

Here is the resulting visualization of the Kaplan-Meier curves for the two groups:

[Screenshot: Kaplan-Meier curves for Group 1 and Group 2 with event marks]

Now that you know how to build a Kaplan-Meier estimator curve, you should be able to add any confidence band you like.
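As one example of such a band, here is a minimal Python sketch of a pointwise 95% interval based on Greenwood's variance formula, on the plain (linear) scale. It is our illustration, not part of the attached workbook; the function name `greenwood_band` and its `(time, d_i, n_i)` input format are hypothetical. Note that R's `survfit` uses a log-transformed interval by default (`conf.type = "log"`), so its bounds differ from these plain ones.

```python
import math


def greenwood_band(events, z=1.96):
    """Kaplan-Meier estimate with a pointwise 'plain' confidence band.

    events: (time, d_i, n_i) tuples sorted by time.
    Returns (time, survival, lower, upper) tuples, using Greenwood's variance
    Var[S(t)] = S(t)^2 * sum over t_i <= t of d_i / (n_i * (n_i - d_i)).
    """
    survival = 1.0
    greenwood_sum = 0.0
    band = []
    for t, d, n in events:
        survival *= 1 - d / n
        if n == d:
            # Greenwood's term is undefined once everyone at risk has died
            band.append((t, survival, float("nan"), float("nan")))
            continue
        greenwood_sum += d / (n * (n - d))
        se = survival * math.sqrt(greenwood_sum)
        lower = max(0.0, survival - z * se)  # clip the band to [0, 1]
        upper = min(1.0, survival + z * se)
        band.append((t, survival, lower, upper))
    return band
```

Fed the first two Group 1 events, `(9, 1, 11)` and `(13, 1, 10)`, this gives standard errors of about 0.087 and 0.116, which should match the std.err column printed by the R check below for those times.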

3 - Verifying the results with R

Our results match those of the R package “survival”. When in doubt, open R and run:

library(survival)
data(leukemia)
surv=Surv(leukemia\$time,leukemia\$status)
surv.data=survfit(surv~1,type="kaplan-meier",conf.type="none")
surv.data.group <- survfit(surv~leukemia\$x,type="kaplan-meier",conf.type="none")
summary(surv.data)
summary(surv.data.group)

For any questions, a member of our team would be happy to assist!

• ###### 1. Re: Kaplan Meier Survival Test complete solution (without R)

Thanks for sharing!!

• ###### 2. Re: Kaplan Meier Survival Test complete solution (without R)

Thank you so much for making this available!

• ###### 3. Re: Kaplan Meier Survival Test complete solution (without R)

Hi,

Thanks for sharing. Can you please explain in detail how to build your first view? It doesn't work with your sample data if I just put "Kaplan-Meier dots" on Rows and "Time" on Columns (here's what I got instead):

[Screenshot]

• ###### 4. Re: Kaplan Meier Survival Test complete solution (without R)

Raoul,

It wasn't shown in the graphic, but I think you will also need to add

-Group to the Color Shelf

-Set to the Detail Shelf (as a continuous dimension)

-Index to the Path Shelf with a Compute Using of [Time]

• ###### 5. Re: Kaplan Meier Survival Test complete solution (without R)

Hi Swaroop,

Thanks a lot. This works, but I was looking to reproduce the first viz, like so:

[Screenshot]

I'm using a database as my data source, so I can't do the blending step. Can you help me out?

• ###### 6. Re: Kaplan Meier Survival Test complete solution (without R)

Raoul,

I think the problem lies in the rows at time 0 that do not have a value for [Censor].

This may be affecting the PREVIOUS_VALUE() calculations.

I'm not sure how best to handle time 0. Putting 0 in [Censor] at least produced a graph.

• ###### 7. Re: Kaplan Meier Survival Test complete solution (without R)

Thank you, Olivier CATHERIN and Aaron Sheldon, for this tutorial.

I am trying to adapt it to a large nationwide data set, and if a member of the analytics team is available, I would like to go through the nuances of this data set.

An additional question: do I need to do step 2 (i.e. alter the source data) to verify the results in R? Can I verify them in R with the data set from step 1?

Thank you,

Neishay

• ###### 8. Re: Kaplan Meier Survival Test complete solution (without R)

Great idea and implementation as always from Catherin.

Question: is anyone able to use this for different groups?

For instance, if a parameter offers the display of survival for different sets of groups, is this possible?

Has anybody used this with the ability to select different groups?

Cheers,

Christos