8 Replies Latest reply on Jun 11, 2018 8:42 AM by Christos Giannoulis

    Kaplan Meier Survival Test complete solution (without R)

    Olivier CATHERIN

      In this article, we walk through a complete solution for drawing exact Kaplan-Meier curves in Tableau. This methodology was developed and tested by Julien Henry from our analytics team. The workbook and a sample dataset are attached.

       

      The Kaplan-Meier estimator with Tableau Software

      In survival analysis, i.e. the modelling of patient lifespan within treatment groups, we need to estimate survival functions, which give the probability that a patient is still alive at any given time.

      Here is an example: a study conducted on patients with leukaemia. 23 patients were randomly assigned to two groups, and each group was given a different treatment. We want a survival curve for each group, that is, the probability of survival at any given moment after the beginning of the treatment.

      Below is the dataset we will use. The group field takes the value Group 1 or Group 2 according to the treatment group. The censor field takes the value 1 if the patient died from leukaemia, and 0 if the patient survived, died from any other cause or was released from the study. The first row of data can thus be read as follows: a patient from treatment group 1 who died from leukaemia (censor=1) after 9 units of time (usually months).

       

      time   censor   group
      9      1        Group 1
      13     1        Group 1
      13     0        Group 1
      18     1        Group 1
      23     1        Group 1
      28     0        Group 1
      31     1        Group 1
      34     1        Group 1
      45     0        Group 1
      48     1        Group 1
      161    0        Group 1
      5      1        Group 2
      5      1        Group 2
      8      1        Group 2
      8      1        Group 2
      12     1        Group 2
      16     0        Group 2
      23     1        Group 2
      27     1        Group 2
      30     1        Group 2
      33     1        Group 2
      43     1        Group 2
      45     1        Group 2

       

      This data is randomly right-censored, therefore the Kaplan-Meier estimator of the survival function is used. This is how it is built in Tableau:

      In this example, we shall use the following labels:

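      d_i  : # of deaths due to leukaemia at time t_i

      n_i  : # of patients still at risk (alive and not yet censored) just before t_i

      S(t): the survival function

      We have: S(t) = \prod_{t_i \le t} (1 - d_i / n_i), i.e. the cumulative product of (1 - d_i / n_i) over all event times t_i <= t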
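To make the formula concrete before moving to Tableau, here is a minimal Python sketch that applies it to the Group 1 rows of the table above. Python and the names `group1`, `kaplan_meier` and `km_group1` are not part of the original post; they are assumptions made for illustration only. Exact fractions are used so the products can be checked by hand.

```python
from fractions import Fraction

# (time, censor) pairs for Group 1, copied from the table above
group1 = [(9, 1), (13, 1), (13, 0), (18, 1), (23, 1), (28, 0),
          (31, 1), (34, 1), (45, 0), (48, 1), (161, 0)]

def kaplan_meier(rows):
    """Return [(t_i, S(t_i))] for every time t_i with at least one death."""
    survival = Fraction(1)
    curve = []
    for t in sorted({time for time, _ in rows}):
        # n_i: patients whose recorded time is >= t, i.e. still at risk just before t
        n_i = sum(1 for time, _ in rows if time >= t)
        # d_i: deaths recorded exactly at t (censor = 1)
        d_i = sum(c for time, c in rows if time == t)
        if d_i:
            survival *= 1 - Fraction(d_i, n_i)
            curve.append((t, survival))
    return curve

km_group1 = kaplan_meier(group1)
for t, s in km_group1:
    print(f"t={t:>3}  S(t)={float(s):.4f}")
```

For example, the first step is 1 - 1/11 = 10/11 at t = 9, and the second multiplies it by 1 - 1/10 at t = 13, giving 9/11.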

       

      1- A first Kaplan-Meier curve within Tableau


      Please read Aaron Sheldon’s solution using this link, whose formula we use in Tableau.

      d_i := SUM([Censor])

      n_i := SUM([Number of Records]) + TOTAL(SUM([Number of Records])) -
      RUNNING_SUM(SUM([Number of Records]))

      Kaplan-Meier Curve := PREVIOUS_VALUE(1) * (1 - [d_i] / [n_i])

       

      You can connect to your data using Tableau and build the straightforward calculated field:

      Kaplan-Meier dots :=

      PREVIOUS_VALUE(1)*
      (1 - SUM([Censor]) /
      (SUM([Number of Records]) +
      TOTAL(SUM([Number of Records])) -
      RUNNING_SUM(SUM([Number of Records]))
      ))
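To see why this field yields the estimator, the sketch below mimics the three table calculations in plain Python for the Group 2 rows. It is only an illustration under the assumption that the view has one row per distinct time, computed along Time; the variable names are hypothetical and do not come from Tableau.

```python
from itertools import accumulate

# (time, censor) pairs for Group 2, copied from the table above
group2 = [(5, 1), (5, 1), (8, 1), (8, 1), (12, 1), (16, 0),
          (23, 1), (27, 1), (30, 1), (33, 1), (43, 1), (45, 1)]

times = sorted({t for t, _ in group2})
# SUM([Number of Records]) and SUM([Censor]) for each distinct time
records = [sum(1 for t, _ in group2 if t == time) for time in times]
deaths = [sum(c for t, c in group2 if t == time) for time in times]
total = sum(records)                  # TOTAL(SUM([Number of Records]))
running = list(accumulate(records))   # RUNNING_SUM(SUM([Number of Records]))

km_dots = []
previous = 1.0                        # PREVIOUS_VALUE(1): seed value on the first row
for d_i, rec, run in zip(deaths, records, running):
    # current + total - running = number of patients with time >= current time
    n_i = rec + total - run
    previous = previous * (1 - d_i / n_i)
    km_dots.append(previous)

for time, s in zip(times, km_dots):
    print(f"t={time:>3}  S(t)={s:.4f}")
```

Note that the row at t = 16 holds only a censored patient, so d_i = 0 there and the estimate stays flat, while n_i still shrinks for the rows that follow.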


      Do not forget to convert Time to a continuous dimension. The result looks like this:
      KM1.png
      You can drop Group onto Color to draw one Kaplan-Meier estimate per group.

      This is a good start, but the curve should start at t = 0 with a value of 1. Moreover, the Kaplan-Meier estimator should look like a staircase made only of horizontal and vertical lines, not a curve. It would also be useful to see a mark at each event (death or censoring) in time.


      2- Improved Kaplan-Meier curve


      In this section, we improve the viz further to build a staircase graph and add the censoring events. We also add the details specific to Kaplan-Meier curves.

       

      First, save your Tableau worksheet (safety first!).

      Re-open your Excel workbook and create a column called “link”, which holds a constant value, as you can see below. Also create two new rows, one for each group, with a time value of 0 and the censor and link cells left empty (unless your dataset already has rows for this time value). These additional rows make it possible to draw a curve starting at 100% at t=0 when no observation is recorded at that time point.

       

      time   censor   group     link
      0               Group 1
      0               Group 2
      9      1        Group 1   link
      13     1        Group 1   link
      13     0        Group 1   link
      18     1        Group 1   link
      23     1        Group 1   link
      28     0        Group 1   link
      31     1        Group 1   link
      34     1        Group 1   link
      45     0        Group 1   link
      48     1        Group 1   link
      161    0        Group 1   link
      5      1        Group 2   link
      5      1        Group 2   link
      8      1        Group 2   link
      8      1        Group 2   link
      12     1        Group 2   link
      16     0        Group 2   link
      23     1        Group 2   link
      27     1        Group 2   link
      30     1        Group 2   link
      33     1        Group 2   link
      43     1        Group 2   link
      45     1        Group 2   link

       

      Then open a new worksheet in the same Excel workbook, which you can call “blending”, and create these three lines (one header row and two data rows):

      link   set
      link   1
      link   2

       

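      Save it, re-open the Tableau workbook and go to Data Source (in the bottom-left corner). Drag and drop “blending” next to leukaemia, click on the join and choose Left. Every original row matches both “blending” rows and therefore appears twice (once with Set = 1 and once with Set = 2), while the two t=0 rows we just added, whose link is empty, are not duplicated and get a null Set. This keeps the calculations accurate.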
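The effect of this left join can be sketched in a few lines of Python. The rows below are a hypothetical three-row excerpt of the sheet, and `left_join` is a toy helper written for this illustration, not a Tableau or Excel function.

```python
# Hypothetical in-memory excerpt of the data sheet (one t=0 row, two real rows)
data_rows = [
    {"time": 0, "censor": None, "group": "Group 1", "link": None},
    {"time": 9, "censor": 1, "group": "Group 1", "link": "link"},
    {"time": 13, "censor": 1, "group": "Group 1", "link": "link"},
]
# The "blending" sheet: the constant key "link" paired with set = 1 and set = 2
blending = [{"link": "link", "set": 1}, {"link": "link", "set": 2}]

def left_join(left, right, key):
    """Toy left join: keep every left row, add 'set' from each matching right row."""
    joined = []
    for row in left:
        matches = [r for r in right if row[key] is not None and r[key] == row[key]]
        if matches:
            joined.extend({**row, "set": m["set"]} for m in matches)
        else:
            joined.append({**row, "set": None})   # no match: kept once, set is null
    return joined

joined = left_join(data_rows, blending, "link")
for row in joined:
    print(row["time"], row["censor"], row["set"])
```

The t=0 row comes out once with a null set, and each real observation comes out twice. Those two copies are what later become the two corners of each step in the staircase.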

      KM2.png

      Come back to your sheet and duplicate the Kaplan-Meier dots field to create this new one:


      Kaplan-Meier curve :=
      IF ISNULL(ATTR([Set])) THEN 1
      ELSEIF ATTR([Set]) = 2 THEN [Kaplan-Meier dots]
      ELSEIF ATTR([Time]) != 0 AND ISNULL(LOOKUP([Kaplan-Meier dots], -1)) THEN 1
      ELSE LOOKUP([Kaplan-Meier dots], -1)
      END
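One way to read this field: at each time, the Set = 1 copy of a row takes the previous value of the estimator and the Set = 2 copy takes the current value, so connecting the points in order produces a horizontal segment followed by a vertical drop. The Python sketch below reproduces that idea outside Tableau. It is an interpretation of the calculated field under that assumption, and `staircase`, `dots` and `path` are hypothetical names.

```python
def staircase(dots):
    """dots: list of (time, S) sorted by time. Returns the ordered points of a step line."""
    path = [(0, 1.0)]                   # null Set (the added t=0 row): curve starts at 1
    previous = 1.0
    for time, s in dots:
        path.append((time, previous))   # Set = 1: previous value, ends the horizontal segment
        path.append((time, s))          # Set = 2: current value, ends the vertical drop
        previous = s
    return path

# First three Group 1 values of the estimator: 10/11, 9/11, 63/88
dots = [(9, 10 / 11), (13, 9 / 11), (18, 63 / 88)]
path = staircase(dots)
for point in path:
    print(point)
```

Every pair of consecutive points shares either its time or its value, which is exactly the property that makes the line a staircase rather than a sloped curve.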

      Then create one last calculated field:

      Index := INDEX(), computed using Time (set this by clicking on Default Table Calculation).

      KM3.png

       

      Replace Kaplan-Meier dots with Kaplan-Meier curve on the Rows shelf, then drop Index onto Path and add Set (as a dimension) to the Marks card. You should now see the staircase, as below:

      KM4.png

      Now, here are the instructions in order to visualize events:

      Add Kaplan-Meier dots to Rows as a shape (see the first screenshot below), then right-click on Kaplan-Meier dots and select “Dual Axis”. Remove the Measure Names field from the left shelf (see the second screenshot below), then right-click on the right axis and select “Synchronize Axis”.

      KM5.png

      KM6.png

      Here is a visualization of the Kaplan-Meier curves for the two groups:

      KM7.png
      Now that you know how to build a Kaplan-Meier estimator curve, you should be able to add any confidence band you like.

      3- Verifying the results with R

      Our results agree with those of the R package “survival”. When in doubt, open R and execute:

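      library(survival)
      data(leukemia)  # 23 patients; columns: time, status, x (treatment group)
      # Survival object: status = 1 marks a death, 0 a censored observation
      surv <- Surv(leukemia$time, leukemia$status)
      # Kaplan-Meier estimate for all patients, then one estimate per treatment group
      surv.data <- survfit(surv ~ 1, type = "kaplan-meier", conf.type = "none")
      surv.data.group <- survfit(surv ~ leukemia$x, type = "kaplan-meier", conf.type = "none")
      summary(surv.data)
      summary(surv.data.group)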

       

       

      For any questions, a member of our team would be happy to assist!