    Optimizing performance for large blended data set

    Cameron Gocke

      I have a large data set in a MySQL data store that consists of survey data and data on the members who completed the surveys.  The data set with the survey questions and answers is right at 1B rows, and some of the individual surveys have 200+ rows (one row for each possible answer).  We need to slice the data based on member demographic attributes (age, gender, ethnicity, etc.).  Because of the size of the data set, we use a Tableau Server data extract that is updated via incremental refreshes.  We currently have two data sources defined, one for the survey data and one for the member data.  The problem is that blending the two on member Id is far too slow.
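
      For context, the blend matches on member Id, so it is effectively doing the equivalent of this join on every query.  (Table and column names below are made up just to show the shape of the data.)

          SELECT s.member_id,
                 s.survey_id,
                 s.question_id,
                 s.answer_value,
                 m.age,
                 m.gender,
                 m.ethnicity
          FROM   survey_response s            -- ~1B answer rows, 200+ per survey
                 JOIN member_demographics m   -- one row per member
                   ON m.member_id = s.member_id;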


      The only solution I've come up with is to publish the member attributes as additional columns in the survey data source.  That would add a lot of columns to that data source, however, and I'm not sure it's the right/best answer.
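
      Concretely, that flattening would mean building the survey extract from something like the view below, so every demographic attribute is repeated on every answer row (again, the names are illustrative):

          CREATE VIEW survey_response_wide AS
          SELECT s.*,              -- all of the survey answer columns
                 m.age,
                 m.gender,
                 m.ethnicity       -- ...plus every other demographic we might slice by
          FROM   survey_response s
                 JOIN member_demographics m
                   ON m.member_id = s.member_id;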


      Along similar lines, there are also scenarios where the data needs to be sliced by the survey values themselves, e.g., slicing one set of survey answers based on values such as blood pressure and weight that are captured as answers to other survey questions.
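
      To make that concrete, looking at weight answers only for members whose blood pressure answer is above some threshold would be a self-join along these lines (the question Ids and threshold are hypothetical):

          SELECT s.member_id,
                 s.answer_value
          FROM   survey_response s
                 JOIN survey_response bp
                   ON  bp.member_id   = s.member_id
                   AND bp.question_id = 'BLOOD_PRESSURE_SYSTOLIC'  -- hypothetical question Id
          WHERE  s.question_id = 'WEIGHT'                          -- hypothetical question Id
            AND  CAST(bp.answer_value AS UNSIGNED) > 140;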


      Should this just be one large (wide and long) data set, or is there a better way to join one data set to the other without paying such a huge performance penalty?


      Here are some screenshots of example data from the survey and demographics data sets: