2 Replies Latest reply on Jan 8, 2015 10:51 AM by elyssia.clark

    Re-shaped data impacting calculated fields despite using CountID function

    elyssia.clark

      I have a survey with general likert scale questions and then an open ended question.  Every survey respondent has a unique ID which is a mix of letters and numbers.

        

      I've built a codeframe for the open ended question to summarise the insights.  So let's say the question is "What did you like about your experience today?"  And say the respondent answered "The customer service was wonderful and the turn-around time was fast!".  Each survey respondent is allocated two response options - that is, we record two coded responses per person as appropriate.  In this example, the individual will be allocated code "1" for "Excellent / good customer service" and code "3" for "Good / fast turnaround time".

       

      I've then created a separate excel spreadsheet and reshaped the data, and in this table each survey respondent has two rows - giving an answer (or a blank) to each of the two response fields.

       

      In Tableau, I then join this spreadsheet to the master spreadsheet (containing the likert scale questions) using a left join.   I then easily created a summary table showing the % who mentioned each code for my open ended question.

        

      The problem is that there has been an unexpected knock-on effect on some calculations on my likert scale questions. 

      My master file includes a calculated field which is an 'if/then' statement.  For example, IF [Question1]=9 then 1

      elseif [Question1]=10 then 1.

       

      Since creating this join, I've found this calculated field is now double counting records which have reshaped data, and as a result my outputs from the calculation are wrong.  I keep trying to allocate the "COUNTD" function but it's not making a difference. How do I adjust the If/then statement to only calculate on unique survey IDs ?

       

      Any insights would be gratefully received.

        • 1. Re: Re-shaped data impacting calculated fields despite using CountID function
          Jim Wahl

          There are a few ways to work around the problem of duplicated rows in "stacked" data.

           

          AdvantagesDisadvantages

          Update dataset to include a "response_row" that is 1 for the first row, 2 for the second, ....

          • For views that don't include the multi-response questions, you can add a filter to the worksheet: response_row=1, preferably as a context filter. Context filters are evaluated before sets and other regular filters; right click the filter > Add to Context.
          • If a view contains both, you can update your calculated fields; for example Number of Records = SUM(IF [response_row] == 1 THEN 1 ELSE NULL END)
          • Single data source
          • Need to remember to add the filter.
          • Or update all measures with the IF response_row == 1, ... (safer but more work)

          Use a separate data source for the multi-response questions, ...

          • You'd use your original dataset as is, and the new dataset when you need the multi-response questions in the view.
          • You could either 1) create a full data set with both types of questions, which will make it easier to put both types of questions in the view 2) create a data set with just the multi-response questions (safer, but more difficult to blend).
          • Filters aren't normally required
          • Need to maintain two data sources.

          Use table calcs to select just the first row on these questions

          • IF FIRST()==0 with addressing / compute using set to respondent ID, partitioned by question ID and all other dimensions.
          • You don't have to modify your data source.
          • Table calcs are complicated
          • And slow on large datasets
          • Addressing / compute using needs to be verified on each view
          • If you're using other table calcs, you now have nested table calcs, which add complexity

           

          I usually go with the second option, and use this data source only on the views with multi-response questions. I don't trust myself to remember that I need to add filters, especially if I haven't opened the workbook in six months. 

           

          On the other hand, if you have just a few measures / calculated fields you're using, the first option can work well.

           

          Jim

          • 2. Re: Re-shaped data impacting calculated fields despite using CountID function
            elyssia.clark

            Jim I am sincerely grateful for your advice - thank you !  I think I might go with the first option given I've only got a couple of calculated fields.  You've reminded me that I should have also mentioned I don't have duplicated records for every single survey respondent.  This is because I'm combining data from a 8 different surveys into one master file, and the open ended question was only asked in 6 of those surveys.  More specifically, only 2/3 of survey respondents have multiple rows.  But I can work around this in excel using an if / then statement to determine the appropriate text in the new "response_row" field.  I'll let you know how I go - and thank you again.