3 replies; latest reply on Jun 3, 2012, 6:10 PM by Dimitri.B

    data file structure, repeating data


      I'm working with a large file that has binary indicators for group membership. After reading Joe Makko's helpful tip regarding Tableau preferring files that are taller rather than wider, I attempted to restructure my data. Now, instead of binary indicator variables, I have repeating cases.
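For readers unfamiliar with the wide-to-tall idea, here is a minimal sketch of the restructuring described above, written in plain Python rather than in Tableau. The client names, the `total_sales` field and the pool columns are hypothetical stand-ins, not the poster's actual data. Each 0/1 indicator column that is set becomes its own row. Any client-level field is copied onto every row that client produces, which is where the repetition comes from.

```python
# Hypothetical wide layout: one row per client, one 0/1 indicator column per group.
WIDE_ROWS = [
    {"client": "George", "total_sales": 300, "Minor Pool A": 1, "Minor Pool B": 1},
    {"client": "Alice", "total_sales": 120, "Minor Pool A": 0, "Minor Pool B": 1},
]
INDICATOR_COLUMNS = ["Minor Pool A", "Minor Pool B"]


def wide_to_tall(rows, indicator_columns):
    """Emit one (client, group) row for every indicator that is set to 1.

    Non-indicator fields (here total_sales) are copied onto every emitted row.
    """
    tall = []
    for row in rows:
        # Keep every field that is not an indicator column.
        base = {k: v for k, v in row.items() if k not in indicator_columns}
        for col in indicator_columns:
            if row[col] == 1:
                tall.append({**base, "minor_pool": col})
    return tall


TALL_ROWS = wide_to_tall(WIDE_ROWS, INDICATOR_COLUMNS)
for r in TALL_ROWS:
    print(r)
```

George ends up on two rows, and both carry his `total_sales` of 300.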


      However, the repeating cases throw off counts. For example, if I am trying to determine total sales for a client (and I have a column with the total sales value), Tableau aggregates the sales amounts from all of that client's cases instead of just using the total sales.
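To make the overcount concrete, here is a small sketch in plain Python with hypothetical rows. It assumes that after restructuring, a client-level "total sales" value repeats on each of that client's rows. Summing every row multiplies the value by the number of rows. Taking the value once per client gives the intended figure.

```python
# Hypothetical tall rows: total_sales repeats on every row belonging to a client.
TALL_ROWS = [
    {"client": "George", "minor_pool": "Minor Pool A", "total_sales": 300},
    {"client": "George", "minor_pool": "Minor Pool B", "total_sales": 300},
    {"client": "Alice", "minor_pool": "Minor Pool B", "total_sales": 120},
]

# Naive SUM over every row: George's 300 is counted once per pool membership.
naive_total = sum(r["total_sales"] for r in TALL_ROWS)

# Keep the first total_sales value seen for each client, then sum those.
per_client = {}
for r in TALL_ROWS:
    per_client.setdefault(r["client"], r["total_sales"])
deduplicated_total = sum(per_client.values())

print(naive_total, deduplicated_total)  # 720 420
```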


      Right now I'm working with at least 100 variables and 500k entities (before restructuring), so I've included a small sample file of the restructured data that shows the problem. Any advice on doing this better?


      Note: I recognize that Tableau could calculate the total for me. This is simply an example of repeating data fields that was easy to construct. The same thing happens with our measures of contact with each individual (who contacted them, when, and by what method).


      Thank you!

        • 1. Re: data file structure, repeating data

          I am not sure I understand the problem correctly, but in your data the Total Sales field is redundant. As you correctly noted, Tableau can calculate it from Sales.

          I understand that the attached data is just a sample, so it is difficult to comment on how best to structure it, especially since we don't know what the output should look like.

          Generally speaking, Joe is right, but that is not to say that everything has to go into rows. In my experience, restructuring data is a trial-and-error process, and eventually you find out what works best in rows and what should stay in columns.


          In your example, just use Sales instead of Total Sales in your view and the problem disappears.
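The suggestion can be sketched in plain Python. This assumes that each individual sale appears on exactly one row with its own amount, so that Sales, unlike Total Sales, does not repeat. The rows below are hypothetical. Summing row-level sales per client reproduces the total, so a separate Total Sales column is not needed.

```python
# Hypothetical rows: each sale appears once, with its own amount in "sales".
SALES_ROWS = [
    {"client": "George", "sales": 100},
    {"client": "George", "sales": 200},
    {"client": "Alice", "sales": 120},
]


def total_sales_by_client(rows):
    """Derive each client's total by summing row-level sales."""
    totals = {}
    for r in rows:
        totals[r["client"]] = totals.get(r["client"], 0) + r["sales"]
    return totals


print(total_sales_by_client(SALES_ROWS))  # {'George': 300, 'Alice': 120}
```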

          • 2. Re: data file structure, repeating data

            I've added additional information in this new sheet.


            The correct viz in sheet two would report one "yes". Instead, it counts George as having two, when in reality George should only have one.


            In the third viz, you can see the cumulative effect of this: it reports five positive interactions, when really only two people had a positive interaction.
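The mismatch comes from counting rows instead of people. Here is a minimal sketch in plain Python with hypothetical rows and field names. George belongs to two pools, so his single "yes" appears on two rows. A row count reports 2, while a count of distinct people reports 1.

```python
# Hypothetical contact rows: George is in two pools, so his one "yes" appears twice.
CONTACT_ROWS = [
    {"person": "George", "minor_pool": "Minor Pool A", "contact_type": 1, "response": "yes"},
    {"person": "George", "minor_pool": "Minor Pool B", "contact_type": 1, "response": "yes"},
    {"person": "Alice", "minor_pool": "Minor Pool B", "contact_type": 1, "response": "no"},
]

yes_rows = [r for r in CONTACT_ROWS if r["response"] == "yes"]

row_count = len(yes_rows)                            # counts rows (pool memberships)
people_count = len({r["person"] for r in yes_rows})  # counts distinct people

print(row_count, people_count)  # 2 1
```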

            • 3. Re: data file structure, repeating data

              Well, according to the data structure, George has two 'yes' entries for Contact type 1 - one for Minor Pool A and one for Minor Pool B.

              If this doesn't reflect what really happened, then you should either restructure your data or use conditional logic in your calculations (IF - THEN - ELSE).

              To use the IF-THEN approach, you need to understand what the data means, but, for example, you can use something like:


              SUM(IF [Minor Pool ] = 'Minor Pool A' THEN 1 ELSE 0 END)


              which will only count Minor Pool A entries, etc.
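For readers who want to check the logic outside Tableau, here is a rough Python equivalent of the calculation above. The rows and the `minor_pool` key are hypothetical. The function adds 1 for each row whose pool matches and 0 otherwise, which is what the SUM(IF ... THEN 1 ELSE 0 END) pattern does. When counting is restricted to one pool, George's duplicated "yes" is counted only once.

```python
# Hypothetical rows mirroring the thread's example: George appears in both pools.
ROWS = [
    {"person": "George", "minor_pool": "Minor Pool A", "response": "yes"},
    {"person": "George", "minor_pool": "Minor Pool B", "response": "yes"},
    {"person": "Alice", "minor_pool": "Minor Pool B", "response": "no"},
]


def count_pool(rows, pool):
    """Add 1 for each row in the given pool and 0 otherwise, like SUM(IF ... THEN 1 ELSE 0 END)."""
    return sum(1 if r["minor_pool"] == pool else 0 for r in rows)


print(count_pool(ROWS, "Minor Pool A"), count_pool(ROWS, "Minor Pool B"))  # 1 2
```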