1 Reply Latest reply on Aug 16, 2016 8:41 AM by Kent Sloan

    Reshaping data to improve performance (large data tables)?

    Lance Dacey

      The database that I am working with is fairly large. There are about 2 million new rows coming in per calendar date and I try to keep some detailed data for a couple of months to a quarter (still tweaking this).

       

      The current shape of the data is extremely wide though, I have tables that are 200-500 columns (I did not create this database, but I can help reshape it for Tableau).

       

      dateemployee_id
      channel
      interval
      (foreign_keys)
      talk_timehold_timestaff_timeinbound_call_count
      outbound_call_countabandon_call_count

       

      • Data is per 30 minute interval (24 in a day)
      • There could be 10-30 channels per employee per interval
      • There are many thousands of employees (so now we are at 10-30 channels times 24 intervals times 30,000+ employees)
      • Most of the columns (minimum of 50+) are either about call counts or time duration (seconds)

       

      Will I gain performance if I reshape this to something like:

       

      dateemployee_idchannel
      interval
      (foreign_keys)
      category
      value
      talk_time320
      hold_time411

       

      My concern is that this will cause a massive number of rows, although it will be a much narrower table.

       

      Most of my formulas are something like SUM(talk_time) + SUM(hold_time) / SUM(staff_time). Switching to a tall format would require me to add a lot of IF statements to refer to the category column. Will this perform well or slow things down even more?

        • 1. Re: Reshaping data to improve performance (large data tables)?
          Kent Sloan

          Hi Lance,

           

          In general Tableau runs much faster when using tall narrow data sets over short and wide ones. Having fewer columns also makes it easier to use in Tableau. The first thing I would do before flipping all the content is check to see what you actually need to use in your reports. Leaving unnecessary content behind can save space and help speed things up. Looking at the calculation you need to use I would lean towards keeping those columns side by side rather than summing up separate chunks of the same column to divide against each other. I would also consider if there are columns that can be combined before bringing in Tableau or if the data set can be rolled up to the hour or day level. Without seeing more of the data set all I can honestly give are general guidelines. 

           

          Thanks,

          Kent Sloan

          InterWorks

          1 of 1 people found this helpful