5 Replies Latest reply on Feb 18, 2016 1:29 PM by Shinichiro Murakami

    Aggregating record data, histograms, and time ordering of events

    James Mesbur

      I'm analyzing data representing phone calls into a system. I'm trying to come up with a visualization to show what factors contribute to people calling in multiple times.

       

      We key off of the caller's account number (sanitized in the attached workbook and below) to determine whether a call is a 'repeat call' or not.

       

      If the caller calls multiple times, we'd like to be able to show:

      1. Attributes of the first call placed by a particular account (Call reason, etc.)

      2. Attributes of subsequent calls placed

      3. Aggregate on the combinations of things that happened, rather than the things themselves

       

      For example, this is intended to show the distribution of # of calls per account. So most people call 1 or 2 times, some call 3 times, and then outliers are 4 or 5 times.

       

       

      I'd like to take the next step and be able to produce calculations based on those bins of # of calls.

       

      So for example, for people calling in 2 times, what was the distribution of their call reasons? What was the most likely call reason on the 1st call to trigger the 2nd call? And so on.

       

      My data looks like this:

       

       

      I think what I need to start would be calculations showing:

       

      1. For each account #, how many times did that person call

      2. For each call, which # call was it: 1st, 2nd, etc.

       

      I can do #1 easily directly in Excel. The problem with embedding it into the source data is I end up with too many records. For example, if an account called 5 times, then for each record in the data, there will be a 5 in that column. So 5 5's. Which means I can't then aggregate on that column. I think a better structure would be a mapping of account numbers to # of calls.

       

      So instead of this (simplified):

       

      Account Number   Call #   # of calls

      12345                     1        5

      12345                     2        5

      12345                     3        5

      12345                     4        5

      12345                     5        5

       

      Something like this:

       

      Table 1

      Account Number   Call #  

      12345                     1      

      12345                     2       

      12345                     3       

      12345                     4       

      12345                     5       

       

      Table 2

      Account Number   # of calls

      12345                     5

       

      Is this something that could be represented in Tableau directly?

       

      I know this is a mish-mash of thoughts. If anyone wants to dig into my data a bit and play around to see what might be possible, I'd be very grateful for the help.

        • 1. Re: Aggregating record data, histograms, and time ordering of events
          Shinichiro Murakami

          James,

           

          This is interesting analysis.  I am curious in the result itself.

           

          Anyways,

           

          [The first Call]

          {fixed [Account #]:min([Time])}

           

          [The reason of 1st Call]

          if [Time]=[The first Call] then [Reason] END

           

          [No. of Calls 2nd time~]

          {fixed [Account #]: count (if [Time]<>[The first Call] then [Time] END)}

           

          BIN

           

           

          Thanks,

          Shin

           

          9.0 attached.

          1 of 1 people found this helpful
          • 2. Re: Aggregating record data, histograms, and time ordering of events
            James Mesbur

            村上さん、丁寧に返事を書いてくれてありがとうごあいました。

             

            I think we're partway there, but I'm not certain I understand all your calculations correctly.

             

            [The first Call] returns a time, so when I drag out account numbers I can see the corresponding data from each of the first calls placed by an account. This is great!

            [The reason of the first call] returns 'NULL' for non-1st calls, so if I exclude NULL from my view, I can get a list of all accounts, time of first calls, and reason of first calls. This is also good, but perhaps unnecessary, as the original 'Reason' gives the same results, as long as I exclude non-first calls.

             

            [No. of calls 2nd time ~] appears to return an integer for how many repeat calls an account placed. So for example if ABCD placed 5 calls, the value for [No. of calls 2nd time ~] would be 4.

             

            Now that I've seen your initial approach, which looks promising, let me see if I can be clearer about what I might need:

             

            1. Distribution of number of calls per account:

            ie.  (fake numbers)

            1 call only --> 547

            2 calls --> 323

            3 calls --> 114

            4 calls --> 24

            5 calls --> 7

             

            2. For all the 1-call-only people, distributions of the Reasons (and other variables)

             

            3. For all the 2-call people, distribution of 1st call Reason, AND for each of those Reasons, the distribution of the 2nd call Reasons:

            ie. (fake numbers)

            1st call:

            Billing - 153

                2nd call:

                      Billing - 117

                      Tech Support - 33

                      Appointment - 13

             

            1st call:

            Tech Support - 224

                 2nd call:

                      Tech Support - 196

                      Billing - 22

                      Customer Service - 6

            etc.

             

            4. For all the 3-call people, same thing, perhaps using 2nd+3rd call reasons as a single value:

            ie. (fake numbers)

            1st call:

            Billing - 55

                2nd call+3rd call:

                      Billing+Billing - 42

                      Billing+Tech Support - 8

                      Tech Support+Billing - 5

             

            etc.

             

            I hope this makes sense!

             

            I've created a calculation to filter out 'singleton' accounts, ie. those that only called once. It seems to work:

             

            [Is Singleton]

            {fixed [Account #]:min([Time])=max([Time])}

             

            Thanks in advance for any additional help you can provide!

            • 3. Re: Aggregating record data, histograms, and time ordering of events
              Shinichiro Murakami

              Thank you for your kind reply.

              Except one typo

              丁寧に返事を書いてくれてありがとうご いました。

               

              Anyway, basic data is already there, so you can start analyze the triggered reason, number of calls , etc whatever you need.

              If you have further questions, please post them again.

               

              BTW, could put your mart "Correct Answer" or "Helpful" , which encourages me.

               

              Thanks,

              Shin

              • 4. Re: Aggregating record data, histograms, and time ordering of events
                James Mesbur

                haha thanks for the correction! My finger must have slipped :-P

                 

                I marked your initial answer as 'helpful' for now. It's much appreciated! I'll work hard to use your calculations to build what I need, then hopefully it'll be 'correct' soon :-)

                • 5. Re: Aggregating record data, histograms, and time ordering of events
                  Shinichiro Murakami

                  James,

                   

                  Thank you.  I think you can enjoy your time to study these logic.

                  LOD is fun to play with

                   

                  Thanks,

                  Shin