9 Replies Latest reply on Mar 1, 2014 10:33 PM by Kyle Campbell

    Grabbing only the most recent or maximum value by user from a log file

    Kyle Campbell

      Hi Everyone,

      I have a log file for a mobile game that gives me the player's experience (and some other info) every time they play an encounter in the game.

      Simplistically, it looks like this:

      timeuseridencounterplayer level
      1/1/14 1:00 PMATutorial1
      1/1/14 2:00 PMAEasy1
      1/1/14 3:00 PMAMedium2
      1/1/14 12:00 PMBTutorial1
      1/2/14 9:00 AMBEasy2
      1/2/14 11:00 AMCTutorial1
      1/1/14 3:00 PMDTutorial1
      1/2/14 1:00 PMDEasy1
      1/2/14 2:00 PMDMedium2
      1/2/14 3:00 PMDHard3

       

      What I want to do is plot the number of unique users in each level by day. I would like to only take the maximum player level (or equivalently, the most recent). Taking raw record counts won't work as a unique user has multiple records. The output would be this (in pretty graph form):

       

      DateLevel# Users
      1/1/1412
      21
      1/2/1411
      21
      31

       

      Of course, I'd like this to stay dynamic in Tableau so I could look at distribution by hour, week, whatever. Also, I'll probably have other log files which will require me to do other generic functions over the day beyond just doing the maximum of a field by userid.

       

      I hope this explains what I'm trying to accomplish. Thanks!

        • 1. Re: Grabbing only the most recent or maximum value by user from a log file
          Phillip Burger

          Attached is one solution. The approach I took was to use a count distinct.

           

          I added additional test records. Here's the set of records used for this workbook:

           

          time    userid    encounter    playerLevel

          1/1/14 1:00 PM    A    Tutorial    1

          1/1/14 2:00 PM    A    Easy    1

          1/1/14 12:00 PM    B    Tutorial    1

          1/1/14 3:00 PM    D    Tutorial    1

          1/1/14 3:00 PM    A    Medium    2

          1/2/14 11:00 AM    C    Tutorial    1

          1/2/14 1:00 PM    D    Easy    1

          1/2/14 2:00 PM    D    Medium    2

          1/2/14 9:00 AM    B    Easy    2

          1/2/14 3:00 PM    A    Hard    3

          1/2/14 3:00 PM    D    Hard    3

          1/3/14 9:00 AM    A    Easy    3

          1/3/14 10:00 PM    A    Hard    3

          1/3/14 11:00 PM    A    Hard    3

           

          I'd use a bar chart for the results when looking at daily counts and when the number of days is not too many.

           

          I look forward to other ideas from the community, perhaps a solution with a table calculation or a calculated field that uses other functions. To help spur other solutions, I'm also attaching the source data I used (same as what's posted above).

          1 of 1 people found this helpful
          • 2. Re: Grabbing only the most recent or maximum value by user from a log file
            Kyle Campbell

            That's definitely better than the solution I thought of which was having connecting to the data through a Custom SQL which would pair down the data. I like that with your method the raw data doesn't need to be modified or truncated.

            • 3. Re: Grabbing only the most recent or maximum value by user from a log file
              Kyle Campbell

              While this is quite helpful (thank you!), it isn't quite the solution I was shooting for. The only difference is that if a user was both level 1 and level 2 in a given day, then he is counted twice (once at level 1 and once at level 2). I would like to only count him once at level 2 (their maximum/ending level on that day).

              • 4. Re: Grabbing only the most recent or maximum value by user from a log file
                Phillip Burger

                I can't figure out how to do this. I tried obtaining the max player level within the partition but wasn't able to complete this step. There is still the follow up step of counting the number at that level for the day. A workaround is to reshape the data.

                 

                There are many, very accomplished Tableau users on this board. I hope one of them reads this thread and is able to provide a solution! ;-)

                • 5. Re: Grabbing only the most recent or maximum value by user from a log file
                  Aaron Clancy

                  The only way I could think to approach this was with custom SQL and parameter passing.

                  I set up a parameter that groups the query by the selected level.  (Month,day, or year)

                  This parameter could be modified to account for other groupings as well but I kept it to the three that you specified.

                   

                  This is the data I used:

                  Screen Shot 2014-02-27 at 9.59.54 AM.png

                   

                  I created a parameter called MDY:

                  Screen Shot 2014-02-27 at 10.00.31 AM.png

                  Then I changed the query:

                  SELECT   max([input-data-107214#txt].[playerLevel]) AS [playerLevel],

                    format([input-data-107214#txt].[time],<Parameters.MDY>) AS [Date],

                    [input-data-107214#txt].[userid] AS [userid]

                  FROM [input-data-107214#txt]

                  group by format([input-data-107214#txt].[time],<Parameters.MDY>),[input-data-107214#txt].[userid]

                   

                  The above query has a dynamic grouping so that you only get 1 record back for each player (max level) for each month, day or year (whatever the parameter is)

                  THIS REQUIRES A LIVE CONNECTION

                   

                  You'll need to convert the Date field to a Date Type.

                   

                  The viz will look like this when Day is selected:

                  Screen Shot 2014-02-27 at 10.03.11 AM.png

                   

                   

                  When I switch to Month it looks like this:

                  Screen Shot 2014-02-27 at 10.04.06 AM.png

                   

                  It may not be the best approach but it was all I could come up with.

                  1 of 1 people found this helpful
                  • 6. Re: Grabbing only the most recent or maximum value by user from a log file
                    Phillip Burger

                    Inspiration goes out to Aaron. He's on the right track and his approach fits within the Tableau workflow for a problem like this. Here's a slightly different, or derivative, solution to what Aaron submitted. This uses a filter for the data rather than a parameter. To make the JET SQL simple, I hacked the data by removing the time component from the datetime field.

                     

                    This solution works using either an extract or a live connection. The custom JET SQL here is same as Aaron's but without the format().. 


                    SELECT [input-data-1-107214#txt].[time] AS [time],

                      [input-data-1-107214#txt].[userid] AS [userid],

                      MAX([input-data-1-107214#txt].[playerLevel]) AS [maxPlayerLevel]

                    FROM [input-data-1-107214#txt]

                    GROUP BY [input-data-1-107214#txt].[time],

                      [input-data-1-107214#txt].[userid]

                     

                    I changed the date to a continuous dimension and used a filter. As continuous, you could slide the filter value to obtain results for different ranges of dates and times. The mark is the sum of records.

                     

                    A disadvantage to my solution included in this reply is that is uses the values from the maxPlayerLevel as the values of the playerLevel dimension. This could be bad design in many instances.

                     

                    This was a hard problem. I think these are the best tips without reshaping the data before connecting to it with Tableau.

                    1 of 1 people found this helpful
                    • 7. Re: Grabbing only the most recent or maximum value by user from a log file
                      Aaron Clancy

                      I like the approach.

                      I'm just a little confused by how you apply different date grouping and still keep the count distinct.

                      One of the requirements Kyle has is to be able to change the date level that the distribution happens at.

                      If the end user wants to see unique player counts for the month instead of by day then I don't see how that would be possible after a query that is grouped by day and pushed into Tableau.

                       

                      Not sure if I misread that requirement.

                      • 8. Re: Grabbing only the most recent or maximum value by user from a log file
                        Phillip Burger

                        Kyle's requirements are contradictory. There is What I want to do is plot the number of unique users in each level by day and then there is I'd like this to stay dynamic in Tableau so I could look at distribution by hour, week, whatever.


                        Your answer is good and I gave you credit for it in my reply. My latter addition to the thread was derivative and I admit as much. Yet, having a filter by day is one more, helpful way to view count information over time.

                         

                        Kyle, please mark the question as Answered if it is answered. Also, please mark one or more helpful responses as you see fit. These steps help build the long-term quality of the board and prevent other member of the Community from spending time on answered questions.

                        • 9. Re: Grabbing only the most recent or maximum value by user from a log file
                          Kyle Campbell

                          I'm not sure how those are contradictory, but Helpful Answers are marked.