8 Replies Latest reply on Aug 19, 2012 3:29 AM by Jonathan Drummey

    How do I identify the number of first time riders on a month by month basis in a bar graph?

    William Weaver

      I have created a bar graph that displays the total number of unique transit riders on a month by month basis for a 6-month period. Now I need help identifying which of those unique riders are also first-time riders. The challenge is actually two-fold. First, I need to identify the first time riders in January 2012 by comparing the unique customer id for each rider in January 2012 against the unique ids for all of the riders in 2011. Next, I need to compare the customer ids for riders in February 2012 against the customer ids for riders in January 2012 and all of 2011. Then, I need to repeat the process for March, April, May and June. And finally, I would also like to color code and count the first time riders inside my existing bar graph.

       

      Originally, I thought I could accomplish the task by creating a filter that excluded all of the customer ids that were associated with a ride in 2011 and then applying that filter to the list of customer ids with trips in January 2012 but I could never get it work. Plus, I wasn't sure how I could build upon the exclusion filter for successive months.

       

      I have attached a packaged workbook of my efforts if anyone wants to give this challenge a try.


      Thanks - William

        • 1. Re: How do I identify the number of first time riders on a month by month basis in a bar graph?
          Jonathan Drummey

          Hi William,

           

          I can think of a couple solutions to this, one question though - what are your production volumes like per month/year? Table calculations can be used for smaller volumes, custom SQL is recommended for larger volumes and generally better performance.

           

          Jonathan

          • 2. Re: How do I identify the number of first time riders on a month by month basis in a bar graph?
            William Weaver

            Are you talking about the total number of trips taken by all customers within a timeframe or the number of likely new riders each month?

             

            If it is the former, then approximately 142,465 trips were taken by 2471 unique riders in 2011. If it is the latter, then my best guess-estimate is that the agency adds between 20 and 50 new riders per month.

             

            Have I answered your question?

             

            Thanks --- William

            • 3. Re: How do I identify the number of first time riders on a month by month basis in a bar graph?
              Jonathan Drummey

              Hi William,

               

              This was fun! I'd been wanting to do some testing of this kind of cohort analysis using a moderately larger data set than the Superstore Sales data, to find out whether table calcs would be fast enough to be useful, and they aren't. The "Table calc (slow)" view in the attached workbook takes over a minute to refresh on my machine, and I've done every optimization I can think of. The challenge is that in order to determine the first trip for every rider, Tableau needs the whole data set available and has to perform that calculation for every trip, plus the subsequent table calculations to generate the results. That's a whole lot of calculations.

               

              The faster way is to use Custom SQL, where there's a subquery that finds the first trip date for each rider, then joins that back to the trip info for each rider. Here's my shortened version of the SQL:

               

              SELECT [DATA].[CustomerID] AS [CustomerID],

                [DATA].[TripDate] AS [TripDate],

                [DATA1].[TripDate] AS [FirstTripDate]

              FROM [DATA]

                INNER JOIN (SELECT [DATA].[CustomerID] AS [CustomerID],

                    MIN([DATA].[TripDate]) AS [TripDate]

                    FROM [DATA]

                    GROUP BY [DATA].[CustomerID]) AS [DATA1]

                  ON [DATA].[CustomerID] = [DATA1].[CustomerID]

               

              Having the FirstTripDate now available in the data not only speeds things up by removing the need for all the table calculations, but also allows Tableau to materialize the results of the calculations in the data extract, so there are sub-second updates in the "Custom SQL (fast)" view.

               

              I had to do a bit of ETL work to pull this off: Your workbook used a Tableau Data Extract, so I used Tableau's Worksheet->Export->Data function to export the CustomerID and TripDate fields into an Access .mdb file, then reimported that twice. Once to create the data source for table calcs, the second for the Custom SQL. Hopefully you can apply the Custom SQL technique to your original data source.

               

              Cheers,

               

              Jonathan

              • 4. Re: How do I identify the number of first time riders on a month by month basis in a bar graph?
                William Weaver

                Hi Jonathan,

                 

                This is fantastic. It accomplishes a goal that I was, admittedly, not very optimistic was even possible in Tableau. That being said, I do, of course, have a question or two about how I can apply your solution locally.

                 

                First, since I need to the ability to update the graph based on real-time data, how do I redirect your custom SQL script away from a static Microsoft Access database and back to my SQL database? Will I need to create a new database table for both of your "DATA" and "DATA1" references in the SQL script?

                 

                Second, is your solution date independent? By that, I mean, will the solution automatically adjust to whatever date range of data is available on the other end of the data connection?

                 

                Thanks --- William

                • 5. Re: How do I identify the number of first time riders on a month by month basis in a bar graph?
                  Jonathan Drummey

                  Hi William, 

                   

                  I was expecting that you would adapt the Custom SQL to whatever your database is. The basic process is:

                   

                  In the Tableau data sources window, right-click on your data source and choose Edit Connection...  A [Data Source Type] Connection dialog appears.

                  Most likely the connection just has the Single Table radio button checked, you will check Custom SQL and see the text window below change into a bunch of SQL Code:

                  screenshot2.jpg

                  Click on the ... button to go to a larger window.

                  Now comes the tricky bit if you are not familiar with SQL syntax. You will need to add rows to the SELECT portion of the SQL statement to add the FirstTripDate, and add everything from the INNER JOIN onwards of your statement. The [DATA] will be replaced by the name of your data source (whatever is showing up in brackets on the FROM line), if you want you can change all the instances of [DATA1] to be something more to your liking. The SQL here is JET syntax, if you are using a different database then there can be variations.

                   

                  If you still need help, send me an email (see my info) and we can set up a webex.

                   

                  Jonathan

                  • 6. Re: How do I identify the number of first time riders on a month by month basis in a bar graph?
                    Jonathan Drummey

                    I realized I didn't answer your second question. The way the query is written, it will find the FirstTripDate by pulling the MIN(Trip Date) for each CustomerID from the database, in other words it will find the very first trip for each customer in the database. If you want to change the definition of what the first trip is, you'd probably add a WHERE clause to the subquery (the second SELECT statement) to limit what counts as a first trip.

                     

                    Jonathan

                    • 7. Re: How do I identify the number of first time riders on a month by month basis in a bar graph?
                      William Weaver

                      I would like to take you up on your offer for a webex session but your profile does not include your email address. Here is my email address rtweaver / bell / south / ne / t. Just get rid of the slashes and spaces and add the at symbol between the r and the b and a dot between the h and the n. Apologies for being cryptic but I just want to avoid adding to my spam email.

                       

                      Thanks --- William

                      • 8. Re: How do I identify the number of first time riders on a month by month basis in a bar graph?
                        Jonathan Drummey

                        Hi William,

                         

                        I apologize, I thought I'd made my email address available, it's viewable now. I'll send you an email separately as well.

                         

                        Jonathan