We are working on a challenge for a fictitious theme park, and I'm struggling a bit to get something sorted the way we need. We have ~26 million rows of data about movements in the park, and we need to visualize and classify groups (e.g. people who enter, ride rides, and eat at the same time).
I have the timeline in the format that’s useful for us, but I can’t get it to sort completely correctly. I’ve followed various directions for nested sorting, and tried some other methods, but can’t quite get it there. As the image below shows, there are groups of 3-4 people who are sorted properly, but a few lines further down, more users with the same check-in times appear who are not sorted with their group.
I suspect I’m missing a logical structure, or perhaps have constructed it incorrectly. Any suggestions are appreciated. I’ve linked below (too large to attach) a packaged workbook with just the visualization in question and the data. It is 88 MB, since it’s a large data set.
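One approach to sorting visitors is to prepare a "Timestamp Hash" (well, sort of) for each Id by concatenating the timings between its check-ins (or the timings measured from the first one). Visitors who moved together end up with the same string, so sorting on it keeps each group together.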
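To illustrate the idea outside the workbook, here is a minimal Python sketch of such a "Timestamp Hash". It is not the calculation used in the workbook; the function names, the `(visitor_id, checkin_seconds)` row layout, and the choice to prefix the first check-in time before the gaps are all assumptions made for the example.

```python
from collections import defaultdict


def timestamp_hashes(rows):
    """Build one signature string per visitor.

    rows: iterable of (visitor_id, checkin_seconds) pairs (hypothetical layout).
    """
    times = defaultdict(list)
    for visitor_id, t in rows:
        times[visitor_id].append(t)

    hashes = {}
    for visitor_id, ts in times.items():
        ts.sort()
        # Assumption: first check-in time, then the gaps between consecutive check-ins.
        parts = [ts[0]] + [b - a for a, b in zip(ts, ts[1:])]
        # Zero-pad so that sorting the strings matches sorting the numbers.
        hashes[visitor_id] = "|".join(f"{p:06d}" for p in parts)
    return hashes


def group_by_hash(hashes):
    """Return lists of visitor ids that share a signature, ordered by signature."""
    groups = defaultdict(list)
    for visitor_id, h in hashes.items():
        groups[h].append(visitor_id)
    return [sorted(ids) for _, ids in sorted(groups.items())]


rows = [(1, 100), (1, 400), (2, 100), (2, 400),
        (3, 100), (3, 250), (4, 100), (4, 400)]
hashes = timestamp_hashes(rows)
groups = group_by_hash(hashes)
```

In this toy data, visitors 1, 2 and 4 share every check-in time and so get the same string, while visitor 3 gets a different one. Sorting on the string rather than on individual check-in columns is what keeps a late-appearing member with the rest of the group.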
Please find the link to the workbook:
Hope it helps a bit.