5 Replies Latest reply on Nov 21, 2018 6:38 AM by Simon Runc

    Distances between points for large dataset?

    naresh.suglani.0

      Hi all,

       

      Another query based on distance between customers and shops but it isn't working for some reason despite using the usual distance calculator.

       

      The data consists of around 300k customers each linked to a specific shop and I need the distance calculated from each customer to that shop but when trying to add the distance calculator field Tableau keeps freezing.

       

      I want the output to look like this:

       

      Member Number
      Branch Code
      Distance

       

      I've attached the data and the workbook (as far as I was able to get)

       

      Thanks in advance.

        • 1. Re: Distances between points for large dataset?
          Simon Runc

          Hi Naresh,

           

          In your question you've said ...

           

          around 300k customers each linked to a specific shop

           

          But I'm not sure how you are linking a customer to a shop, as you have a full outer join on 1=1 ...a cross-join. This means that it is joining every customer with every shop. Equally in your Branches tab, each shop is repeated multiple times (although there are around 2k shops, you have 300k rows). The result of all this means that you are creating a data source which is 90 billion (yup that's not a typo!) rows ...and then running the queries live against Excel! My 32GB laptop was wurring away just trying to open the file!! Even if you de-duped your Branches sheet you'll still end up with a 600 Million row datasource.

           

          I think you need to

          a) de-duplicate the Branches

          b) get an branch id in the customer sheet, so you can link a customer to a branch.

           

          By taking your excel to a single customer I can get the calculation to work fine.

           

          • 2. Re: Distances between points for large dataset?
            naresh.suglani.0

            Hi Simon what I meant was that I have around 300k rows of data relating to 300k odd unique customers using member Id. Each of those customers have bought an item from the shop identified by the branch code.

             

            So I need the distance between both for all customers in the list. The result should be the same number of rows but with the distance field added. Does that make sense or have I confused you further?

             

            I'm guessing I went wrong first wih the join type.

            • 3. Re: Distances between points for large dataset?
              Simon Runc

              I think I see. So the Branches Data is in the same order as the Customer Data (i.e. Row 1 in Customers maps to Row 1 in Branches, Row 2 in Customers maps to row 2 in branches and so on)?

               

              In which case I'd just add a counter to each of the Tables (just counting the rows). Unfortunately in Tableau or Data Prep we don't have an option to add this. The you can just inner join on this row number. That's the easiest, low tech way. If you are familar with Panda's in Python, then you can do this very easily and combine the data (using the index to concatenate the data frames).

              2 of 2 people found this helpful
              • 4. Re: Distances between points for large dataset?
                naresh.suglani.0

                All sorted Simon, was bugging all the way home in the car so took another look at my data and thought to myself - "why didn't I just add the customer ID to both tables to link on?"

                 

                A little fiddling in Excel, and a face slap later and I'm back in business.

                 

                Hope our CEO appreciates the extra time I've put in tonight to get his "URGENT" request completed.

                 

                Thanks again SR - love this forum!

                • 5. Re: Distances between points for large dataset?
                  Simon Runc

                  Cool. Glad it all worked in the end, and you get plenty of "above and beyond" kudos from the CEO.

                   

                  I'm currently on around 3-4 face-slaps a day