12 Replies Latest reply on Jan 30, 2014 7:43 AM by Jessica Lee

    density map & clustering

    Jessica Lee

      Hi all,

       

      I created a map that shows all the starbucks store within certain distance of selected startbucks store.

      (Data files and a workbook is attached)

      I saw many examples with writing customer SQL function, but somehow I kept getting error message.

      I am practicing with this data for now, but I will implement this formula to actual database later on.

      Rather than using custom SQL, I thought using a table calculation would be more reasonable.

      From my previous discussion, Matt Lutton gave me an answer to this question.

       

      So, I successfully generated a dashboard showing all starbucks store with certain distance from the selected store.

      But now I'd like to move on to the next step.

       

      I saw several posts by Richard Leeke & Elvis Has Left The Building about map & geocoding.

      I really liked Richard's work on creating concentric circles.

      http://community.tableau.com/thread/117325

       

      By looking through this threads and his workbook, it seems like he set the radius within the excel file.

      This is just my assumption, but I am not quite sure how radius parameter is created.

       

      So now here comes my questions.

       

      1. Can I set radius using the distance formula I already created? like 0.5 mile, 1 mile, 1.5 mile, 2 mile radius ?

      2. After creating radius, how should I write a function to count the stores within set radius?

       

      Can any tableau expert help me with this?

       

      Thank you.

        • 1. Re: density map & clustering
          Richard Leeke

          Just catching up on some threads I'd flagged to look at when I got back to a PC (saw this and your earlier question come through on my phone).

           

          From reading this and your earlier thread I think I know what you are trying to do - but let me just restate the question as I understand it to make sure we are not talking at cross purposes.

           

          As I understand it, you are trying to count the number of Starbuck's stores within a given radius of each store. You want to be able to specify the radius as a parameter.

           

          "Each" is the crucial word here. If you just want to count the number of stores that are within the radius of the single store which you have selected, that is straightforward to do using a number of different techniques for radius calculations. The table calculation approach Matthew Lutton pointed you at is probably the easiest, but you can also do it with blending or other ways. Extending what you have to give the count for the selected store would be easy.

           

          But if you do want to do the full analysis of each store with respect to all the others, it gets a lot more difficult.

           

          I have long maintained that the set of addressing functions available within Tableau table calculations is incomplete - the functions available (INDEX(), FIRST(), LAST() etc) don't allow you to do calculations which allow each row to be compared with each other row.  The only ways I know of to achieve that involve using either custom SQL or RAW_SQL expressions (either using cross products or correlated sub-queries).

           

          However, I do vaguely remember one of the real table calculation gurus (Joe Mako, Jonathan Drummey, etc) writing something recently about a technique which allows you to do that type of analysis with table calculations - though I suspect it involved understanding the deep, dark secrets of table calculations and I can't for the life of me remember how it worked. Maybe Joe or Jonathan will jump in with a pointer to that.

           

          I've just tried to hunt out any old threads discussing this type of problem - I'm sure it's come up a lot, but I can't find any simple explanations of the SQL techniques required.

           

          There is a long and quite complex discussion of this type of analysis in this old thread - though much of this is about how to optimise the performance to cope with databases with millions of rows by using spatial database technology such as spatial indexes. Your volumes wouldn't need that unless you have a lot more data than in the sample you posted.

           

          All you would need to do is use a RAW_SQL expression to calculate the number of stores within the given radius of each store.

           

          I just dug out an old workbook which does something similar to calculate the population density around each of the post code locations in Germany. This takes a table of around 8,000 locations, and their corresponding populations and displays a mark for each of the 8,000 locations coloured according to the total population of all locations within 25 kilometres of the mark.

          Population within 25 km (spatial index).png


          The population is returned by a calculated field using this RAW_SQL expression:


          RAWSQLAGG_INT("

          SELECT SUM([Population])

          FROM [Locations] AS [L2]

          WHERE

          ROUND(6371 * ACOS(SIN(%1 * PI() / 180) * SIN([L2].[Latitude] * PI() / 180) +

                            COS(%1 * PI() / 180) * COS([L2].[Latitude] * PI() / 180) *

                            COS(([L2].[Longitude] * PI() / 180) - (%2 * PI() / 180))),2)

          <= %3

          ", [Latitude], [Longitude], [Radius])

           

          As you just want a count of stores rather than the total population you would just need to replace SUM([Population]) with COUNT(*). The exact SQL syntax will depend on what database you are using. This example is for SQL Server.

           

          Hope that helps.

          • 2. Re: density map & clustering
            Richard Leeke

            By the way - I remembered where I'd seen the description of comparing all rows with themselves without using SQL - it's in this post on Jonathan's blog - and makes use of densification.

             

            But unless I'm missing something I can't see how to use that approach here. As far as I can see that approach requires the store identifier on rows and columns, giving a tabular layout. As soon as you move them off onto other shelves the densification stops happening.

            • 3. Re: density map & clustering
              Cristian Vasile

              Jessica,

               

              The formula provided by Richard is written to accept km as distance unit. If you need miles then replace 6371 (it is the mean Radius of Earth expressed in km)  with  3959 (miles) more facts about Earth here Earth Fact Sheet


              If you want to dig deeper i would recommend this resource Aviation Formulary V1.46


              Regards,

              Cristian.

              • 4. Re: density map & clustering
                Richard Leeke

                For some reason I just remembered an old thread where someone was having trouble with that distance function and it turned out to be because the JET database engine (text files, excel spreadsheets,MS Access) doesn't support the inverse trig functions (ACOS() etc). So that RAWSQL expression I quoted wouldn't work with JET.

                 

                The approach works if you're connected to a real database.

                 

                If you want to do it with your Excel datasource the only way I can think of would be to create a custom SQL connection with a cross product, create a data extract from that and then add the distance calculation as a calculated field after creating the xtract (because data extracts do support the ACOS() function).

                • 5. Re: density map & clustering
                  Jessica Lee

                  Richard Leeke

                   

                  Hi Richard,

                   

                  Thanks for the answer first.

                   

                  As you understood, I'd like to count the number of stores within the radius for 'each' store, and I am trying to run this on Aster database with millions of rows.

                   

                  1. Yes I tried table calucation distance method suggested by Matt Lutton But since I am running this with big data set, I can't create a parameter with location column. Tableau freezes every time I try to create a parameter.


                  2. Also, I still don't quite understand how to create radius parameter. Most of the examples you have already have radius already listed in the excel file. How should I set the radius on my own? I used Tableau around 6 month, so I am not really familiar with some of the functions. So, I might be missing something when I am creating radius. Before I move on, I'd like to understand this radius parameter you created.

                  • 6. Re: density map & clustering
                    Cristian Vasile

                    Jessica,

                     

                    Please, take a look at this excellent visualization (Walking distance to the nearest pub in Cambridge), could help you to understand the overall approach.

                     

                    Regards,

                    Cristian.

                    • 7. Re: density map & clustering
                      Richard Leeke

                      OK, I'll try to answer your questions - though I have some more questions for you, too.

                       

                      1. If you have millions of rows of stores it certainly won't work to try to create a parameter listing those. Even if Tableau didn't freeze when you try, expecting someone to scroll down through a list of a million rows is not going to be very workable.

                       

                      It sounds as if you really need to design the dashboard so that people can drill-down to the store that they want to focus on - but I really need to understand exactly what you are trying to do here to be able to suggest another approach.

                       

                      You mentioned that you have millions of rows. What are they? I know Starbucks is getting scarily close to achieving global domination - but I can't quite believe there are actually millions of Starbucks stores out there. I'm guessing maybe you are actually dealing with all sorts of retail outlets, so you may want to allow the user to filter by category, or retailer, or location (State/City) to get a more manageable list. That could all be done with a series of quick filters, or perhaps with a set of maps that allow you to drill-down geographically. Like I say, it depends what you are trying to achieve.

                       

                      2. In the example you posted you were filtering stores based on a range of distances from your chosen store. That approach works for selection of stores close to your one selected store, but it sounds as if you also want to define a distance for counting nearby stores for each of the selected ones. Is that right? If so, it sounds as if you are wanting to do two different things based on that radius parameter:

                       

                      a) Locate other stores within a certain distance of a single store.

                      b) Count how many stores there are within that radius (or perhaps a different radius) of each of the set of stores located in step a).

                       

                      Is that right?

                       

                      If so, it sounds doable but quite tricky - and I'm not quite sure how intuitive the resulting viz would be. I think you would end up having to use the sort of approach discussed in that long, convoluted thread I pointed you at before - using action filters to pass the set of selected stores to a data source consisting of a cross-product of your main table with itself. With the sort of volume you are talking about you might also find that you need a spatial index - though I have never used an Aster database, so I don't know if that is supported.

                       

                      But I think before diving into all of that complexity it's probably better to come back to the question of what are you actually trying to achieve and see if there might be a better/easier way of achieving the outcome you are after.

                      • 8. Re: Re: density map & clustering
                        Jessica Lee

                        I am trying to see number of tickets generated by each store. I could generate maps that is showing the stores with high volume of tickets and low volume of tickets, using size and color option from Marks box. But I want to do this more specific, like select the store and within the 5 mile radius calculate # of tickets within that radius.


                        So data will look similar to this


                        countryticket #store #postalcodelatlong
                        US173911136140.760859-73.766471
                        US273911136140.760859-73.766471
                        US372961136540.739403-73.785483
                        US4763511135640.783418-73.833288
                        US573911136140.760859-73.766471

                         

                        When I select store 7391, if there's no other store within 5 mile radius. I'd like to mark that area as there are only 3 tickets generated from that area.

                        Hope this clear some more things out.

                        • 9. Re: Re: density map & clustering
                          Richard Leeke

                          Raises more questions I'm afraid.

                           

                          1) What's a ticket?

                          2) Probably a bit of an aside, but if tickets are "generated" - that implies to me that there's some sort of time dimension here - or will you just have a dataset representing the tickets for some given period?

                          3) You say "if there's no other store within 5 mile radius" you like to mark that area "as there are only 3 tickets generated from that area". So what happens if there are more stores within the 5 mile radius? And does it matter how many? And what if there were more than 3 tickets generated from that area? How many more? etc, etc.

                           

                          I'm sorry, but I really don't have any idea what you are trying to achieve here.

                          • 10. Re: density map & clustering
                            Cristian Vasile

                            Richard,

                             

                            She wants to do the following tasks:

                            1. choose one store, and mark it as origin.

                            2. choose a radius (expressed in miles) in this case 5mi

                            3. count  #of tickets (maybe orders), generated by all stores located in that circular area,  including the store marked as origin.

                             

                            Regards,

                            Cristian.

                            • 11. Re: density map & clustering
                              Richard Leeke

                              Yeah, but there are strong hints of other requirements in there (if no other store, as there are only 3 tickets...) and I just want to understand the end goal rather than get it drip-fed.

                              • 12. Re: density map & clustering
                                Jessica Lee

                                Sorry Richard, I thought thread was posted but it didn't.

                                 

                                As Christian explained those are the only thing that I want to do.

                                and that will be the my end goal here.