OK, I've had a five-minute look at your data and workbook and understand the question you are trying to answer. I'll have a think about options for how best to go about it and get back to you - maybe in a day or two.
One quick question. I'm not sure what you mean by "the segmentation attributes that have been applied to these household IDs". Do you mean that you have further classifying attributes (i.e. additional dimensions) for the households, not included in the sample data, which you want to be able to use to break down the results?
I took a crash course in SQL today, my brain hurts.
After loading SQL Server LocalDB with a new "circles" DB containing 3 tables based on the original file, and creating a custom SQL query for Tableau, I have recreated the original file I sent through (see attached and SQL below).
To your question: the household ID is our primary key in the original (significantly larger) database I wish to apply this to. Each household has segment, purchase history and other dimensions applied to it if it has been a customer of ours. Sadly, not every household has a customer in it, so some household IDs will have no dimension values; they have a null value, where appropriate, across all the segment and purchase history dimensions.
As a secondary note to this, the shop name (ID), lat and long are not currently in the main database where the household ID is. I will need to add/join these to the main database.
I will continue to review the post you suggested, as I believe the answer lies somewhere in there, and it's a good way to also learn something new.
I hope this makes sense...
```sql
select
    'Circle' as [Type],
    circles_kfHH.[ID] as [ID],
    [circles_kfShop Circles].[Latitude] as [Latitude],
    [circles_kfShop Circles].[Longitude] as [Longitude],
    [circles_kfShop Circles].[Degree] as [Degree],
    [circles_kfShop Radius].[Radius] as [Radius]
from circles_kfHH
    inner join [circles_kfShop Circles] on circles_kfHH.ID = [circles_kfShop Circles].ID
    inner join [circles_kfShop Radius] on circles_kfHH.ID = [circles_kfShop Radius].ID
union all
select
    'HH Location' as [Type],
    [ID] as [ID],
    [Latitude] as [Latitude],
    [Longitude] as [Longitude],
    null as [Degree],
    null as [Radius]
from [dbo].[2903 GNAF]
```
P.S. dbo.[2903 GNAF] is the household table that includes all the household IDs starting with G.
Location Catchment Test v2.twbx 122.1 KB
By using a SQL Server database as the data source you completely change the nature of the problem and simplify things enormously (in some ways!).
One big difference is that you no longer need to construct the boundaries of the circles from a set of points at the same radius and different angles around each of your centre locations; you can just use SQL Server spatial functions to construct the circles - specifically the STBuffer() function used in that thread and the blog posts I pointed you at.
I still don't have time to go into much detail right now, but off the top of my head, here's roughly what I think you will need to do.
1) Structure the data in your database in 3 tables - one each for household, shop and circle. I think the columns you need are as follows.
For the household table:
- position (as a SQL Server geography type, which you can create from the LAT/LON values you have)
- other dimensions
For the shop table:
- position (again as a geography type)
Note that you don't have to generate all the circle boundary points like in that old thread.
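As a rough sketch of what that structure could look like - table and column names here are purely illustrative, not the ones from your database:

```sql
-- Illustrative schema only; real table/column names will differ.
create table dbo.Household (
    ID        varchar(20) primary key,  -- household ID (the primary key in the main DB)
    Latitude  float,
    Longitude float,
    Location  geography,                -- point built from Latitude/Longitude
    Segment   varchar(50)               -- plus any other segmentation dimensions
);

create table dbo.Shop (
    ID        int primary key,
    Latitude  float,
    Longitude float,
    Location  geography
);
```

The circle polygons don't need to be stored at all - they can be computed on the fly from each shop's point.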
2) Create a custom SQL connection to support the visualisation you want, using SQL Server spatial functions to work out which households lie within each radius of each of the shops.
There are a lot of ways you can attack this. Depending on what visualisation you want to end up with the complexity may vary quite a bit. For example, if you just want to produce a table or a bar chart showing the counts of households within each radius of each shop you just need a single SQL statement returning the counts by shop ID and radius. On the other hand if you want to show all households on a map along with the shop positions and the boundary circles around each shop you will need a more complex UNION query which actually returns the various geography objects.
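For the simple counts case, a single statement along these lines would do it. This is only a sketch: the Household and Shop table names, their Location geography columns, and the radius values are all assumptions for illustration:

```sql
-- Count households within 1, 2 and 5 km of each shop.
-- Table and column names are illustrative assumptions.
select s.ID as ShopID,
       r.RadiusKM,
       count(h.ID) as Households
from dbo.Shop s
cross join (values (1), (2), (5)) as r(RadiusKM)  -- one row per radius band
left join dbo.Household h
       on s.Location.STBuffer(r.RadiusKM * 1000).STIntersects(h.Location) = 1
group by s.ID, r.RadiusKM;
```

Note that for pure counts, testing `h.Location.STDistance(s.Location) <= r.RadiusKM * 1000` instead of buffering is usually cheaper and can make use of a spatial index; the buffer objects themselves are only really needed when you want to draw the circles.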
3) You will probably also need to add some calculated fields - potentially RAWSQL_SPATIAL ones. There are multiple ways to attack this and again it depends what vizzes you want to end up with.
See what you can work out from reading Sarah's blog posts. If you get stuck I will try to find time to put an example together.
I had a good look through that post last night, and read up on some of the indexing, union methods and geography types in SQL Server to understand how they work and are used, as I was a little confused originally.
At the moment I have created and uploaded the new tables. I have successfully added a "shop_geography" column using the code suggested by Sarah. A quick question: as I don't have polygons, just points with a lat and long assigned, I basically skipped the step of pushing shape files into the server - am I right in assuming this was the correct move?
Next I am trying to create the centroid column. I have successfully created it; however, the STCentroid() update is throwing an error:
```sql
set [centroid] = [shop_geography].STCentroid();
```

```
Msg 6506, Level 16, State 10, Line 3
Could not find method 'STCentroid' for type 'Microsoft.SqlServer.Types.SqlGeography' in assembly 'Microsoft.SqlServer.Types'
```
After some research I think the problem is that a feature pack needs to be installed; I'm just trying to figure out which one. Once I sort this out I will push on.
Re your point on the type of visualisations I need: my stakeholders and marketers want to see graphs and maps... we can get to that after I crash through these obstacles.
Thanks again for your time.
> as i was a little confused originally
I'm not surprised - there's a lot going on here. Hang in there!
Correct that you don't need to load shape files - your data just has the lat/lon coordinates of shops and households so you just need to create the point geography objects for those, as you have done for shops.
You don't need to create centroids. A centroid is just the "middle" of a polygon - which in Wilfred's case he needed to calculate so that he had a point to calculate the circular buffers around. You already have the shop locations as points, so you just need the expression to create the circular buffers around those points. The only polygons you will be using will be the "buffer" circles, which will be created with the STBuffer() function.
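In other words, the buffer comes straight off the stored point. A minimal sketch (table and column names here are assumptions for illustration):

```sql
-- 5 km circular buffer around each shop's point geography.
-- For geography with SRID 4326, STBuffer's distance argument is in metres.
select ID,
       Location.STBuffer(5000) as BufferCircle
from dbo.Shop;
```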
I thought you'd want to see maps - I was really just trying to make the point that the structure of your data source needs to be appropriate to support the visualisation you want.
Thanks, I'm finding this a really interesting challenge and loving all the new things I am learning.
I'm so happy you said I don't need a centroid, as I had a suspicion that I didn't need it, but following a process is important when you don't know what's going on under the hood.
So I think I have run into a user error, i.e. I did something wrong. I am up to building the RAWSQL/custom SQL to bring the dynamic buffers to life.
Following is the process I used to get to where I am, plus a screenshot of SSMS. When I bring the various dynamic filters over to the Marks card, nothing appears. I think it's because there is nothing in the newly created column - as you can see, it is presenting as null.
Table and column creation:

```
RAWSQL_Spatial("%1.STBuffer(1000)", [Shop Geography])
```

Custom SQL query:

```sql
select [shop_geography].STBuffer(<Parameters.Buffer distance (KM)> * 1000) as dynamic_buffer
```
I realised that I was trying to shove a polygon coordinate into the column when I should be shoving a point coordinate in there...
```sql
set [shop_geography] = geography::STGeomFromText('POINT(' + convert(varchar(20), [Longitude]) + ' ' + convert(varchar(20), [Latitude]) + ')', 4326)
```
Looks like you're making good progress.
If I have time this evening I might have a quick go at this myself. We can compare notes...
Richard, attached is the current version, which works well so far... THANK YOU!
I am now trying to work on the visualization so it presents well and doesn't confuse the daylights out of people:
- For example, one site adjoins a national park that has no households in it - nothing but trees. The viz with household sites against Shop 3 presented as a point only makes it look like it is not collecting half the area. I think I would like only the buffer-area disc presented, with the ability to choose which one to show through a filter. I am struggling with this a little, as you will see from Sheet 1, where the buffer-area disc is being applied to the household and not the shop point. It calculates correctly, however; I'm a little stumped here.
- Also, there are 256 households that fall in 2 zones. This is easy to explain visually in a Venn-like diagram viz with the radius discs, and with a disclaimer against tables etc.; I am just trying to think of how/where to count the overlapped locations.
- This is a prototype; I added in 200k more address sites just to see how it goes with the viz (so good, soooo good), but it takes 5 minutes to load after every click. This will be applied to a DB with 12m+ address sites, so I think I will be getting a call from our ever-patient DB manager if I can't work that one out.
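On the overlap question, one way to find and count the households that fall inside more than one shop's disc would be something along these lines - a sketch only, with assumed table/column names and an assumed 5 km radius:

```sql
-- Households that fall within the 5 km disc of more than one shop.
-- Table and column names are illustrative assumptions.
select h.ID as HouseholdID,
       count(distinct s.ID) as ShopsInRange
from dbo.Household h
join dbo.Shop s
  on h.Location.STDistance(s.Location) <= 5000   -- metres for SRID 4326
group by h.ID
having count(distinct s.ID) > 1;
```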
I look forward to seeing your interpretation and ideas...
Not sure if I said this earlier: thanks...
Location Catchment Test v3.twbx 65.2 KB
I've had a bit of a play with your sample data and have got some of the way - but there's a way to go yet.
I've got a hint for you on something which may trip you up, a question and a peek at where I've got to so far.
First the hint. I initially used the expression you used for populating the shop and household geography columns:
```sql
set [shop_geography] = geography::STGeomFromText('POINT(' + convert(varchar(20), [Longitude]) + ' ' + convert(varchar(20), [Latitude]) + ')', 4326)
```
But that resulted in the household positions being really distorted, aligning in a series of vertical strips. I remember hitting this in the past - that expression comes up a lot if you Google how to populate a POINT geography, but it actually truncates the latitude and longitude values. I used this instead and it worked fine:
```sql
set [Location] = geography::STGeomFromText('POINT(' + CAST(CAST([Longitude] AS decimal(13, 8)) AS varchar) + ' ' + CAST(CAST([Latitude] AS decimal(13, 8)) AS varchar) + ')', 4326)
```
You can just run the UPDATE again with this syntax and it will fix up your data.
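To see why the first expression misbehaves: SQL Server's default float-to-varchar conversion only keeps around six significant digits, which is nowhere near enough for coordinates. Going via decimal preserves the full precision. The longitude value below is just an illustrative example:

```sql
declare @Longitude float = 151.20699484;

select
    -- Default CONVERT keeps only ~6 significant digits, so this truncates:
    convert(varchar(20), @Longitude) as Truncated,
    -- Casting via decimal(13, 8) keeps the full precision:
    cast(cast(@Longitude as decimal(13, 8)) as varchar(20)) as FullPrecision;
```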
Now the question. How many shops and how many households do you plan to include in your analysis? I want to make sure I come up with an approach that is going to scale far enough to cope with your volumes.
Here's a look at where I've got to so far.
One thing this highlights is that you'll need to think about how you want to handle households that fall within the circles for multiple shops. As I have it displayed above, the households are coloured according to distance from Shop 3, but there are actually points hidden underneath showing distances from Shop 4 - which you can only see if you filter out Shop 3 or highlight something in the legend:
I'm also thinking about the best way to display things like the counts by buffer for each shop. That probably needs to be a different sheet in a dashboard.
Let me know if you want to know how I've structured my data source - not that I'm convinced that I've got the best structure yet.
As you probably gathered I hadn't seen your last update when I posted my last one.
Your points all make sense, though I actually can't look at your viz because your workbook is trying to connect back to your SQL Server instance, which it of course can't do. Can you create an extract and recreate the packaged workbook? That will only include the data for the currently selected value of the radius parameter, but it will let me see how you have the vizzes structured and make it easier to comment on your points.
One question. Did you create a spatial index on the household geography column? If not, that should speed things up a lot.
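If not, something like this should help (substitute your real table, column and index names for these assumed ones):

```sql
-- Spatial index on the household geography column; names are illustrative.
-- Note: the table needs a clustered primary key before a spatial index can be created.
create spatial index SIdx_Household_Location
    on dbo.Household (Location)
    using geography_grid;
```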
As you'll see above, I took a slightly different approach and showed multiple circles - but I think that is likely to be too confusing, so the approach with a parameterised radius will probably work better.
I updated my query with the one you suggested. You're right, I did use the query I originally found through my Google searches. I also did a little reading on the Microsoft SQL Server product site to see if this was the best fit; I understand the differences between the two now - thanks for bringing that to my attention.
To answer your question on scale: there are currently 220 or so shops dotted around Australia, which will form the core of the analysis, with this analysis informing an expand/no change/contract decision. The DB we have created is based on pre-existing address data our very smart boffins have been building, which totals around 12m or so households in Australia. I would expect 60% of these would not be remotely close to a shop, with the majority of the remainder most probably not within range either. I have set up some time with my data people tomorrow to understand rules, restrictions and gotchas.
I really like the view you included which highlights, in different colours, the locations at differing radius bands. In terms of overlap, it is inevitable that this will occur, so I will put a disclaimer on it at a macro level; as the analysis drills down to specific shops it will be easier to see where this occurs, and tactical decisions can be made. I guess I need to be able to refine or add detail from a single-radius disc only at a national level (think national map with discs all over the place and no household points) down to a single site with multiple embedded discs and household points, like you have presented.
Apologies for not attaching the correct workbook - now attached. I hope I have it right now, and again, thanks for bringing that to my attention.
Morning Richard, hope you're well.
After having a chat with our DB and server manager people, there is, as you would expect, a requirement to minimise the load, as there is a significant volume of info and an equally significant demand on the resources.
So the current status is that there is a preference for calcs to be done in Tableau. However, if I can create a dashboard/report shell with no data to start, run an update once to fill the report, and then create an extract, that appears to be viable. So I have been encouraged to continue researching down the server-call path, with a view to calling the server only once, overnight.
To that end, I think your Sheet 3 example above works well, where we cumulatively count households at each new radius and then create all sorts of new dashboards and graphs based on the included locations by site. Overlap is a minimal issue for people, so this falls down the list of important things.
Lastly, there was a lot of oohing and aahing at your help, as well as the skill you demonstrate; hats were tipped in your direction.