I am also having the same issue as Emily with the Unique Visitors count.
For instance, my April 2013 numbers for Unique Visitors are as follows: Tableau = 12,098,633 vs. Google Analytics = 5,662,554, which is not a small difference.
Other measures such as Visits and Pageviews are identical.
I am also having this problem. Tableau is giving me the same number for Visits as for Unique Visitors for the month of May (2,746), while GA says my Unique Visitors are 1,195. Any insight into this would be greatly appreciated. I would assume that Tableau should just grab the numbers directly from GA.
Emily: I've been working w/ G.A. data too and finding some similar funkiness. In short, I think it's mostly a limitation of Google Analytics b/c it provides aggregated data in different buckets that don't sum up to the whole.
First, Tableau will tell you if it pulled sampled data. You'll see a little grey box that gets in the way of your data in the lower right corner. If you're not seeing that warning you're not seeing sampled data.
Next, GA samples data when you reach a certain number of rows of data.
I've figured out that visits (aka sessions), pageviews, and some other measures don't seem to get sampled even if I go back 3 years' worth of daily data. However, if I add unique visitors or some other dimensions, the same data is returned sampled.
What I think is happening is that Unique Visitors are an aggregated measure in G.A. and that the aggregation returned via the Tableau connector is a diff't aggregation than what GA shows. The UV count in G.A. depends on whether you're looking at daily, weekly, monthly or yearly UVs. I believe the Tableau connector only gives us daily UVs, which means that, compared to how G.A. reports data, Tableau will double and triple count people when we show weekly, monthly, quarterly and yearly aggregations.
And GA apparently stores the data in a cube and then the filters and other things we do in the interface are used to eliminate double counting in the data returned in the GA interface.
Say whaaaa? Ok, Unq Visitors are an aggregated measure b/c if the same browser visits a site on Monday from 9a until 9:10a and returns on the same day from 3p until 3:05p, that counts as 2 visits and 1 unique visitor. Repeat that on Wed of the same week, and then on each of the next 3 wks of the month.
G.A. counts this as follows:
16 visits (2 visits per day. on 2 days. on each of 4 weeks. 2x2x4=16)
However introduce the concept of a unique visitor to G.A. and it counts U.V.'s as follows:
1 daily uv if you look at data on any one of the 4 mondays
1 daily uv if you look at data on any one of the 4 weds.
1 wkly uv shows for each of the 4 wks
1 monthly uv for the one month.
Does that help?
Oh and the issue gets further muddled if you pull in a dimension like new and returning visitors, or a dimension like source.
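To make the Monday/Wednesday example concrete, here is a minimal Python sketch. The visitor ID and the April 2013 dates are made up for illustration, and this is only a toy model of the counting, not how G.A. computes anything internally.

```python
from datetime import date, timedelta

# Hypothetical visit log: one browser, 2 visits on Monday and on Wednesday
# of each of 4 weeks. April 1, 2013 was a Monday.
first_monday = date(2013, 4, 1)
visits = []
for week in range(4):
    for day_offset in (0, 2):  # 0 = Monday, 2 = Wednesday
        day = first_monday + timedelta(weeks=week, days=day_offset)
        visits += [("browser-A", day), ("browser-A", day)]  # 2 visits that day

total_visits = len(visits)

# Distinct visitors per day and per ISO week.
daily_uv, weekly_uv = {}, {}
for visitor, day in visits:
    daily_uv.setdefault(day, set()).add(visitor)
    weekly_uv.setdefault(tuple(day.isocalendar())[:2], set()).add(visitor)

# Rolling up by summing the smaller buckets counts the same browser repeatedly.
summed_daily_uv = sum(len(v) for v in daily_uv.values())
summed_weekly_uv = sum(len(v) for v in weekly_uv.values())

# De-duplicated count across the whole month.
monthly_uv = len({visitor for visitor, _ in visits})

print(total_visits, summed_daily_uv, summed_weekly_uv, monthly_uv)  # 16 8 4 1
```

Summing the daily buckets gives 8 and summing the weekly buckets gives 4, even though only 1 browser ever visited.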
I would agree with Bruce's post on unique visitor counts. I have been involved with web analytics, and managed a team of web analytics consultants, with various vendors for over 15 years now, so hopefully that lends some credibility to my post. The issue with GA is the same with all the analytics vendors... bucketing.
Bucketing counts a visitor in different time segments, such as days, weeks, months, quarters, years, all-known-time. Tableau is pulling data at the lowest bucket level (days) and then summing the visitors up as you request aggregation (e.g., weeks or months). So while I was a unique visitor to your site on Monday, and I was also a unique visitor to your site on Tuesday, if you add those up across the week you will have double-counted my activity and display a unique visitor count of two, when in fact it was just me.
The solution is de-duplication (a.k.a., deduping), or aggregating lower-level data at higher levels and removing duplicated unique visitors as you aggregate upwards. This is a very complex process and the source of a lot of pain for analytics vendors and analysts.
In Tableau's defense, they can't access the necessary data to de-duplicate visitors from Google Analytics. The data necessary to de-duplicate visitors has privacy concerns attached to it and can't be retrieved through the Google Analytics reporting API that Tableau uses. On the other hand, Tableau could query data at higher time buckets instead of always pulling the most detailed level, but that has issues as well (e.g., how would they query GA if you requested nine days of data, which would be one week bucket and two day buckets).
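As a rough illustration of why mixing bucket sizes gets awkward, here is a small Python sketch that splits a date range into full weeks plus leftover days. The function name is hypothetical and the Monday-to-Sunday week is an assumption made for the sketch; it says nothing about how Google Analytics or Tableau actually define or query their buckets.

```python
from datetime import date, timedelta

def split_into_buckets(start, end):
    """Split the inclusive range [start, end] into full Monday-Sunday weeks
    and leftover single days. Returns (weeks, days)."""
    weeks, days = [], []
    current = start
    while current <= end:
        week_end = current + timedelta(days=6)
        if current.weekday() == 0 and week_end <= end:
            # A whole calendar week fits inside the range.
            weeks.append((current, week_end))
            current = week_end + timedelta(days=1)
        else:
            # Not aligned to a full week: fall back to a day bucket.
            days.append(current)
            current += timedelta(days=1)
    return weeks, days

# Nine days starting on a Monday: 1 week bucket + 2 day buckets.
weeks, days = split_into_buckets(date(2013, 4, 1), date(2013, 4, 9))
print(len(weeks), len(days))  # 1 2

# Nine days starting on a Wednesday: no full week fits, so 9 day buckets.
weeks, days = split_into_buckets(date(2013, 4, 3), date(2013, 4, 11))
print(len(weeks), len(days))  # 0 9
```

The same nine-day request yields different bucket mixes depending on where it starts, and unique visitors counted in separate buckets still can't be de-duplicated against each other.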
When dealing with web analytics data, you'll get the most accurate non-duplicated data from page views or events. Duplication issues will start to occur at the level of visits and are worst up at the level of visitors. The exact impact of duplication is different depending on a variety of factors and differs from website to website. Generally speaking, the more visitors you have, and the higher the "average visits per visitor" the more impact duplication will have on your unique visitor counts.
You can find a more detailed discussion, called "the hotel problem", of the visitor duplication issue here: http://en.wikipedia.org/wiki/Web_analytics#The_hotel_problem
Thanks for the detail and link to the wikipage. They helped clarify some things for me.
One question for you is that you say the bucket / over counting problem exists with visits as well as unique visitors. I'm not seeing that in my data. And conceptually I'm thinking that visits which are just sessions are not bucketed.
Am I missing something in my data or my way of conceptualizing visits (aka sessions)?
Visits are far less prone to over-counting. This really only happens for sites that have a high degree of international visitors, extremely long visits, or sites that have a large amount of overnight traffic (e.g., adult-oriented sites, video sites, etc.). The main issue is visits that span a "day" boundary. This varies by web analytics vendor.
The basic issue is visits that span a day and are automatically broken up into two visits. For example, if I log on to your site at 11:50 PM local server time and then finish reading content at 12:15 AM, some vendors will split the visit at midnight and report two visits. This is far less of a problem for most sites. This can cause a slight over-inflation of visit numbers, but the impact is usually fairly minimal, and some analysts will even ignore the problem.
As I said, there are a few situations where the impact may be higher: highly international sites, sites with extremely long visits (i.e., several hours long), or late night traffic sites. There's no exact rule on how to determine the impact as it varies. For example, some vendors just do a hard split at midnight, while others will only split the visit if it has been inactive for 30 minutes when midnight passes, while still others will split the visit after a configurable period of time regardless of whether there is continuing activity.
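Here is a minimal Python sketch of the simplest of those policies, the hard split at midnight. The function name and timestamps are made up for illustration; the inactivity-based and configurable-timeout policies are not modeled, and no particular vendor's behavior is implied.

```python
from datetime import datetime, timedelta

def hard_split_at_midnight(start, end):
    """Split one visit [start, end] into segments at every midnight it crosses."""
    segments = []
    current = start
    while current.date() < end.date():
        next_midnight = datetime.combine(
            current.date() + timedelta(days=1), datetime.min.time()
        )
        segments.append((current, next_midnight))
        current = next_midnight
    # Add the final segment (skipped only if the visit ended exactly at midnight).
    if current < end or not segments:
        segments.append((current, end))
    return segments

# A 25-minute visit that crosses midnight is reported as two visits.
late = hard_split_at_midnight(datetime(2013, 5, 1, 23, 50), datetime(2013, 5, 2, 0, 15))
print(len(late))  # 2

# The same length of visit in the afternoon stays one visit.
afternoon = hard_split_at_midnight(datetime(2013, 5, 1, 15, 0), datetime(2013, 5, 1, 15, 25))
print(len(afternoon))  # 1
```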
I don't know if your issue was resolved, but I am now running into the same issue. It's not over-counting, it's under-reporting by Tableau (like yours). The other posts were referring to aggregating Uniques which are at daily granularity up to higher levels (which we are not doing). I am pulling in uniques at daily granularity from GA and representing them at daily granularity in Tableau.
Any other pointers or solutions?
Sorry, for not replying to your post Meg, but I just want to add something to the aggregation of UVs in Tableau.
Not sure if you guys solved the aggregation-problem of UVs, but if I work with aggregated UVs, I do the following:
- if the aggregation-level is a week, then I use only "Week in Date Range" as dimension
- if the aggregation-level is a month, then I use only "Month in Date Range" as dimension
- if I want to get the total amount of UVs for a certain date range, I select the range and use only "year" as dimension. Unfortunately I haven't figured out a more elegant way to deal with aggregating the total amount of UVs.
- if I work with UVs on a day-level, I just use "Date" as a dimension
I never use "Date" as dimension if I want to sum up UVs due to the reasons described above by Eric
Sounds interesting Daniel but how do you pull these specific time range dimensions?
As for me, I copy the unique visitors tally through a text object on my Tableau dashboard, everything else is dynamic.
I have the same issue with Tableau not matching numbers directly pulled from GA.
If we upgrade to GA Premium, will the sampling problem go away when I use Tableau? Or is it just when I create a report within GA that the data will no longer be sampled?
It's not that Tableau doesn't match G.A. It's that G.A. doesn't always let us automatically extract the exact same data as it displays and it doesn't tell us what it's passing through.
You will only be able to get aggregated measures like unique visits, unique pageviews, or dimension-measures like new visitors to reconcile to the day and not to higher aggregations like week, month or year. They will be higher than what G.A. shows in its U.I. And using aggregated measures can cause G.A. to return sampled data.
From what I can tell G.A. Paid does Display un-sampled data but that may not solve the problem you describe b/c I can't tell if G.A. Paid lets you Extract un-sampled data automatically.
You might be able to get your data to reconcile if you do the following:
1. Make sure both the data in the GA report and in your data pull is NOT sampled.
2. Make sure you compare the data you pull into Tableau to the GA profile and turn OFF any "advanced segments." G.A. will not let us apply "advanced segments" to pull data through the API (that includes the Tableau auto connection).
3. Don't use any aggregated measures (e.g. unique visits, unique pageviews, etc). Not only will they not reconcile beyond the daily aggregation but including them can cause G.A. to return sampled data.
4. Start by trying to reconcile numbers by daily aggregation, then work up to weekly or monthly aggregations.
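For step 4, a small Python sketch of a day-by-day comparison might look like this. It assumes you have already exported daily Visits from both the GA report and your Tableau pull into plain dictionaries keyed by date; the function name and all of the numbers are made up for illustration.

```python
def daily_mismatches(ga_daily, tableau_daily):
    """Return {day: (ga_value, tableau_value)} for days where the two differ.
    A day missing from one source is reported with None for that source."""
    mismatches = {}
    for day in sorted(set(ga_daily) | set(tableau_daily)):
        ga_value = ga_daily.get(day)
        tableau_value = tableau_daily.get(day)
        if ga_value != tableau_value:
            mismatches[day] = (ga_value, tableau_value)
    return mismatches

# Hypothetical daily Visits from each source.
ga = {"2013-05-01": 120, "2013-05-02": 98, "2013-05-03": 143}
tableau = {"2013-05-01": 120, "2013-05-02": 101}

print(daily_mismatches(ga, tableau))
# {'2013-05-02': (98, 101), '2013-05-03': (143, None)}
```

Once the daily list comes back empty for a non-aggregated measure like Visits, it makes sense to move on to weekly or monthly comparisons.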
Is there a way I can pull in historical data and save it as an extract, and then each day pull in just one day of data and add it to that extract, so I can get un-sampled data?
When I go to my GA data connection, I don't see an option to do incremental update.
I would love to know this also.
Ming & Tarik:
The short answer is that I'm not currently aware of any way to do what I think you're asking inside Tableau. Perhaps some of the other users here can correct me if I'm wrong. There is a way to do it by pulling the data and storing it outside of Tableau.
I think you're asking: can I set my data connection in Tableau to GA to a specific date range (say Jan 1 to Jan 31, 2013) and pull the extract? Then can I go back and refresh the Tableau extract and ADD data from another date range (say Feb 1 to Feb 28, 2013)? And then can I repeat the process for every month thereafter?
When you change the date range in the data connection, Tableau will overwrite the existing data in the extract. So you can't add data to the extract incrementally this way. Note that if you do an incremental refresh on the extract on any day after Jan 31, 2013, for the first date range (Jan 1 to 31, 2013), there will be no incremental data to add.
Alternatively, you could use Tableau to "round trip" the data so it's collected via Tableau, stored outside of Tableau and then analyzed in Tableau via the external data store. Build a table of the data you want inside Tableau. Then set the connection to a date range, say Jan 1 to 31, 2013. Then bring the data into Tableau. Then export it to a .xls and add it to your external data store. Then change the date range in the data connection to, say, Feb 1 to 28, 2013. Rinse and repeat.
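The "add it to your external data store" step could be scripted along these lines. This is only a sketch under a few assumptions: each monthly export has been saved as CSV rather than .xls, the store is a single CSV file, and there is a column named "Date" (the function name, file layout, and column name are all hypothetical).

```python
import csv
import os

def append_export(store_path, export_path, key_field="Date"):
    """Append rows from one exported CSV to the store CSV, skipping rows whose
    key_field value is already in the store. Returns the number of rows added."""
    seen = set()
    store_fields = None
    if os.path.exists(store_path):
        with open(store_path, newline="") as f:
            reader = csv.DictReader(f)
            store_fields = reader.fieldnames
            seen = {row[key_field] for row in reader}

    with open(export_path, newline="") as f:
        reader = csv.DictReader(f)
        export_fields = reader.fieldnames
        new_rows = [row for row in reader if row[key_field] not in seen]

    # Write the header only when the store is new (or empty).
    write_header = store_fields is None
    with open(store_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=store_fields or export_fields)
        if write_header:
            writer.writeheader()
        writer.writerows(new_rows)
    return len(new_rows)
```

Calling `append_export("store.csv", "jan_2013.csv")` and then `append_export("store.csv", "feb_2013.csv")` would build up the history month by month, and re-running the same export adds nothing because its dates are already in the store. Tableau can then connect to the store file instead of to GA directly.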
I hope this is helpful.