7 Replies Latest reply on Sep 1, 2017 2:49 AM by cyu cc

How to calculate EXPECTED vs ACTUAL time-series data

Ok, folks, I'm looking to take my workbook to the next level. I am analyzing visitor data for Utah's Big 5 national parks. I learned the hard way I had to import each of the parks data individually in order to pivot the data without it being aggregated together. Once I imported each data set separately, I then blended the data within Tableau to aggregate them together. I created three charts published here:

See published Tableau workbook here

https://public.tableau.com/profile/kevin.w.jackson#!/vizhome/UtahBig5AttendanceTrends/Big5CampaignResults

Tab 1: Simple attendance charts to illustrate relative attendance sizes of the 5 parks

Tab 2: Aggregated attendance trend over the past 30 years with a median attendance and trend lines

Tab 3: Aggregated attendance trend over the past 10 years with a line denoting BEFORE and AFTER the beginning of Utah's "Big 5" promotion (television commercials and other targeted ads) causing attendance to significantly increase. The trend line really illustrates this increase nicely!

NEXT STEPS

I want to learn what the effect of the "Utah's Big 5" promotion had on visitor attendance (beginning March 2013). To do this, I want to compare what the EXPECTED organic attendance growth would have been since March 2013 (based on the previous 7 years attendance) compared to the ACTUAL attendance growth since March 2013. The difference between these two variables, of course, could be directly attributed to the "Big 5" marketing campaign.

I have already charted ACTUAL attendance growth in these parks, so the real question is how to graph EXPECTED organic growth from March 2013 to present (based on the attendance data from January 2007-March 2013)? I know what I want to do, but I have no idea how to do this? I believe this would require a calculated field, but I'm not advanced enough to know how to do more than simple calculated expressions.

If I'm not clear or explaining myself correctly, please reply with any questions. I have also attached my workbook as well. Thanks in advance from your humble Tableau analyst in training!

• 1. Re: How to calculate EXPECTED vs ACTUAL time-series data

Hi Kevin

I am very interested in this topic myself! I too am a new Tableau user, so hopefully I can figure this out with you!

You could use the Forecast feature to estimate the Expected number of visitors using data upto the campaign launch date. Your data shows a lot of seasonality during the year, therefore using the right model is important. Once you have the right model, you can also add the 95% prediction intervals to the forecast. Then you should compare the actual value - if it is outside the upper prediction interval, then you may conclude that the campaign had an effect.

I was able to get the Forecast to work by filtering your data upto March 2013, then projecting out in future. I can't figure out how to overlay the actual numbers after March 2013 on the Forecast to see if the are outside the prediction interval.

If there was a way to store the Forecast values as a Measure, then we could also calculate the difference from actual.

Anyway, hopefully someone else will have an answer - in the meantime, see below a screenshot - the forecast values are shown in green.

• 2. Re: How to calculate EXPECTED vs ACTUAL time-series data

Naveen, you understand EXACTLY what I wanted to do. I can filter the data to March 2013 easily enough, but "Forecast" is greyed out in my worksheet. Did you have to do anything else in order to get the Forecast to work?

Lastly, perhaps there's a way to export the forecast data and reimport it as another data source.

Thanks for the feedback. This thread is on the right track. Best, Kevin

• 3. Re: How to calculate EXPECTED vs ACTUAL time-series data

OK, I had to turn on "Aggregate Measures" in order to get Forecast to work. Now to the next step.

• 4. Re: How to calculate EXPECTED vs ACTUAL time-series data

OK, I think I have accomplished my goal, but I still don't know if it is the most elegant solution.

Per Naveen's reply, I filtered my data to the pre-"Big 5 Campaign" date of March 2013 and forecast three years into the future to equal my observed data to early 2016. Unfortunately, I was unable to find a way to turn this Forecast into a measure. Therefore I copied the resulting Forecast data to a cross-tab and exported the data to a spreadsheet. In Excel, I formatted the Forecast data as my original data and added it as an additional data source to my Tableau workbook. Now with this additional data I was able to chart the Forecast data along with the Actual data to create the following chart. The difference between the Forecast data in BLUE and the Actual data in RED illustrates the positive impact of the Big 5 advertising campaign on park visitor attendance. That's exactly what I was trying to illustrate.

Next I wanted to learn what the calculated difference between the Actual vs Forecast visitor attendance was, therefore I created a calculated field that simply subtracted the Forecast data from the Actual data. I then created a bar chart as follows.

Interestingly, there was a real drop in the Forecast vs Actual data in the early months and likely before the campaign results really kicked in. Never the less, the difference after November 2013 is almost entirely positive. I would like to study the originating data to learn if there is a reason for the initial drop in forecast/expected visitor attendance between March-November 2013, but it may simply have been an anomaly as park attendance varies over time anyway.

Again, I don't know if this was the most elegant way to do this, so I'm open to your comments and suggestions.

Thanks! Your humble Tableau analyst in training!

I have published the results here.

• 5. Re: How to calculate EXPECTED vs ACTUAL time-series data

OK, folks! I figured it out. I know WHY Utah's Big 5 parks attendance took such a huge and abnormal dive in late 2013. The data simply does not lie! I did a search on September/October Utah news and learned that when the federal government shut down that fall, all of Utah's national parks closed including the "Big 5" parks. Per the article at -  Utah’s top stories of 2013 | Deseret News

The federal government partially shut down in October during a 16-day congressional stalemate, the result of the inability to pass a budget... The shutdown included closure of national parks and monuments in Utah... Economic repercussions included dampened consumer attitudes. Tourism in the communities near the national parks in Utah suffered an estimated \$30 million loss. Utah allocated \$1.7 million near the end of the shutdown to open parks — money that officials hope will be reimbursed by the federal government.

ured it out.

• 6. Re: How to calculate EXPECTED vs ACTUAL time-series data

Awesome!

Yes, I too was thinking about exporting the Forecast values in Excel, then creating a new Measure by bringing it back in your workbook. Not an elegant solution at all.

I think you should compare the actual data, not against the forecast mean value, but against the 95% prediction. If actual data (although higher than the Forecast mean value) is still below the 95% upper prediction, it could be purely by chance. Depends on how "sure" you want to be in your assessment - you could also compare against a 90% prediction interval.

I know there are other more powerful ways to develop a forecast - SAS for example. That way, you can bring in the Forecast values into the Tableau workbook right from the start. I am still hopeful some of the seasoned folks here can offer their advice.

• 7. Re: How to calculate EXPECTED vs ACTUAL time-series data

Hi Kevin, I'm happy to see your post, althought it has been more than one year since your origional post.

I have a question. As you mentioned, you saved the forcasting data as new data source, and then display it in the same chart with "Actual data". In this solution, all the forcasting data is based on the actual data before 2013/3/1, but I think only forcasting data of 2013/3 should be so. The forcasting data of 2013/4 should be based on the actual data before 2013/3/1 and ACTUAL data of 2013/3, and the forcasting data of 2013/5 should be based on the actual data before 2013/3/1 and ACTUAL data of 2013/3 and ACTUAL data of 2013/4, ...

I think you can understand what I said. Do you have any ideas for this?