Resource in Demand Source may be included in Capacity Source, but RR1 is not Resource 1.
As this graph is not dependent on resource, we have not consider the resource column for blending..
This graph should be overall demand requested and capacity used over the months.
Ok, that makes the workbook slightly simpler, thanks! I'll get an updated workbook and explanation completed as soon as I can.
Thank you Jonathan ....
Here's a review of the conditions and requirements as I understand them:
- The original data data sources are coming from Salesforce and can't be modified except through calculated fields since they are direct connections from Tableau.
- The Demand data has a record for each customer (?) with a start date and end date.
- The Total Requested Hours measure in the Demand data needs to be evenly spread out across all months for each customer.
- The Capacity data has a record for each customer for each month that the customer has results.
- The customers in the Demand & Capacity sets aren't necessarily the same, and the time periods covered aren't necessarily the same. So the data is sparse.
- The goal is to end up with a view showing each month with a bar for the sum of all the Capacity for that month and a line showing the sum of all the (spread out) Demand for each month. So there are at most 2 marks for each month.
- The Capacity for each month is quite straightforward, just sum up the Capacity for the month across all customers.
- The Demand data requires some sort of padding out of the data in order to parcel out the data.
- Since we can't modify the original data *and* the data is sparse, I strongly suggest using a scaffold source to be sure there are enough months in the data. (If the data was not sparse, there might be a way to use Tableau's new pivot functionality and Show Missing Values).
- Given that a scaffold source is necessary and we can't directly connect that "table" to the Salesforce data in SQL or some ETL process that means we'll be using Tableau data blending. While data blending behaves like a left-join in some respects, it's not in the sense that secondary sources cannot increase the granularity of the primary source. Therefore the scaffold source requires all the dimensionality necessary to perform the computations, and that leads to requiring every customer in the scaffold so the Total Requested Hours for each customer can be parceled out as desired.
- So the scaffold source needs a record for every customer *and* every month in either the Demand data or Capacity data.
I built the scaffold source by using the list of customers from the Demand data in one worksheet, a list of months in another worksheet, and then used the Legacy Connector to write the following Custom SQL query to generate a cross product of all requests & months:
SELECT [Customers$].[Requests] AS [Requests], [Months$].[End Date] AS [End Date], [Months$].[Start Date] AS [Start Date] FROM [Customers$], [Months$]
This would be the biggest possible scaffold, if you know start and end dates for the Demand & Capacity sources in advance then it would be possible to add clauses to reduce the size of this source.
To calculate the Demand, I did the following:
1) In the Demand source, calculated the # of Months with the following dimension:
DATEDIFF('month',[Start Date],[End Date])+1
2) Then created a Raw Demand measure with the following calculation:
[Total Requested Hours]/[# of Months]
This is available as a record level result for each requesting customer:
Now to integrate the blend to pad out the Raw Demand results across each month. In the primary source, I built the following Demand calculation:
IF DATETRUNC('month',ATTR([End Date])) >= ATTR([Demand Data (Demand)].[Start Date]) AND ATTR([End Date]) <= ATTR([Demand Data (Demand)].[End Date]) THEN ATTR([Demand Data (Demand)].[Raw Demand]) END
Because this calc is working across a blend that means all the terms have to be aggregated, so to get the aggregations right both the Start Date and Requests need to be in the view as dimensions. However the blend is only on the Requests as a linking dimension, *not* the Start Date or End Date. Here's the result:
This view is returning a value for each request & month the request occurs in, when ultimately we want a single value for each month. To get that aggregation across all the requests for each month, I turned to table calculations. Here's the Demand TC formula:
IF FIRST()==0 THEN WINDOW_SUM([Demand]) END
This has a Compute using on the Requests so it sums up across all the requests and returns only a single non-Null result to the first address in the partition:
So that gets us the Demand. Now for the Capacity. I created a Capacity measure with the following formula:
The reason why it's wrapped in FLOAT() is to ensure that both the measures have the same data type, if they don't Tableau won't let us synchronize the axes. Since all we want is a total capacity, we can bring Capacity in as a measure, with just Start Date as a linking dimension:
This is also returning a mark for each Resource. The reason why is that data blending aggregates the secondary source at the level of detail of the linking dimensions (so summing up Capacity for each Start Date) and then there is a second aggregation at the level of detail of the view, so it's duplicating the result for each Resource.
The solution to dealing with this is another table calculation, here's Capacity TC:
IF FIRST()==0 THEN WINDOW_MIN(SUM([Capacity (Capacity)].[Capacity])) END
This also has a Compute using on the Resource so it returns a single non-Null value for each month. This uses WINDOW_MIN so it will return the exact value and not sum up capacity across all the marks.
One more calculation is necessary to get to the final view, and that's FIRST()==0 with a Compute using on the Resource. This is used to filter out the marks (and get labels to work right). Some people won't do the IF FIRST()==0 on the other measures and just use the FIRST()==0 on this measure, personally I prefer to do both because a) it guarantees that each measure is only returning a single result in each partition and b) can be faster in some cases.
Here's the final Workout view:
With this view in place, then I duplicated the worksheet and moved pills around, turned off the tooltip for Requests (since the dimension is only needed to get the calcs right, not for display purposes), set the Mark type & Color, synchronized the axes, and moved the First Filter calc to the Filters Shelf and set it to Filter for True. This has two effects on the view: First, it stops Tableau from having to do any computation with regards to the Null values being returned for the padded out data. This can result in significant performance improvements in cases where there are thousands or more of Null marks. Second, it enables Tableau's Bar Marks to have their labels on top. (Without this we'd need to do a workaround using an invisible reference line). Here's the view:
From here you could hide the right-hand axis, change the axis legend, edit the aliases of the measures, change the date format, etc. to get your desired view.
A packaged workbook is attached, along with the original scaffold source so you can see what I did. I hope this explains what is going on, if you have any questions let me know!
Thank you for this solution Jonathan.
I've been able to adapt the solution to work with data allocations, however I'm running into some trouble allowing for start and end dates that do not fall at the beginning and the end of the month.
Is there a way to take this into consideration in the example above?