1 of 1 people found this helpful
It might be as simple as creating a new field: Count Calc =
IF [In Date] == #2013-05-01# OR [Out Date] == #2013-05-01# OR [Ship Date] == #2013-05-01 THEN 1 ELSE 0 END
This will be evaluated for each row and you can then add SUM(Count Calc) to your view.
#2013-05-01# is a Tableau date format. You'd probably replace this with a date type Parameter (when you enter the formula, select add Parameter), which is a user-selected constant you can use in calculated fields. To select a range, you could use two date parameters and use >= and <=.
I've attached an example using the generic Tableau Superstore Data.
- Generate a complete list of possible dates (I use some SQL code to generate 20 years of dates) - Base Table
- Create as many duplicate Data Sources as you want to count the dates of (I think you need 3 - my example shows 2)
- Set the Data Relationship between the Base Table and the first Data table Ship Dates (I set Year/Quarter/Month/Day)
- Set the Data Relationship between the Base Table and the second Data table Order Dates (I set Year/Quarter/Month/Day)
- Create chart using a count of the number of records (matches based on date relationship), set dual axis
TO_CHAR(cal_date, 'yyyy') AS YEAR,
TO_CHAR(cal_date, 'mm') AS MONTH,
TO_CHAR(cal_date, 'iw') AS week_of_year_ISO,
TO_CHAR(cal_date, 'ww') AS week_of_year,
TO_CHAR(cal_date, 'dd') AS day_of_month,
TO_CHAR(cal_date, 'd') AS day_of_week,
TO_CHAR(cal_date, 'ddd') AS day_of_year,
TO_CHAR(cal_date, 'hh24') AS hour_of_day
SELECT TO_DATE('2008-01-01', 'yyyy-mm-dd hh24') - 1/1 + ROWNUM/1 AS cal_date
WHERE TO_DATE('2000-01-01', 'yyyy-mm-dd hh24') - 1/1 + ROWNUM/1 < TO_DATE('2020-01-01', 'yyyy-mm-dd hh24')
-- just my way of generating dates ... you could build it in excel just as easily.
Count Order and Ship Dates.twbx.zip 814.5 KB
Another approach is to combine the separate date columns into a single date column with another column to show the date type. This can be done with a union statement and custom SQL.
Using Jason's example, the data looks like this
Row ID | Order ID | Order Date | Ship Date | additional cols | ... 1 | 3 | 2012-01-01 | 2012-01-20 2 | 5 | 2012-01-02 | 2012-01-15
You want this to look lie
Row ID | Order ID | Date | Date Type 1 | 3 | 2012-01-1 | Order 1 | 3 | 2012-01-20 | Ship 2 | 5 | 2012-01-02 | Order 2 | 5 | 2012-01-15 | Ship
Here is the custom SQL:
SELECT [Orders$].[Order ID] AS [Order ID], [Orders$].[Row ID] AS [Row ID], [Orders$].[Order Date] AS [Date], "Order" AS [Date Type] FROM [Orders$] UNION SELECT [Orders$].[Order ID] AS [Order ID], [Orders$].[Row ID] AS [Row ID], [Orders$].[Ship Date] AS [Date], "Ship" AS [Date Type] FROM [Orders$]
Your life will be a bit easier if you add all of the columns you want to filter on.
Then it's just a matter of dragging Date to the Columns shelf, Number of Records to the Rows shelf and Date Type to the color shelf.
I like Jim's method better than mine if all you want is to compare counts (you don't want to compare any other columns in the data across the sales/orders/etc dates).
Still good to know how to be able to do that. Thank you!
Thank you so much.
Just saw another trick today at Tableau conference that I think might work even better.
It's a combination of the two approaches above. Basically you create multiple connections to the data source, and change the relationship so that everything is blended on "Order Date", but Order Date is
Order Date in the first connection
Ship Date in the second,
Here's what I did to update the attached example:
1. Duplicate the data sources by right clicking the data source and selecting duplicate. If you're using a database connection, this creates another independent connection to the database.
If you're using extracts, this will duplicate the extract. This works, but a better option is to select Connect to data and select the same extract. In either case rename this extract with a name specific to the modified field (and to prevent confusion, since you'll be changing the names of fields in this data source). I called it Ship Date.
2. Select the primary data source Sample - Superstore Sales (Excel) and drag Order Date to the columns shelf. If you right-click when you drag it, you can also select the date aggregation type. I used Exact Date.
3. Drag Number of records to the Rows shelf.
4. Select the Ship Date data source. You'll see n orange chain link that means Order Date is linked to the primary data source. Right-click Order Date and Rename it to "Original Order Date". Right-click Ship Date and rename it to Order Date. Right-click gain on Order Date > Default Properties > Comment and add a note to yourself, "This is Ship Date."
5. Drag Number of Records from the secondary data source to the Rows Shelf. Since this is a green pill, it will create another axis. Click on the axis and drag it to the left-axis above to put both of these lines on the same graph:
Yo Jim keep those #TCC13 tips coming! Us forum folks far from the action want to know everything. Thanks,
I'm struggling to find the right words to describe the wrongness of this kind of self-blend. (I think I'm more bothered that someone from Tableau would be teaching it than anything else). This needs a big ole' caveat that it will *only* return accurate results if there the primary dimension is domain complete for the secondary dimension. In this case, that means there must be a value of Order Date for every value of Ship Date. In the case of Superstore Sales Extract, there are 41 Ship Dates that have no corresponding Order Date, Superstore Subset has 46.
Turning on Show Missing Values to pad out the data prior to blending is of limited use here because the padding will only go to the min/max of the primary (Order Date), thereby missing data that is outside of that boundary. The v8 Superstore Sales data has a max Order Date of 12/31/13, the max Ship Date is later into 2014 so those ship dates would not be included in padding.
Rather than depending on the primary data to have a set of values, there are accurate ways to build a scaffold source for a blend, like union'ing two select statements to get the min of order date/ship date and the max of each date, then use Show Missing Values for that. Or one could use the original solution you'd come up with, to transform the data.
Thanks for reading and catching the domain padding error above. There was a little voice in the back of my head saying "domain padding", but the I was a bit too tired to pay attention to it.
In fairness to Tableau, they did mention the need to turn on Show missing values. In fact they made it quite obvious with an example (shown below), which was a "Jedi Super Store" backlog analysis of orders entered (Order Date) minus orders shipped (Ship Date).
The backlog formula was
// Do RUNNING_SUM twice to avoid issue of nulls RUNNING_SUM(SUM([Number of Records])) - RUNNING_SUM(SUM([Ship Date (Jedi Superstore Sales (Excel).xls) (copy)].[Number of Records]))
When aggregating this on the Week level, there is at least one record in each data source for each week, and Show Missing Values wasn't necessary. Note how the backlog ends at 0 on the last day. (Nice to have a squeaky-clean Superstore data set. The first and last days for both Order Date and Ship date are the same, too. )
Then we switched to daily aggregation and because of the nulls, the backlog grows continuously---and it's obvious (at least once you've see it) that Show Missing Values needs to be enabled.
And with Show Missing Values enabled:
I've attached the workbook, if anyone is interested in the specifics (and the other Jedi training tricks).
You're right a bigger caveat maybe is necessary, and because of the required renaming (which I'm sure will confuse me latter), I'm not sure I'd add self blends to my regular toolkit.
Quick question on the scaffold approach with regards to filtering. It seems like there are two options:
- Have a min/max scaffold with dummy dates (for example, #1900-01-01#), and replace these with user-specified filter parameters. The issue is having good default values (ideally the current range).
- Use SQL to find the min and max values over all dates. The issue here is that when you use a date filter, you'll filter out the min / max values in the scaffold.
Am I understanding this correctly? Is there a workaround (other than building a scaffold with all of the possible dates) --- or is a union approach easier.
jimw_workbook.twbx.zip 1.6 MB
I'm going through some old emails this morning and found this unfinished thread, sorry for the delay!
With regard to your statment: "The first and last days for both Order Date and Ship date are the same, too." That's critical. The underlying data has to have enough detail so that the range of Order Date is larger than the range of Ship Date at whatever level of granularity is used in the particular view, which is probably not the case if we drilled down to a particular Department/Category/Ship Mode/etc.
If the users could filter on an arbitrary range of dates, the two options I can think of would be to use a scaffold that had fully padded the domain, or to use a table calc filter for the date.
Thanks for the followup Jonathan. I think I'm finally starting to get my head around domain padding.
Excellent! Can you explain it to me? Seriously, though, almost every time I think I've gotten my head around it then I learn something new that forces me to rethink what I thought I knew. Most recently was in the last week or two, I realized that there were a number of cases where I was assuming domain padding was happening when it was really domain completion that only looked like domain padding because the domain being completed was a bunch of dates that were non-sparse enough to look like they were domain padded.