1 2 Previous Next 21 Replies Latest reply on Nov 20, 2013 11:34 AM by Jim Wahl

    Data Blending - no date dimension in primary table

    Richard Stebles

      Hi all, thanks in advance for your help

       

      Setup:

       

      I have a central table which is a huge list of stores with a unique ID number.  Then I have three other tables which are the visit history of three different teams into these stores.  So I have set up a blend on the unique ID number

       

      Problem:

       

      What I would like to do (for starters) is create a view that shows number of visits per month from the three different teams into each store.  But my central store list doesn't have "Date" as a dimension, so there is no convenient blue pill to drag to the column shelf.  Is there any way to create a fake date dimension that will just pick up all instances of the store unique ID across all the dates in all three tables?

       

      I've attached a workbook illustrating the problem.

       

      Thanks in advance

       

      Regards

      Richard

        • 1. Re: Data Blending - no date dimension in primary table
          Jim Wahl

          Hi Richard,

           

          You have a couple of options here, depending on your data and your goal.

           

          The simplest is just to change the primary data source to your Merch data. Use the date field from this data source and on the pill, select show missing values so that you have a full set of discrete dates.

           

          This is what I did to create v1 of the attached.

           

          The other option is to add a min and max date to your primary data source. You could add "date rows" with a union statement in your database connection and then convert this to a date in tableau using a calculated field:

          Date =

          IF "Retailer Group" == "Date_start" THEN DATE("1/1/2012")

          ELSEIF  "Retailer Group" == "Date_end" THEN DATE("5/1/2013")

          END

           

          This gets a little tricky, as it uses Tableau's data padding / domain padding, which is not 100% clear to me. The stacked numbers and stacked bar chart views work fine, but the sum of all values for all three data sources does not work. I believe this is because aggregates and row-level calculations are done prior to the date padding. If a data source does not have a value for the date, no value is returned --- not even a null that can be converted to 0 using ZN() --- and therefore the calculation fails.

           

          What's not quite clear to me is why the calculation works when domain padding is used from the Merch data set as the primary data source. I'd have to play around with it some more or perhaps Jonathan Drummey can chime in.

           

          Finally a third approach might be to build a "scaffold" worksheet in excel with all of the dates. I didn't do this here and you'd have to think about maintainability and aggregating at different date levels.

           

          So if you reliably have a data set, like Merch, perhaps V1 is the way to go.

          Or if you just want a stacked bar chart view of visits, then either approach will work.

          Or, lastly, if neither of the above apply, perhaps scaffolding is the way to go.


          Jim 

          1 of 1 people found this helpful
          • 2. Re: Data Blending - no date dimension in primary table
            Jim Wahl

            Another approach is to combine the three store visit tables into one using a UNION. I guess I assumed this wasn't possible, because the tables were coming from a different databases.

             

            But if all of the tables are in one database, then I think this is going to be the best way to go, and give you more flexibility on formatting using a "visit type" column you'll add to the UNION'd data source.

             

            See Analyzing Related Data on Multiple Excel Tabs | Tableau Software

            • 3. Re: Data Blending - no date dimension in primary table
              Richard Stebles

              Hi Jim

               

              Thanks for your response.  I actually got diverted onto (distracted by?) a different project so I haven't had a chance to mull over what you've said, but a quick glance suggests that the answer is in there somewhere!  I'll come back to mark this as answered (or with further questions!) as soon as I'm back on the original project, but first wanted just to say thanks for taking the time out to respond

               

              Regards

              Richard

              • 4. Re: Data Blending - no date dimension in primary table
                Jonathan Drummey

                Hi Jim,

                 

                [This is all about the v1 worbook, I'll also write a response on version 2:]

                 

                Domain padding doesn't have anything to do with this as far as I can tell, the Merch data has no gaps in MONTH(Date).

                 

                The reason why the Sum Numbers worksheet wasn't showing anything had me stumped for a little bit, then I figured out what was going on - auto-updates were turned off for this worksheet only. Doh!

                 

                The Sum of Visits calc should work fine (it doesn't return any Nulls in the Sum Numbers worksheet) because of an interesting aspect of the order of operations as it relates to data blending. When we have a regular aggregate like SUM(Number of Records) being used directly from the data source (whether primary or secondary) and there isn't a value for the particular combination of dimensions, Tableau returns no value, not even Null, and so ZN(SUM(Number of Records)) won't work. However, when we're using the aggregate as a calculated field in a blend, like the Sum of Visits calc, then Tableau has to "manifest" the aggregate from the original data source inside the data source the calculated field lives in. When it does so, otherwise empty values become Null, at which point the ZN(SUM(number of records from another data source)) returns 0 and works fine as you add the values together.

                 

                Note that each separate field from the other data source's needs to be wrapped in a ZN(), so this (the original Sum of Visits calc) works:

                ZN(SUM([Demo (Demo Team.xlsx)].[Number of Records])) +

                ZN(SUM([Merch (Merch Team.xlsx)].[Number of Records])) +

                ZN(SUM([Calls (Training Team.xlsx)].[Number of Records]))

                 

                But this returns 0 for almost every row because one or more of these aggregates has no value for a given Month/Store GUID/Retailer Group, so the the SUM() of that is Null in the blend and the inner Sums return a Null to the ZN().

                ZN(SUM([Demo (Demo Team.xlsx)].[Number of Records]) +

                SUM([Merch (Merch Team.xlsx)].[Number of Records]) +

                SUM([Calls (Training Team.xlsx)].[Number of Records]))

                 

                Also, a workaround to ensure that any aggregate (no matter the source) returns a non-Null value is to do something like ZN(LOOKUP(SUM([Number of Records]),0)), or IFNULL(LOOKUP(MIN([some string]),0),"no longer a Null value") for a string value. As a table calc, the LOOKUP() returns Null if there is no value and then the ZN will work. This isn't needed in this case because the blended calculated field takes care of that.

                 

                Jonathan

                • 5. Re: Data Blending - no date dimension in primary table
                  AJ Fahmy

                  Hello Jon,

                   

                  I’ve been receiving these emails from you, they seems a fix to some ppl issues, please take me from your mailing list.

                   

                   

                   

                  Thanks,

                   

                  AJ Fahmy

                  BI Developer

                  Global Data Solutions

                  Elhamy.fahmy@mediacom.com<mailto:Elhamy.fahmy@mediacom.com>

                  T: 734-677-8017

                  2301 Platt Road, Suite 400 Ann Arbor MI 48104 USA

                   

                  www.mediacom.com<http://www.mediacom.com/>

                  • 6. Re: Data Blending - no date dimension in primary table
                    Jonathan Drummey

                    Hi AJ,

                     

                    I can see that you are following me in the forums. You must have some notification set up there to receive emails for every post I make, I'm not sure what you'd do to turn it off. The specified item was not found. or Tracy Rodgers should be able to help.

                     

                    Jonathan


                    • 7. Re: Re: Data Blending - no date dimension in primary table
                      Jonathan Drummey

                      Ok, now for v2.

                       

                      What Jim set up is a really cool application of both domain padding (what we get when we turn on Show Missing Values) and domain completion (how Tableau creates cells/addresses for all combinations of discretes on Rows, Columns, and Pages) can generate similar results.

                       

                      When placed on Columns with domain padding on, given this pill arrangement the Date field also triggers domain completion, so the dates are padded out for all Retailer Group/Store GUID's. The reason why we see the extra Null column is that the original Date field is returning Null for certain Retailer Groups, so Tableau is padding that out as well. Domain completion is totally dependent on pill arrangement, though, so if the Date pill moves to Rows then both domain padding and domain completion goes away (see domain completion off). We can bring the domain padding back into place using a table calc such as INDEX() with a compute using of the Date (see domain padding on) and in that case it pads all the dates out.

                       

                      However, for the padded data there's a different interaction with the blended data that I hadn't known before. I brought over the working Sum of Visits calc from v1 and that one fails, I'm not totally sure why, there are still subtleties around blending and domain padding that I haven't figured out. I think it's because the "all in one" nature of the Sum of Visits calc from v1 doesn't fully manifest the SUM's from the other data sources to wrap them in a ZN. However, there's a simple solution, which is to create three separate calcs, each the SUM from the other data source, then to create a fourth calc with the formula ZN([Calls # Records]) + ZN([Demo # Records]) + ZN([Merch # Records]). I set that up in the attached, see working sum worksheet, and as far as I can tell that returns the same results as v1.

                       

                      Jonathan

                      1 of 1 people found this helpful
                      • 8. Re: Data Blending - no date dimension in primary table
                        Tracy Rodgers

                        Hi AJ,

                         

                        To stop getting email notifications for whenever Jonathan makes a post, click on his name so that you are taken to his profile. In the upper right corner, it will say "Following in Stream." Click on that and un-check the Inbox option.

                         

                        Hope this helps!

                         

                        -Tracy

                        • 9. Re: Data Blending - no date dimension in primary table
                          AJ Fahmy

                          I did that already, thank you. I totally forgot that I’m following him.

                           

                           

                           

                          Thanks,

                           

                          AJ Fahmy

                          BI Developer

                          Global Data Solutions

                          Elhamy.fahmy@mediacom.com<mailto:Elhamy.fahmy@mediacom.com>

                          T: 734-677-8017

                          2301 Platt Road, Suite 400 Ann Arbor MI 48104 USA

                           

                          www.mediacom.com<http://www.mediacom.com/>

                          • 10. Re: Re: Data Blending - no date dimension in primary table
                            Jim Wahl

                            Hi Jonathan,

                             

                            Thanks for investigating this. Your replies are interesting and helpful. I'm still trying to get my head around this, but a couple of follow-ups on V1 above.

                            The reason why the Sum Numbers worksheet wasn't showing anything had me stumped for a little bit, then I figured out what was going on - auto-updates were turned off for this worksheet only. Doh!

                            Sorry about that! I thought the version I uploaded had auto-updates enabled.

                             

                            Domain padding doesn't have anything to do with this as far as I can tell, the Merch data has no gaps in MONTH(Date).

                            The Sum of Visits calc should work fine (it doesn't return any Nulls in the Sum Numbers worksheet)

                            What's still a bit confusing is that if I uncheck "Show missing values" in the Sum Numbers worksheet, nulls will appear if

                            1) Merch (primary data source) has a non-null number of records for the month / Group / GUID combination, but

                            2) another data source has a null value for that combination.

                             

                            Example 1: All three data sources cover April 2013 and GUID C - 7F093... has the following counts

                            April 2013: Merch: 1; Calls: Null; Demo: 9

                            With show missing values checked, I get 10; otherwise it's null.

                             

                            If I use ZN(LOOKUP(SUM([Demo (Demo Team.xlsx)].[Number of Records]),0)) + ..., I get the same result as if show missing values is checked.

                             

                            Example 2: Only Merch (primary data source) has July 2012 and with the same GUID C - 7F093, I get the correct sum.

                             

                            Example 1b: If I check show missing values, but put a filter on GUID, the aggregation fails, even with the ZN(LOOKUP(... based aggregation---same behavior as in Example 1, show missing values unchecked.

                             

                            I had "check missing values" tied to "domain padding" but as you pointed out, that's not what's happening here, since the Merch data source has a complete set of dates. Perhaps it's actually causing "domain manifestation" .

                             

                            Thanks again for your analysis. I'm running out of time today, but I want to study your comments on V2 some more tomorrow.

                             

                            Jim

                            • 11. Re: Re: Data Blending - no date dimension in primary table
                              Jonathan Drummey

                              Hi Jim,

                               

                              I don't have time to respond today, I'll have a chance tonight or tomorrow. Joe Mako also took a look at this and found an apparent bug in Tableau that I need to confirm and send off to support.

                              • 12. Re: Data Blending - no date dimension in primary table
                                Richard Stebles

                                Hi Jim

                                 

                                So I found myself back in this thing again.  Can you explain this bit to me please:

                                 

                                "The other option is to add a min and max date to your primary data source. You could add "date rows" with a union statement in your database connection and then convert this to a date in tableau using a calculated field:

                                Date =

                                IF "Retailer Group" == "Date_start" THEN DATE("1/1/2012")

                                ELSEIF  "Retailer Group" == "Date_end" THEN DATE("5/1/2013")

                                END"

                                 

                                I looked at the Date calculation in the workbook and it looks like this:

                                 

                                IF [Retailer Group] == "Group A" THEN DATE('1/1/2012')

                                ELSEIF [Retailer Group] == "Group C" THEN DATE('5/1/2013')

                                END

                                 

                                It definitely seems to work but I can't really work out what's going on here - why are the two formulas different?  I need to replicate this trick in a new scenario and I don't get how this is working so I am stumped again

                                 

                                Thanks in advance, also thanks to Jonathan Drummey for your attention on the matter as well

                                 

                                Regards

                                Richard

                                • 13. Re: Re: Data Blending - no date dimension in primary table
                                  Jim Wahl

                                  Hi Richard,

                                   

                                  The goal with the V2 approach is to create a data source (or edit your primary data source above) to include a start date and end date that covers the period you're interested in. Then you can add this pill to the view, right-click on it to Show missing values, and Tableau will fill in all of the missing dates---this is called domain padding or data densification. Once you show the missing values you have a scaffold where data can be added from your other data sources.

                                   

                                  Often the scaffold is just a simple Excel file with, for example, one column Record and two rows 0 and 1. Once connected to the data source, you can create a calculated field Date =

                                  CASE Record

                                  WHEN 0 THEN [Start Date]

                                  WHEN 1 THEN [End Date]

                                  END

                                   

                                  In your example, this doesn't work, because your primary data source needs to be Master Store List, because you want to list all the store IDs. In a blend, secondary data sources are aggregated and the StoreID dimension would be aggregated to "*".

                                   

                                  So you need to create a Date field in your Master Data Source, which is why I used

                                  IF [Retailer Group] == "Group A" THEN DATE('1/1/2012')

                                  ELSEIF [Retailer Group] == "Group C" THEN DATE('5/1/2013')

                                  END

                                   

                                  But, as Jonathan pointed out, there are other values than "Group A" and "Group C" and therefore NULLs are created, which complicates the padding. It would be better to use custom SQL to connect to this data source and add a UNION ALL statement at the end to add one row where [Retailer Group] = "Date_start". Then you could create a calculated Date field in Master Store list along the lines of

                                  IF "Retailer Group" == "Date_start" THEN DATE("1/1/2012")

                                  ELSE DATE("5/1/2013")

                                  END

                                   

                                  You could replace the hard-coded dates with parameters or with a nested MIN() / MAX() of the dates in the three visit data sources.

                                   

                                  Now you can add date field to the view and select show missing values and hide the Date_start Retailer Group.

                                   

                                  Alternatively, if Group A will always be in the data, you could use that for the min date.

                                   

                                  If you go this route, another important point (that Jonathan made above) is to create separate calcs for the sum of visits for each data source (Calls, Merch, Demo) rather than one calc with three sum statements. And then sum these three calcs in another table calc. This should make the blend more robust against null values. For example

                                   

                                  FINALLY, I still think the best approach (V3 above) is to combine the multiple visit data tables into one table / view. This should be more robust and also make it easier to filter out visits by type.

                                   

                                  Hope the above clarifies rather than confuses. ...

                                   

                                  Jim

                                  • 14. Re: Data Blending - no date dimension in primary table
                                    Richard Stebles

                                    Hi Jim

                                     

                                    Firstly, thanks for taking the time out to explain again your solutions, it did indeed clarify things.  I very much appreciate it

                                     

                                    In the end I just made another table with all the visits in one, which was your final point.  As you said, it simply is the most robust way of doing it, despite the irritation of needing to yet more data preparation work Excel-side (there is a lot more of this than the Tableau sales pitch would suggest...).  I was trying to avoid this work because I'm building more of a weekly report here than a single one-off analysis and Excel pre-processing adds time to the process, but c'est la vie :-)

                                     

                                    To me, the other approaches are too-advanced trickery to bend Tableau to behave in a way it currently isn't programmed to do: the solutions are impressive but not for me.  If you are interested, I added an idea based on this discussion (http://community.tableau.com/ideas/2443) and there's a similar, somewhat broader idea from Jonathan Drummney (http://community.tableau.com/ideas/1796)

                                     

                                    Thanks again to all involved in the discussion

                                     

                                    Regards

                                    Richard

                                    1 2 Previous Next