8 Replies Latest reply on Nov 12, 2012 1:46 PM by Jonathan Drummey

    Data blending, missing months.

    Piotr Reszka

      Hello everyone!

      I would like to create visualisation of running salses vs running targets (for sales person). I have 2 data sources, ERP (MS SQL) with sales transactions and MS Excel with targets for every month. I have to use data source from ERP as my primary data source. There are months with no transactions. After creating report, running total for target is incorrect (there are missing months in primary data source). I tried to use “show missing values” on date dimension but there were still missing months before first and after last month with data sales and running total of target was still incorrect. It seems that data blending works like left join.

      Have you any ideas how to complement missing values?

      I attached twbx.

       

      Piotr

        • 1. Re: Data blending, missing months.
          Jonathan Drummey

          Hi Piotr,

           

          Data blends are a special kind of left join that only occurs after the data has been returned from your two datasources. Therefore, since there are no rows in the primary (transactions) source, Tableau has nothing to join on and there's no data to fill in. So you need to pad out the data yourself for the missing months. I run into this pretty regularly in my work, the solution that I've been using is the following:

           

          - created a list of months in the database

          - create a query that right-joins the months to the transactions datasource. This guarantees that there's at least one row for every month. I'll typically use a subquery or other criteria to limit the number of months to what is needed for the analysis.

          - use that query as the primary datasource in Tableau

           

          You'll need to be a bit careful about totals (since [Number of Records] in the primary data source now includes the padding rows), and some calculations may need to be wrapped in ZN() to deal with the Null values returned from the padding rows. Once you get those taken care of, the rest of the view comes together quite nicely.

           

          Cheers,

           

          Jonathan

          1 of 1 people found this helpful
          • 2. Re: Data blending, missing months.
            Piotr Reszka

            Hello Jonathan,

             

            Thank you for your suggestion, it seems that best solution is to prepare complete view in MS SQL (with all months) and use it in Tableau.

            However, It would be useful to specify type of join (left, right, inner) in the data blending.

             

            Best,

             

            Piotr

            • 3. Re: Data blending, missing months.
              Don Barnetson

              Hi Jonathan,

               

              This is just what I have been looking to do. Could you give an example using the Superstore data of how to accomplish this, or using the same example above, that is missing items in either the Category or even the Sub-Category instead of Month's or something at that level.

               

              Thanks,

              Don

              • 4. Re: Data blending, missing months.
                Jonathan Drummey

                Hi Don,

                 

                The level of effort required depends on what you're trying to accomplish. For example, if you are just aggregating over a measure and don't need to do any filtering, then you can set up a table of months, use that as the primary in a data blend, and start aggregating.

                 

                If you need to do filtering based on row-level details, then you'll need to set up the months table, then set up your join. I try to do most of this work in the data source, and that can require some messing around with the query to get the data you need. Depending on the need and data volumes, I might use a left join, a cross product, or a union query. I set up an example of a left join in the attached, you can see that the major differences are for the discrete bars and the Running Sum of Sales.

                 

                Jonathan

                • 5. Re: Data blending, missing months.
                  Don Barnetson

                  Thanks for the example Jonathan.

                   

                  The biggest issue that we are having is the shear size of the initial data. Currently, without the padding, we are dealing with 17+ million rows of data.

                   

                  Our goal is to pad the UPC level (i.e.; Product Detail or Sub-Category, etc.) filtered by the Store level (i.e.; Customer, etc.).

                   

                  Doing the method you've given and example of would work, but from our calculations would create a data base of almost 500+ million rows just to be able to show all UPC's for all Stores over a 24 month period. Yikes!

                   

                  Is there away to avoid this or at least, even create a work around?

                   

                  Thanks,
                  Don

                  • 6. Re: Data blending, missing months.
                    Jonathan Drummey

                    Hi Don,

                     

                    This is one of those cases where the desired views/analyses affect how we'd want to shape the data. You talked about wanting to pad the data, but I don't know enough about what you want to do to have a guess as to whether that's really necessary or not. For example, in the workbook I posted you can see spread out bars when using a continuous date field without requiring any padding whatsoever. In this case Tableau is effectively doing the padding itself, there are other cases where we can use Tableau's built-in padding to avoid having to do our own padding.

                     

                    It seems like you want every UPC for every Store for every month, my question is, what kinds of analyses do you want to do? Are there specific views or kinds of charts that you want to see?

                     

                    Jonathan

                    • 7. Re: Data blending, missing months.
                      Don Barnetson

                      Hi Jonathan,

                       

                      This all came about when we first started working with the BDI (Business Development Index) which you had helped me with in the past. (See original link below)

                       

                      http://community.tableau.com/thread/118747

                       

                      We realized that the numbers from the BDI calculations within Tableau did not match the numbers we had in Excel using the same data base.

                      We had also come to the conclusion that we needed to add zeros for the missing UPC's to pad the data so that our numbers would match.

                      Adding zeros to 1 month of data for all UPC's added 60+ million rows to a data base that started with only 17+ million rows does the job, but it's not very efficient. There are over 2000 UPS and 8000+ stores.

                       

                      I'm hoping that this will give you a bit more back ground information in to our situation. There are few few examples within the link that I had posted above.

                       

                      Thanks,

                      Don

                      • 8. Re: Data blending, missing months.
                        Jonathan Drummey

                        Hi Don,

                         

                        I'd seen that email on the other thread right before the conference when I was slammed with lots of other tasks, so I didn't get a chance to follow up and I'm sorry we didn't meet at TCC. Email me at jonathan (dot) drummey (at) gmail (dot) com and we can set up a time for a webex, I think that would be the fastest way to figure out how I can help.

                         

                        Jonathan