1 2 Previous Next 15 Replies Latest reply on Nov 12, 2015 1:52 AM by Alexander Mou

    Table calculations across multiple data sources

    David Andrade

      Looking for some Jedi help on summing a field across multiple data sources. Created a sample workbook to better show what I'm talking about. I have 3 different data sources to represent 3 different locations (US, Mexico, Canada). I have a Company (A, B) which can link the data sources. I want to learn how to sum revenue, cost, and profit across all locations and companies.

        • 1. Re: Table calculations across multiple data sources
          Catherine Rivier

          Hi,

          So I came up with one possible solution to your issue.  Others might have different ways to tackle, but here's a start:

           

          The big issue, as I'm sure you found in your attempted [North American Revenue] field, is that that Company/Location must have data when summing them in one field, or they will not show up in a view.

           

          Firstly, one way to solve this would be to set up all of your data sources so they all have every Company/Location combination, with $0 for each Sales/Profit/Revenue field.  Then all joins would work, though of course a little more work upfront that may or may not be possible.

           

          The other way I took was to create a series of IF... THEN statements that would go through your possible options.  These are the 3 "Total__Final" fields in the new "NEW PRIMARY" data source.

           

          I ran the IF.... THEN through these possibilities:

          1. All 3 data sources have data for that Company/Location
          2. 2 of 3 have data (3 possibilities)
          3. Only 1 has data (3 possibilities)

           

          Tableau will go through the IF... THEN in order, so it will check the first, then the second, then the third.  For example, Total Cost:

           

          IF (SUM([Mexico (Profitability.xlsx)].[Cost])>0 AND SUM([Canada (Profitability.xlsx)].[Cost])>0 AND SUM([US (Profitability.xlsx)].[Cost])>0)

          THEN SUM([US (Profitability.xlsx)].[Cost])+SUM([Mexico (Profitability.xlsx)].[Cost])+SUM([Canada (Profitability.xlsx)].[Cost])

           

          ELSEIF (SUM([Mexico (Profitability.xlsx)].[Cost])>0 AND SUM([Canada (Profitability.xlsx)].[Cost])>0)

          THEN SUM([Mexico (Profitability.xlsx)].[Cost])+SUM([Canada (Profitability.xlsx)].[Cost])

          ELSEIF (SUM([Mexico (Profitability.xlsx)].[Cost])>0 AND SUM([US (Profitability.xlsx)].[Cost])>0)

          THEN SUM([US (Profitability.xlsx)].[Cost])+SUM([Mexico (Profitability.xlsx)].[Cost])

          ELSEIF (SUM([Canada (Profitability.xlsx)].[Cost])>0 AND SUM([US (Profitability.xlsx)].[Cost])>0)

          THEN SUM([US (Profitability.xlsx)].[Cost])+SUM([Canada (Profitability.xlsx)].[Cost])

           

          ELSEIF (SUM([Mexico (Profitability.xlsx)].[Cost])>0)

          THEN SUM([Mexico (Profitability.xlsx)].[Cost])

          ELSEIF (SUM([Canada (Profitability.xlsx)].[Cost])>0)

          THEN SUM([Canada (Profitability.xlsx)].[Cost])

           

          ELSE SUM([US (Profitability.xlsx)].[Cost])

          END

           

          This clearly could get out of control if you're trying to join 5, 6, 7 data sources, though it would still work under the same principles.

           

          The other piece, by the way, is the new Primary data source.  If you don't have a single data source that has all possible Company/Location combinations (here US comes close but is missing Company B Location 4), create one that simply has a list of all possibilities, as I did with the "NEW PRIMARY" data source.

           

          Hope this helps - and I look forward to seeing if others have different solutions for this!

          Catherine

          • 2. Re: Table calculations across multiple data sources
            David Andrade

            Catherine, thanks for taking time to work out this problem and provide your response! How did you go about creating that new data source "NEW PRIMARY"? Is it possible to create a new data source from the 3 existing data sources that blends all of them together? I do have the "Company" field which could join the three tables.

             

            Is there a way to show on a new table the totals for each company? So that we have the total cost, revenue, profit across all three countries for each company?

             

            Thanks!

            • 3. Re: Table calculations across multiple data sources
              Robin Kennedy

              Catherine describes and has built a 'scaffold' data source whereby all unique combinations over all of the different data sets have been identified and then used as the primary source to blend with the others. I'm not sure why such a lengthy formula was used... I think use of the ZN function can probably get around this.

               

              Another option would be to combine all your data using UNIONs, rather than blends, if they live in the same data source. Check out this article for more information on unions...

               

              http://kb.tableausoftware.com/articles/knowledgebase/union-related-data-across-multiple-excel-tabs

              • 4. Re: Table calculations across multiple data sources
                Catherine Rivier

                David, for NEW PRIMARY all I did was take the possible combinations of Company/Location.  Since the list was small, I simply wrote it out in notepad and copy/pasted it as a new data source.  I'm guessing you'd want to be a lot more official in your data source.  Hopefully it's something you can put together though, since for a solution to work, I'm afraid you need a primary data source that has all possible values.  (As you saw in the original, Secondary sources will only show up if data exists in the Primary.)

                 

                I don't know your data environment and how many options your original has, so hopefully this isn't beyond scope.  I don't believe there's any way to form this from data as is, except with some copy/pasting from multiple sheets, which of course isn't ideal.  Usually I build a table or list with all possible values and use that as my primary.  Though you can also get around all of this if one of your existing data sources already has all of the required data combinations...

                 

                Also:

                Is there a way to show on a new table the totals for each company? So that we have the total cost, revenue, profit across all three countries for each company?

                Absolutely - if you just take out Location from the view, this will work.  You can turn on subtotals and totals too.  Now if you are asking to total across rows, well that's a different story and will take the type of new calculated fields like above - so would be a pain

                 

                 

                ***************************************************************************************************************

                Robin, The ZN function will not work because of the data blending.  You can try it on the sample workbook yourself.  If you can solve this with a shorter query, you should post it!! I look forward to seeing your solution.

                 

                It's my understanding from David's original question that the data cannot be blended within the data source, which is often the case.  Of course if it could be, that would solve everything and make the entire question that was asked moot....

                • 5. Re: Table calculations across multiple data sources
                  Robin Kennedy

                  ZN works ok for me using, for example,

                   

                  zn([Canada cost])+zn([Mexico cost])+zn([US cost])

                   

                  See attached.

                  1 of 1 people found this helpful
                  • 6. Re: Table calculations across multiple data sources
                    Catherine Rivier

                    Well never mind then, great!  It hadn't worked without the NEW PRIMARY source so I didn't post it.

                    • 7. Re: Table calculations across multiple data sources
                      David Andrade

                      Robin - The ZN function works on one data source, but will it work if I had to use it across the Canada, Mexico, and US data sources instead of on the one NEW PRIMARY data source?

                       

                      Is there an easy way, say if Canada was SQL Server, Mexico was Excel, and US was Access, to accomplish the same thing? If instead of just 2 companies we had 50 and instead of just 10 or so locations, we had hundreds, even thousands?

                       

                      I don't mean to over-complicate the problem, but I think it's important to think about the same solution through different scenarios and use this simple .twbx file as a model that could apply to much larger-scaled data.

                       

                      Is it best to take all that data from multiple data sources and make the NEW PRIMARY data source or can this easily be accomplished without the need to create the NEW PRIMARY data source? I don't think I'd really want to create a new data source.

                       

                      Thanks for making this a great discussion!

                      • 8. Re: Table calculations across multiple data sources
                        Catherine Rivier
                        Thanks for making this a great discussion!

                        I know, right?  That's why I love the forums

                         

                        The ZN fix is separate from the Primary/Secondary issues, I'm afraid (though of course correct me if I'm wrong here).  The way I like to think of it is a left join from the Primary, though it's not technically that.  (Jonathan Drummey explains it perfectly in detail here http://community.tableau.com/thread/121954 if interested:)

                        Tableau's data blending is not a true left join, though life would be easier if it was. What Tableau does is issue two queries, one to the primary data source and one to the secondary. The query on the secondary is an aggregate query at the level of detail defined by the linking fields in the view that will only return one record per combination of the values of the linking fields, whereas a normal left join would return as many rows as were in the secondary that met the joining criteria. Once the data is returned inside Tableau, the secondary is "left joined" back to the primary.

                         

                        So in other words, if a value doesn't exist in the Primary, it can't exist in a joined Primary/Secondary view.  You're asking Tableau to look at everything in the Primary, and only those records in the Secondary that match.

                         

                        But, rather than creating a new data source (the "Scaffold" data source of all values), you could combine this sort of view in one of your data sources.  50 Companies with 1000+ Locations would still be 50,000 rows, which is pretty manageable.  (You could also set up in your database as one list of Companies, one list of Locations, and do a cross-join, which will give you the same result.)  Then you can either Left Join or Union to create a Primary with every value in one of your 3 original sources.

                         

                        Though the manageability comes down more to how often the Companies and/or Locations could change.  If they won't change very often, I would actually not be too apprehensive of this work.  It might be a lot to set up at the beginning, but once done it will be easy to maintain...

                         

                        I'll use your example that Canada is SQL Server, say in table "Canada".  I would build the table (or cross-joined result) of all Companies/Locations, and call it "CompanyLocation".  Then for a Left Join, do a Left Join from CompanyLocation to Canada, linking fields Company and Location.  For a Union, union all of those records in CompanyLocation to the bottom of table Canada, with null values for any other fields.

                         

                        In this situation, Canada would be your primary data source and things would work like you want.  You can do this in SQL Server, Access, or even Excel with some fancier lookup formulas.

                         

                         

                        Or you can convince the data admins to put all 3 lists in the same data source.  (Then, as Robin pointed out, all 3 can just be Union queried together.)  If the Access database is yours, did you know you can link to an outside data table like one in a SQL Server database with a simple ODBC connection?  You can also link to an Excel file and make that a table - though I've never used that before....

                        1 of 1 people found this helpful
                        • 9. Re: Table calculations across multiple data sources
                          Jonathan Drummey

                          I've been lurking on this thread...to get really particular about details, the problem with the ZN() function isn't due to the blend so much as because the data is sparse. There's a question of order of operations here that makes an interesting point and explains why Robin's approach works:

                           

                          PrimaryCalculation Located InCalculationsResult for No Data for a Cell in CanadaReason
                          ScaffoldCanadaSUM([Cost])Nullthe SUM([Sales]) is issued in the query on the Canada data source, and returns nothing because there's no row
                          ScaffoldCanadaZN(SUM([Cost]))Nullthe ZN(SUM([Sales])) is issued in the query on the Canada data source, and returns nothing because there's no row
                          ScaffoldScaffoldSUM([Canada].[Cost])Nullquery is run on Canada  and returns nothing, the value is put into a temp table that Tableau uses
                          ScaffoldScaffoldZN(SUM([Canada].[Cost]))Nullsame as above
                          ScaffoldScaffoldSUM([Canada].[Cost]) as "Canada Cost"Nullsame as above
                          ScaffoldScaffoldZN([Canada Cost])0The value of Canada Cost is Null in the temp table, so when Tableau evaluates ZN(Null) it comes up with a 0

                           

                          An alternative approach that does much the same thing as Robin's is:

                           

                          ZN(LOOKUP(SUM([Cost]),0))

                          + ZN(LOOKUP(SUM([Canada (Profitability.xlsx)].[Cost]),0))

                          + ZN(LOOKUP(SUM([Mexico (Profitability.xlsx)].[Cost]),0))

                           

                          The LOOKUP() function is performed in Tableau after the blend, and either returns a value or Null, then we can wrap that in the ZN() function to get 0's. However, since LOOKUP() is a table calc that means that any further calculations must be table calculations, we lose the ability to quick-sort it in a view, etc. So I'd stick with the ZN([calculated field from secondary])

                           

                          And Catherine makes an excellent point in her last paragraph. With ODBC, you can add far-away tables, queries, and views as data sources into one database, and then do things like UNION them together. I've got a quarterly import that I do that uses an Excel worksheet that gets emailed to me, I just rename the file and Access treats that file as a datasource for data that goes into a Tableau extract.

                           

                          Jonathan

                          1 of 1 people found this helpful
                          • 10. Re: Table calculations across multiple data sources
                            David Andrade

                            Learning so much today! So Jonathan, would you do anything different to the original workbook than what Catherine and Robin have done? It's always helpful to see a working example.

                            • 11. Re: Table calculations across multiple data sources
                              Jonathan Drummey

                              Hi David,

                               

                              I can't add anything else to what Robin and Catherine have said, and I don't know enough details about your environment to have any particular recommendation as to different options they laid out. My personal preference is to automate as much as possible of the data preparation piece, so I'd lean towards a solution that UNION'ed the data sets together. That way you just have one Cost Measure (vs. one for each country, then the overall Cost), and when a new location/country comes online all I have to do is copy a chunk of the existing query and paste it in to point to the new data source and everything else flows (so just one modification), whereas to do a blend means multiple modifications (add the new spreadsheet to the view, modify every measure calc). That does require being comfortable with creating ODBC datasources and manipulating SQL, and if those resources aren't available then another option would be better.

                               

                              Jonathan

                              1 of 1 people found this helpful
                              • 12. Re: Table calculations across multiple data sources
                                Catherine Rivier

                                Jonathan, my mind is blown over this.  Wow.  Seriously, thanks, and thanks for the detailed explanation.  This is a thing I've never realized, or never realized the implications of how it's done.

                                In fact, Robin first mentioned ZN being a solution and I thought, "well I tried that first, and I couldn't get it to work!"  And when it did, I couldn't figure out what I had done wrong...

                                 

                                It's amazing to me that in one calculation it DOESN'T work:

                                     ZN(SUM([Canada (Profitability.xlsx)].[Cost]))

                                Two calculations it DOES work:

                                     [Canada cost] = SUM([Canada (Profitability.xlsx)].[Cost])

                                     ZN([Canada cost])

                                 

                                So you did a great job of explaining as usual, though I'm not sure if I have it completely right, but here's a hack restatement of what I think you're saying:  This is because the single calculation, and the first of the set of 2, are run solely in the Secondary, where it creates the Secondary "temp table".  Only after all Secondary stuff is done does it join this temp table to the Primary.  But, the entire second calculation of the 2-calculation case is run in the Primary.

                                In other words, any calculation that involves, in any way, a direct link to a Secondary, will run this as part of the Secondary "temp table" creation.

                                Is this about right, or am I way off?

                                • 13. Re: Table calculations across multiple data sources
                                  Jonathan Drummey

                                  You're welcome! It's one of those things where Tableau hides the details (of the order of operations) and 99% of the time it's magical, all we do is drag+drop pills and things work, and then 1% of the time knowing about it is really useful.

                                   

                                  I'm pretty sure from through reading through the logs that the "temp table" lives in Tableau, not in the secondary data source(s).

                                   

                                  Here's an underlying principle that I should have mentioned, I apologize: Tableau tries to do as much of the computation as possible in the underlying datasource, the idea being that the data sources are optimized for those computations. So, here's the order of operations as I currently understand it for blending in 7.0. I haven't done enough with 8.0 yet to have a sense of what's changed:

                                   

                                  - load up each query with everything Tableau thinks can be done and run that in each data source (some calculations can be completely computed here)

                                  - return that data to Tableau

                                  - do densification

                                  - do the blend

                                  - compute blended aggregates, i.e. calculations in one data source that use fields from another data source are computed

                                  - apply filters that depend on the blended aggregates

                                   

                                  Jonathan

                                  • 14. Re: Table calculations across multiple data sources
                                    Chris Stone

                                    This is a fantastic post. Thank you to all who contributed. I learned a great deal from this. Tableau really does need to do a better job about disclosing their processing procedures and their order of operations. SAS may be behind Tableau in terms of data visualization, but at least their training takes you inside the inner workings of their data vector space. Makes you wonder why they rely on a community to reverse engineer what a specifications document could lay out in 20 minutes. Especially since they haven't made big business of their training yet! Who knows? 

                                    1 2 Previous Next