6 Replies Latest reply on Nov 9, 2016 3:24 PM by Sarah McGraw

    Completely flummoxed by something that should be very simple--addition!

    Sarah McGraw



      I am completely confused by something that should be extremely simple. Which is worrisome, because this project is going to get a lot less simple very soon.


      All I'm doing, at this early stage in my project, is trying to make sure some target data I'm working with rolls up properly. My source is set up very simply, as a flat file with three column only: location (12 locations), date (all dates are the last day of the quarter, with 12 quarters represented), and target (integer). In other words, each of 12 locations has one target for each of 12 dates.


      Right now, I can get all the numbers to show up properly in a table when I put YEAR and QUARTER on the columns shelf, location on the rows shelf, and drop MAX(Target) on text in the marks box. Or, I can get them to display properly in a column chart by putting area, YEAR, and QUARTER on the columns chart and MAX(Target) on the rows chart.


      Now, it makes sense to me that I should be able to pick any other aggregation for total since there's only one value for each area-date combination....and it's almost true. I can choose MAX, MIN, AVERAGE, or MEDIAN and get the correct (only) number, but as soon as I change it to SUM all **** breaks loose. The numbers blow up like crazy and i can't tell why. I can't even figure out how to reproduce the numbers--but there's clearly some multiplication involved, as most of the "SUM"s are far larger than the actual sum of all my data points.


      As an example, if I choose MAX(Target) for area 1, I get the CORRECT value of 8,071 for 2019 Q2. If I change to SUM(target) I get 60,101,276 (!!!). If I manually sum all the areas' 2019 Q2 targets, I get the correct total of 162,500. But If I switch to SUM(Target) and remove area, I get 2,530,334,993. What is happening here?


      This is a big problem because I need to be able to remove area from the view to show the sum total for each time frame. I will also need to bring in an additional "office" tier of aggregation. (When I do that I will remove the current target numbers and replace them with numbers that sum up to the area level).


      What am I doing wrong here? This seems like it should be one of the easiest things to do and i feel sure that I've done this is a previous project, in tutorials, etc. I can't tell what's different here.


      I have attached an excel spreadsheet with the source.


      Thank you!

        • 1. Re: Completely flummoxed by something that should be very simple--addition!
          Doug Carter

          Hi Sarah!


          Is it possible to attach a workbook showing this information? In my testing, when I just did a Year(date)Quarter(date) by a sum(Target), My Q2 shows 162500 which is what you say you want. It almost sounds like there is a table calculation going on here (like a sum of all data in the rows) but without seeing the workbook, I'm really not sure how that's coming into play.


          Edited to add: Also your J data is duplicated. Everything else has 1 entry per month, J has 2. this could also be part of the problem. Here's my workbook to show this (created in 10.1.1)



          • 2. Re: Completely flummoxed by something that should be very simple--addition!
            Sarah McGraw

            Hi Doug! Thanks for the quick reply. So, I just tried to reproduce the problem inside a new workbook to post here (there is some additional stuff in my project I can't share), and the problem went away. Something I didn't mention before is that I do have two other tables in my source. The first has office IDs, the second is a simple crosswalk between officeID and area, and the third is the file I uploaded earlier (btw, the "J" data isn't duplicated in my file; I must have made a mistake when overwriting the names so I could share it). I thought the joins wouldn't matter since the view in question only used fields from the third table, but now I think that there must have been duplication in my source based on the joins. I'm going to look at them again and see if that's the problem.


            I am seasoned with joins in SQL, but I think there must be something slightly different about how the way they're working here. This project is eventually going to end up with many tables in it, and now I'm wondering if I need to make fewer, larger custom sql queries  to reduce the chances of something like this happening in a place where I'm less likely to notice it.

            • 3. Re: Completely flummoxed by something that should be very simple--addition!
              John Croft

              I'm not grasping your problem from your post. Can you share your workbook. To me when I look at Sum(TARGET) with area in view, it looks like this. Which sounds like the numbers you like.



              But when I remove area from the viz, then the collapses the number into the quarterly sums like this. And this is the 'blow up' you reference above. If you need to keep a different detail of calculation, the you should explore level of detail calculations possibly.


              2 of 2 people found this helpful
              • 4. Re: Completely flummoxed by something that should be very simple--addition!
                Sarah McGraw

                I think--though it doesn't make sense to me based on my experience with joins in SQL--that the numbers were getting blown up because of some joins in my workbook. I didn't mention that part before because none of the fields from the other tables were getting used in this view. I tried reducing all the joins to inner (even though I will actually need a left join in my finished product) and the blow-up remained. However, when I create a new document using only the source file I shared, the problem goes away. Ultimately, I am going to need to combine a more complex version of the target file I shared with some actual performance data coming in from SQL.


                I clearly need to sit down and think about this aspect of the project. I can simplify somewhat by collapsing a few custom SQL queries into one big one, but I'm still going to have to join to several excel tables as well and I can't have the numbers going crazy on me like this. Any advice about joins?

                • 5. Re: Completely flummoxed by something that should be very simple--addition!
                  John Croft

                  Yes the one to many joins will duplicate the date from the base table. So if you have an customer ID that joins another table with 12 line (one per monthly sales), you are essentially have 12 line with the same age. So if you sum that age appears to be 12 times what it should be.

                  1 of 1 people found this helpful
                  • 6. Re: Completely flummoxed by something that should be very simple--addition!
                    Sarah McGraw

                    Yes, I didn't really realize that the data was coming in this way. I was thinking of the joins the way I do in SQL, where only the table in use is reflected in the results. This goes very much against my instincts and usual habits. In fact, if everything is always getting duplicated out from the base table, it's hard to see how joins will be very useful to me in Tableau at all, since my source data has LOTS of crosswalks and lookup tables that I will have to rely on to link my targets (excel) to the data as it comes in. No matter which data source I use as the base table, I think this is going to be a challenge. I can simplify somewhat by making my custom SQL a little more complicated, but I think there's no getting around the whole problem.


                    However, I got this workbook to display correctly using a data blend. I can see that I need to spend more time learning about how to make the choice between these two options.