1 2 Previous Next 17 Replies Latest reply on Sep 26, 2018 12:00 AM by krishna sarath

    Handling Missing Values in Line Graphs

    Igor Kaludjer

      Let's say I want to show a line graph of Sales by Date and Product (Coffee and Tea). There were no sales for Tea in February so there's no data in the database. Tableau creates a line connecting January and March Sales (see attached workbook) thereby implying there were some sales in February.

       

      Even if I filter Sales to Non-null value the graph stays the same, but if I keep only the product with missing data (Tea) then Tableau displays January and March only. This seems inconsistent because the display of one product shouldn't be affected by whether other products are included in the view (except for scale).

       

      P.S. Sorry if this issue was already discussed.

        • 1. Re: Handling Missing Values in Line Graphs
          James Baker

          There's no data for "February, Tea" here, which is different than having a row saying "February, Tea, 0 Sales" (in which case the Tea line would dip to zero in February).

           

          Tableau is thinking, "Perhaps Tea sales are reported on a bi-monthly basis".  In other words, the thinking is that when you're looking at a line graph you want to see the angle of the line (the amount/rate of change), even if your line segments are not all the same size.

           

          Regarding the axis contents: should we show all 12 months even though there are data in just 3?  Or are you saying that because dates are a special case that we have more knowledge about, that we shouldn't skip in-between/intermediate months even if there is no data?

          • 2. Re: Handling Missing Values in Line Graphs
            Igor Kaludjer

            I wasn't saying that Tableau should display zeros for missing data. That would be wrong, I know.

             

            What I am saying is that Tableau should not make assumptions such as: "Perhaps Tea sales are reported on a bi-monthly basis". Even more because it's obvious that's not the case in this dataset. The argument here is the same as your argument that one should be able to differentiate between missing data and 0 value: one should also be able to differentiate between missing data and any valid data when using line graphs. If you take two datasets: one where Tea Sales in Feb were 450 and one where there were no Tea Sales in Feb you get the same line graph.

             

            I don't know how this should be implemented. Maybe Tableau should display a dashed or dotted line for missing data or simply break the line and not display anything.

             

            Regarding axis contents: I'm saying that the software should behave consistently i.e. if you show February Sales for Tea when Coffee is in the view, you should show February Sales when Coffee is filtered out. This way I filter out Coffee and my February sales for Tea suddenly disappears. Another inconsistency is that if I use a bar chart I see no Tea sales in Feb but with a line graph I get a line.

            • 3. Re: Handling Missing Values in Line Graphs
              James Baker

              I confess that I went too far in my anthropomorphizing there - it's not that Tableau is "thinking" about dates, it's that it's *not* thinking about dates any more than any other generic categories, and thus allows for arbitrary connections (line relations?) between them.

               

              Categories on an axis are data values of that dimension.  I think this is how it works: We don't want to read the entire table (which could be very slow) to know the entire domain of those data values - so Tableau doesn't know, when a query returns data for 2 categories, whether there are 0 categories or 1,000 categories "in between" those values.  We don't know whether there's "missing data" because all we know is that this query didn't return any values in between.  So I think your dotted-line suggestion would be great for dates, because dates are special and we DO know how many months, etc there are.  Also, in this case (even if "February" were "foobaz") we know about "February" for Tea because of our Coffee data, and could thus plot the Tea line differently.

               

              Special side note: we're currently playing with some changes to the interactivity of lines, which I like - though those changes only come into play once your mouse is over the line.

               

              Regarding axis contents: When using the Page shelf, the axes are held constant, because you want to remain absolutely stable as the slice of data changes.  When using filters, we don't show columns for data that doesn't exist because A) it doesn't exist and B) you might be filtering down from 5,000 values to just a few, so showing everything would be harmful.

              • 4. Re: Handling Missing Values in Line Graphs
                Igor Kaludjer

                Let me give you a real word example (based on a true story). An analyst developed a report which is updated every month and sent to managers. Everything worked well until one month there were missing data (missing not because of an error, but because there wasn't anything to report). The report was mainly based on line graphs. Hence managers thought that there was some data this month when there actually wasn't which led to many problems.

                 

                Needless to say the analyst got the blame for this. Now the analyst is traumatized: every time he sees a line graph he creates a data table and checks for missing data.

                 

                I understand that you don't want to scan the whole table to display all the possible values of a dimension, but if you do display a certain value (such as March in our example) couldn't you indicate somehow that there are no sales for tea in Feb (e.g. breaking the line, dotted line)? Having an option to turn this on and off would be perfect.

                • 5. Re: Handling Missing Values in Line Graphs
                  Alex Blakemore

                  Three brief points:

                   

                  1. One obvious idea, use bar charts instead of line graphs to make missing data stand out

                   

                  2. Ask Tableau to provide an option for line charts to change value as step functions, rather than to interpolate between data points. This would be useful for other data, say showing inventory change in bulk the day of a delivery, not gradually ramping up over time. I've asked for that feature for a long time, but they keep adding other (really good) features instead.

                   

                  3. Sometimes it makes sense to use custom sql and a SQL UNION clause to combine your data with a table of nulls or zeros for each date. This at least gets a row in your table for each day or month. Use with care. The exact SQL depends on what you're trying to accomplish.

                  • 6. Re: Handling Missing Values in Line Graphs
                    Vincent Romeo

                    Excel for ages has differentiated between date axis and regular axis and allowed one to optionally treat Null as zero.  I think this feature is long overdue in Tableau.  Depending on the circumstances, it can be VERY costly in terms of database processing, bandwidth (and storage, for packaged workbooks) to UNION in order to add zeros to data.... For a real-world example, consider trouble-ticket information for a large organization with thousands of servers running hundreds of applications.  Most of those servers/apps have no problems on any given day/week/month so this data is VERY sparse.  UNIONing to fully populate the data is totally impractical.  And if one adds additional dimensions (i.e. ranking of the Severity of an event) it gets even worse....

                    And, for trending purposes, if the user turns on "Nulls as Zero", these 'virtual' zeros need to be included in the fitting of a trend line/curve.

                    • 7. Re: Handling Missing Values in Line Graphs
                      guest contributor

                      Is there a solution for this?  I have a dataset that contains outage information - if a particular system isn't out on a given day I don't have any entry on those days.  When I aggregate over days/weeks/whatever, on time periods where there was at least one outage I have a value, on other dates I have nothing but would like zero displayed.

                      • 8. Re: Handling Missing Values in Line Graphs
                        guest contributor

                        I want to use a line graph but have run into this problem as well.  In my case, I absolutely cannot show a null value as zero. My graph looks at productivity and the dimension is the average follow up time for a case. Shorter follow up times means better performance. Therefore a zero time in the place of null would be wrong.

                         

                        An option to display null as zero or nothing would be helpful.

                        • 9. Re: Handling Missing Values in Line Graphs
                          Richard Leeke

                          Take a look at the options Joe and I posted on this thread (if you haven't already).

                           

                          http://www.tableausoftware.com/forum/handling-null-values-/-missing-values#comment-16042

                          • 10. Re: Handling Missing Values in Line Graphs
                            Karl Young

                            I know that I'm years after the original post here, but I thought I'd share a way to do this that I've figured out.

                             

                            First, go to the line graph, and change your columns to discrete.

                            Second, go to the formatting menu for your metric, and under Special Values, set Marks to "Show at default value".

                             

                            The only downside of this is that now you don't get the nice continuous axis on the bottom, but rather a list of every day in the selected period.

                            1 of 1 people found this helpful
                            • 11. Re: Handling Missing Values in Line Graphs
                              Johan de Beurs

                              Another alternative for this, depending on your data and needs:

                              - Right click on the date pill and 'show missing values'

                              - Do a quick table calculation of moving average, addressing 0 periods before and after

                              2 of 2 people found this helpful
                              • 12. Re: Handling Missing Values in Line Graphs
                                Slava Sapojnikov
                                1 of 1 people found this helpful
                                • 13. Re: Handling Missing Values in Line Graphs
                                  Olivier Delalleau

                                  I agree this is very annoying and was unable to find a proper solution. The closest one is the conversion of the date field to "discrete" and ensuring "Show missing values" is selected, but (1) the date formatting for discrete dates is terrible when you have a long period of time, since it insists on printing each date, and (2) it breaks the line, while I would want it to go to zero.

                                   

                                  I realize there's probably no universal default behavior that would work for every use case, but it'd be great to have a way to address the most common ones, like this one (for instance being able to force a default value on a date with no data, possibly with a special line formatting like a dotted line). The current default behavior is really dangerous and there is no satisfactory way out of it.

                                  • 14. Re: Handling Missing Values in Line Graphs
                                    Brian Pringle

                                    Here is how I worked around this...

                                     

                                    1. Create a table or spreadsheet with the dates / months / years for which you want the time periods.  I have a SQL table with a list of business dates and used that as my left-most table in the Data Source list.

                                    2. Tie the other tables to that table, making sure that they are all left-joined to it so that there are nulls for the item that you are trying to count for those particular date ranges

                                    3. Use a parameter to set the value that you are checking instead of a filter so that all dates for all records show in your data set.

                                    4. Use the following formula to check if there are null values for those date ranges.  If there are, treat them as "0".  Otherwise, treat them as "1".  Use the ZN function to catch any null values.

                                     

                                    ZN(

                                    IF ISNULL([Field to Count]) THEN

                                        0

                                    ELSEIF INT([Parameter for Item]) = [Field to Count] THEN

                                        1

                                    END

                                    )

                                     

                                    4. Put the date field on the Columns shelf as "Discrete" and the calculated field on the Rows shelf as "Continuous".  Measure the calculated field as "SUM".

                                    1 2 Previous Next