5 Replies Latest reply on May 20, 2020 6:11 PM by Amanda Boyle

    Label by percent difference of running Average

    Luis Garcia

      Hi All,

       

      I am trying to calculate the percent difference between COVID-19 Cases 14 days ago vs the current 14-day running average of new cases to label the US states on a map. So April 16 (Latest data is April 29) new cases vs April 29th's 14-day running average of new cases.

       

      I made an LOD of  [Date] = {FIXED : MAX([Date])} to get the new cases for the latest date and then another LOD of [Date] = ({FIXED : MAX([Date])} - 13) to get the new cases for 14 days ago. (Is there a better way to do this?)

       

      Then I plotted on a table New cases, new cases 14 days ago, running avg of new cases, and then I made a calc of  (Running AVG - new cases 14 days ago) / (New cases 14 days ago) to get the percent difference which gives me the correct numbers, but when I try to use that calculation to label the states it gives me blank result.

       

      I'm thinking it's because I am using a window avg to calculate the running average and that info is not available when I use the map, but I'm not sure how else I could accomplish this.

       

      Thank you in advance. TWBX is attached.

        • 1. Re: Label by percent difference of running Average
          Joe Oppelt

          To save me time of reverse-engineering what you did, are the numbers in the Numbers sheet correct?  Table calcs are the way you want to do this, and I see that you've done that.  If it's a matter of getting those numbers on the map, (and we would have to add county to the Numbers sheet to verify correct values), then it would just be a matter of proper table calc settings on the map.

           

          (PS:  I assume you would just want the latest data shown on the map.  Or even let the user pick a date and then show the data for that date.  We can do that.)

          • 2. Re: Label by percent difference of running Average
            Luis Garcia

            Hi, yes the numbers in the numbers tab are correct. I would want the map to show the percent difference (last column) for each state

            • 3. Re: Label by percent difference of running Average
              Jonathan Drummey

              Hi,

               

              First of all, the New Cases (Difference) number used in the view is not the number of confirmed cases, it's the number of confirmed cases + deaths. Here's a view of a single day's worth of data for New York where I unhid the Case Type, Combined Key, and FIPS columns. We can see that there are two rows for each Combined Key (county in the US):

               

              Screen Shot 2020-04-30 at 10.53.13 AM.png

               

              Therefore if you want to analyze confirmed cases alone then the Case Type field either needs to be filtered for, used as a dimension in the view, or used to create new comfirmed cases & new deaths measures similar to the (hidden) Confirmed Cases & Deaths measures. In the following views I've added Case Type to Filters.

               

              I've got two messages here...the first one is to make sure you understand the data that you are working with...Tableau makes it incredibly easy to point & click and drag & drop and interact with the data and see numbers, so it's up to us to "slow down" a bit to validate that what we are working with is what we think we're working with. The second one is that we have to make sure that we're responsibly using the data, here are a couple of links from Coronavirus (COVID-19) Data Hub | Case Tracker, Starter Dashboard, Visualizations | Tableau the Tableau data hub: data viz best practices related to COVID-19  and 10 considerations before you create another chart about COVID-19.

               

              Now on to your question. Table calculation functions like the LOOKUP() and WINDOW_AVG() calculations operate on the "table" of query results that is in the view (based on current filters and the dimensions in the view, i.e. on Rows, Columns, Pages, and the Marks Card). The original Numbers text table sheet has DAY(Date) as a dimension and the calcs are computing on the Date, while the original Map view has State as a dimension, so these are not the same "table" of data and therefore the table calculations won't return the same results.

               

              LOD expressions won't work in this case because LOD expressions can't do lookups or moving averages except under very, very limited circumstances so we need table calculations, and in order to get for the table calculations to work for this use case the map has to include the Date in the level of detail in the view. This returns a mark for every date/state combination so we then need to filter for a particular date, I added a table calculation filter on date (so the table calculation filter is applied *after* the other table calculations are computed) and just selected a single date.

               

               

              Here's the updated view:

               

              Screen Shot 2020-04-30 at 10.59.25 AM.png

               

              I don't think this is a good analytical view and I'd strongly advise against using it because the daily reporting in this data set is really noisy data (due to delays in reporting & corrections). Here are three examples from this particular data set that I picked up while building these views.

               

              * Nevada reported 9 cases on April 16 and that made the % diff for April 29 monstrously large.

              * New Mexico reported 0 cases on April 16, that causes the % diff calc to return Null for April 29. Ordinarily a % diff calc would use an IF statement to return something like 100% but that's not really a number given the Nevada April 16 issue.

              * Nevada also reported -34 cases on April 27, this gets rolled into the moving average but will blow up the % diff calc on May 12.

               

              Here's a line chart for Nevada & New Mexico showing just how noisy this daily data is:

               

              Screen Shot 2020-04-30 at 11.00.16 AM.png

               

              Packaged workbook is attached, let me know if you have any questions.

               

              Jonathan

              3 of 3 people found this helpful
              • 4. Re: Label by percent difference of running Average
                Luis Garcia

                Thanks Jonathan! Not only for your explanation of the problem but all the extra context.

                • 5. Re: Label by percent difference of running Average
                  Amanda Boyle

                  Update: Introducing the COVID-19 Data Platform

                   

                   

                  Working hand in hand with our friends at MuleSoft and Salesforce, we built an open and resilient flow of COVID-19 data, available everywhere you need it to prepare your organization for the next phase of business. This new COVID-19 Data Platform ingests and aggregates data across public sources, including The New York Times, European Centre for Disease Prevention and Control, and the COVID Tracking Project. The platform then curates them into standardized data models that can be reliably used to make data-driven decisions, whether through visualizations or automated processes.

                   

                  On Friday, May 29 we will be deprecating support for our current COVID-19 activity dataset (JHU) in favor of the sources listed above. We are no longer using Johns Hopkins University data because we have found new, improved ways of automating the collection of COVID-19 data from different sources through this new COVID-19 Data Platform. You can access this new data through Mulesoft’s Anypoint Exchange, through the AWS Data Exchange, and through data.world with both Hyper and CSV file downloads.

                  If you would like to learn more please visit the “The next evolution of COVID-19 Data” blog post and the COVID-19 Data Resource Forum.