0 Replies Latest reply on Nov 2, 2012 9:04 AM by Simon Trigona

    Datawarehouse Data Quality Report

    Simon Trigona

      I'm trying to prototype a dashboard/scorecard for the quality of the data in my data warehouse. I've got something going in Excel and it works, but it has its limitations, so I thought I'd try out Tableau and see if I can come up with a better solution.


      "Bad" data or data with issues is when a fact has one or more "unknown" dimension attached to it, this can happen for several reasons and we're working on fixing the ETL process to better match things up. This report will help us know how we and the data are doing.


      Now I'm not sure how to fit this into Tableau as I essentially want to do the following for various combinations of facts and their associated dimensions: calculate the % of Total of the cost of goods for the dimensions with a value of "unknown" for the previous year (I'm fine with this being a quick filter). Sounds easy right? I'm using OLAP/Cube as a data source. This rules out quite a few techniques and makes several others much more difficult.


      Ideally I want to have a report/dashboard that says:

      Data% of Total
      Sales - Location1.7%
      Sales - Product0.1%
      Sales - Customer10.9%

      Or something along those lines.


      Currently the best I've come up with is putting the dimension in question on the Rows shelf and the % of Total for COGs as Text (Marks card) and hiding everything except the dimension "unknown". The problems with this are that 1) if there's a new dimension value then it won't be hidden 2) every row has be pulled into Tableau to make this work. Point 2 is an issue for dimensions with a lot of rows (invoice has 400,000+ rows for 1 year), it pulls down a lot of data that isn't necessary.


      Any thoughts on if and how this can be done?



      Bonus question: Some dimensions have more than 1 "bad" value. For example "unknown" and "" (empty string). Ideally I'd want the sum of the two.