5 Replies Latest reply on Dec 8, 2016 8:53 AM by Jian Wang

    DWH Regression Testing

    Vladimir Martynenko



      We have a data warehouse (Oracle Db) with some data marts, which we use as extracts in Tableau. I have an idea to automatize regression testing in Tableau when implementing new capabilities in DWH. We need to compare the same data marts (extracts) from PROD and PREPROD servers. E.g. we could compare sum of income and number of records by date. I have made 2 extracts, blended by date field, in the view I show the percentage difference between PROD and PREPROD by date:



      So, if we have a big difference, then we should check the reason.


      It works good, because tableau joins extracts by 1 field quick. But each extract contains a lot of attributes ("Sale city", "Sale channel", ...) and now we have approximately 30 extracts. We should compare each attribute for each extract. In one view I can make only 1 blending by 1 attribute. I do not want to make hundreds of views (1 view for each attribute and extract). Ideally I want to have 1 dashboard in which I can see the result of regression testing for all attributes and all extracts. Does anyone have regression testing in Tableau? Do you have any idea how to visualize DWH regression testing?

        • 1. Re: DWH Regression Testing
          Jian Wang

          I believe you need to change the way you manage the extracts. Use Union to merge data from different server instead of blending. When preparing data source, add additional column for SERVER _NAME and maybe TABLE _NAME(if it is needed).

          • 2. Re: DWH Regression Testing
            Vladimir Martynenko

            Hello Jian, thank you for replying.

            It is not possible to do this with all extracts, because some of them are large and it takes 2 hours to refresh only 1 large extract.

            But if we make this for small extracts, how to visualize regression testing for all attributes in 1 dashboard?

            • 3. Re: DWH Regression Testing
              Jian Wang

              I hope I understand your question right... with what you'd like to do, you still have to merge the data like what I described. Then define a dimension ATTRIBUTES only used to store attribute names, a dimension SERVER NAME only used to store server names, ...etc. then for measure, ONLY ONE MEASURE, I usually call it VALUE. This way you will gain lots of flexibilities when designing your worksheet. If you came from BI cube world, it's the matter of defining many individual measures vs. one measure + a Measure Category dimension.

              • 4. Re: DWH Regression Testing
                Vladimir Martynenko

                If I understood you right, your idea is to have a table with 3 columns: Server name, Attributes and Value. It is a flexible structure. But it is not possible to store big amounts of data.

                • 5. Re: DWH Regression Testing
                  Jian Wang

                  Can't you use some ETL tools to make data smaller? Without preprocessing data, I don't see how you will work it out unless you are really good at working with SDK to achieve what you want to do. But I believe dealing with data is much easier... my 2 cents.