3 Replies Latest reply on Jul 26, 2018 8:40 PM by Ankit Bansal

    How to accomplish star schema with multiple facts (without records falling off).

    Cory Weiner

      I have a fairly simple data model which consists of a star schema of 2 Fact tables and 2 dimension tables:

      • Fact 1 - Revenue
      • Fact 2 - Purchases
      • Dimension 1 - Time
      • Dimension 2 - Product

       

      These tables are at different levels of granularity -  meaning a given date could have many rows across many products. A specific date and product may have revenue, but no purchases. Likewise it may have purchases but no revenue.

       

      Each fact joins the both dimensions, which contain additional detail such as the product name, product category, etc.

                                    

                                       

      Fact 1 -   Revenue
      DateProduct IDRevenue
      6/1/2018A100
      6/2/2018B500
      6/5/2018C200
      6/5/2018F300

       

       

      Fact 2 - Purchases
      DateProduct IDPurchases
      6/1/2018A100
      6/3/2018D500
      6/9/2018F600

       

       

      What I would like to do is combine these two facts such that I can report revenue and purchases together (example, by date, by product, or by date and product combined):

       

       

      Report
      DateProductRevenuePurchases
      6/1/2018A100100
      6/2/2018B500
      6/3/2018D500
      6/5/2018C200
      6/5/2018F300
      6/9/2018F600

       

       

      I can get very close with data blending, however the issue I run into is that data blending only supports an pseudo 'inner-join'. As you can see, if either of these data sources is specified as primary then dates without purchases/revenue will cause rows in the secondary source to fall off.

       

       

      What is the best way to blend this data without causing records to fall off

        • 1. Re: How to accomplish star schema with multiple facts (without records falling off).
          Ankit Bansal

          Cory,

           

          There are 2 ways to achieve this:

          Method 1)

          Join your 2 dimension tables on (1=1) that will bring all the combination of product and date and then blend both of your facts with this. you might want to filter out records where both revenue and purchases are null in the end.

           

          Method 2)

          Union distinct values of product_id,Date from both of your fact tables.

          select product_id,date from revenue

          union

          select product_id,date from purchases

           

          then use this as primary data source and blend both the fact with this. In this method you need not apply null filter.

           

          Hope it helps.

          • 2. Re: How to accomplish star schema with multiple facts (without records falling off).
            Cory Weiner

            I will give option 2 a try.

             

            With a simple data model this should work, but as you can imagine with larger star models having many facts and dimensions that would be a messy solution.  It would be great if data blending allowed you to specify the join type to avoid this scenario.

             

            Or better yet, if there were no 'primary' data source and the queries were unioned.  This is how Oracle OBIEE accomplishes blending - there is no primary source and it is able to blend data in a similar way without losing records.

            • 3. Re: How to accomplish star schema with multiple facts (without records falling off).
              Ankit Bansal

              Agree with you. If data blending could allow full outer join, That would have been helpful in this case.

               

              May i know what is your data source ?

              Assuming if all your tables are in a single database then you can write a custom SQL(or create a database view) to achieve this:

               

              with combined_data as (

              select product_id,date,sum(revenue) as revenue , NULL as purchases from revenue

              group by product_id,date

              union all

              select product_id,date,NULL AS revenue, sum(purchases) as purchases  from purchases

              group by product_id,date)

               

              select product_id,date,sum(revenue) as revenue , sum(purchases) as purchases from combined_data

              group by product_id,date

               

              Then this data source you may join with your dimensions.