1 2 Previous Next 20 Replies Latest reply on May 30, 2018 12:07 PM by Deepak Siroya

    Compare two data-sets for changes

    Philip Hoyos

      I have a table with multiple columns and rows, lets call it A, which was created at time 130717. I need to compare table A, to the exact same table, lets call it B, which has been updated with new information in the rows, at time 130108.

       

      The comparison needs to be able to

      • Count the number of rows which has been changed
      • Identify the changed rows, and show original row, current row and the change
      • Identify the changed fields, and show original row, current and row the change ie. in field Price

       

      Example

       

      Table A

      IDTimePriceReoccurenceSupplierMean timeAvg hrs
      HOS-1-LS130717£4,562LS0,720,72

       

      Table B

      IDTimePriceReoccurenceSupplierMean timeAvg hrs
      HOS-1-LS130108£5,162LS0,720,72

       

      I have tried using the lookup function, but so far I have had no luck. Can anyone help me? All help very much appreciated.

      Thank you for your time.

      /Phil

       

      Message was edited by: Philip Hoyos Changed language errors

        • 1. Re: Compare two data-sets for changes
          kettan

          Welcome to the forum!

           

          WITHIN TABLEAU

          This seems best solved outside Tableau or with a Custom SQL although I am curious to see if some table calculation or data blending expert has a good solution inside Tableau. I can picture this is possible within Tableau with 2x sheets using data blending on ID and a calculated field comparing the columns, but don't have time right now to check it out.

           

          UNION ALL

          One way that might work is to make a union of Tableau A and B and thereafter a simple table with all columns and a filter on Number of Records = 1.

           

          FULL OUTER JOIN

          What's your data source? I ask, because Jet SQL (Excel, Text File, Access) doesn't support FULL OUTER JOINS which would make it possible to this with 1x FULL JOIN query. If Jet SQL, it can be done with a UNION ALL of 2x LEFT JOIN queries.

           

          CROSS JOIN

          I am also curious to see how successfully this can be done with tip 3 (unpivot columns) in  The Cross Join Collection  and therefore something I might try as soon as I have some free time either later today or another day.

           

          Ps. I like your question.

          1 of 1 people found this helpful
          • 2. Re: Compare two data-sets for changes
            Philip Hoyos

            Hi Kettan

             

            Thank you very much for your welcome, and sharing insights!

            My data resides in excel sheets, and unfortunately not in any DB. Thank you for your suggestions, I will see if I can employe one of them, unless someone has a solution on how to work the solution.

             

            Thank you.

            • 3. Re: Compare two data-sets for changes
              kettan

              Here is a UNION ALL solution where 2 (Number of Records) means that everything is equal and 1 that something is different. If 1 row without subrow then it is a row (ID) only found in one table. If 1 with subrow, then 1 or more columns are different. Hovering [Number of Rows] shows table name (when 1). One thing to notice is that I changed Analysis > Table Layout > Advanced from 6 to 16 rows/columns. See more in screenshot and attached files.

               

              Ps. I will share a CROSS JOIN solution later. and possibly also some of the other suggestions in my first comment.

               

              thread 135802 Compare two data-sets for changes.png

              1 of 1 people found this helpful
              • 4. Re: Compare two data-sets for changes
                kettan

                I have a feeling (and hope) there will come suggestions from several helpers, because this question (sought solution) is attractive in itself.

                • 5. Re: Compare two data-sets for changes
                  Philip Hoyos

                  This is definitly very helpful!

                   

                  Say that you have a dataset of 10K rows - a table will not be the best way of displaying. So Im thinking that It would be best to display

                  • Total row count of changes
                  • Total row count of changes per supplier
                  • Changes pr column
                  • Total impact of changes - for columns where it makes sense - e.g. with price

                   

                  What is your view on this?

                  Hope you don't mind me asking

                  • 6. Re: Compare two data-sets for changes
                    kettan

                    I am glad you are asking 

                     

                    A common issue on this forum is that questioners provide too little information making it almost impossible to help without asking for more info. The opposite happens also, that no one touches a question because it has too much information.

                     

                    I will take a closer look at this later ... hopefully this evening.

                    • 7. Re: Compare two data-sets for changes
                      kettan

                      The attached workbook contains all the SQL techniques I was thinking to share. Hopefully some of them is good enough to earn a correct answer. I have made a reference to this thread under Tip 3 in  The Cross Join Collection.

                       

                       

                      FULL JOIN SQL

                       

                      SELECT
                        a.[Avg hrs]     AS [a Avg hrs],
                        a.[ID]          AS [a ID],
                        a.[Mean time]   AS [a Mean time],
                        a.[Price]       AS [a Price],
                        a.[Reoccurence] AS [a Reoccurence],
                        a.[Supplier]    AS [a Supplier],
                        a.[Time]        AS [a Time],
                        b.[Avg hrs]     AS [b Avg hrs],
                        b.[ID]          AS [b ID],
                        b.[Mean time]   AS [b Mean time],
                        b.[Price]       AS [b Price],
                        b.[Reoccurence] AS [b Reoccurence],
                        b.[Supplier]    AS [b Supplier],
                        b.[Time]        AS [b Time]
                      FROM ['table A$'] a
                      LEFT JOIN ['table B$'] b ON a.ID = b.ID
                      UNION ALL
                      SELECT
                        a.[Avg hrs]     AS [a Avg hrs],
                        a.[ID]          AS [a ID],
                        a.[Mean time]   AS [a Mean time],
                        a.[Price]       AS [a Price],
                        a.[Reoccurence] AS [a Reoccurence],
                        a.[Supplier]    AS [a Supplier],
                        a.[Time]        AS [a Time],
                        b.[Avg hrs]     AS [b Avg hrs],
                        b.[ID]          AS [b ID],
                        b.[Mean time]   AS [b Mean time],
                        b.[Price]       AS [b Price],
                        b.[Reoccurence] AS [b Reoccurence],
                        b.[Supplier]    AS [b Supplier],
                        b.[Time]        AS [b Time]
                      FROM ['table B$'] b
                      LEFT JOIN ['table A$'] a ON b.ID = a.ID
                      WHERE a.ID is null
                      
                      
                      

                      .

                       

                      CROSS JOIN SQL

                       

                      SELECT a.[ID]
                      , l.[Column Name]
                      , CStr(
                        IIF(l.[Column Name] = 'Time',        a.[Time],
                        IIF(l.[Column Name] = 'Price',       a.[Price],
                        IIF(l.[Column Name] = 'Reoccurence', a.[Reoccurence],
                        IIF(l.[Column Name] = 'Supplier',    a.[Supplier],
                        IIF(l.[Column Name] = 'Mean time',   a.[Mean time],
                        IIF(l.[Column Name] = 'Avg hrs',     a.[Avg hrs]
                        ))))))) AS [a Value]
                      , Cstr(
                        IIF(l.[Column Name] = 'Time',        b.[Time],
                        IIF(l.[Column Name] = 'Price',       b.[Price],
                        IIF(l.[Column Name] = 'Reoccurence', b.[Reoccurence],
                        IIF(l.[Column Name] = 'Supplier',    b.[Supplier],
                        IIF(l.[Column Name] = 'Mean time',   b.[Mean time],
                        IIF(l.[Column Name] = 'Avg hrs',     b.[Avg hrs]
                        ))))))) AS [b Value]
                      FROM (['table A$'] a
                      INNER JOIN ['table B$'] b ON a.ID = b.ID), [lookup$] l
                      
                      
                      

                      .

                       

                      UNION ALL SQL

                       

                      SELECT 'Table A' AS [Table Name],
                        ['table A$'].[Avg hrs],
                        ['table A$'].[ID],
                        ['table A$'].[Mean time],
                        ['table A$'].[Price],
                        ['table A$'].[Reoccurence],
                        ['table A$'].[Supplier],
                        ['table A$'].[Time]
                      FROM ['table A$']
                      UNION ALL
                      SELECT 'Table B' AS [Table Name],
                        ['table B$'].[Avg hrs],
                        ['table B$'].[ID],
                        ['table B$'].[Mean time],
                        ['table B$'].[Price],
                        ['table B$'].[Reoccurence],
                        ['table B$'].[Supplier],
                        ['table B$'].[Time]
                      FROM ['table B$']
                      
                      
                      

                      .

                      • 8. Re: Compare two data-sets for changes
                        Philip Hoyos

                        Thank you Kettan - you have been very helpful!

                         

                        The "Cross Join SQL" is definitely the best solution for the kind of comparison that Im trying to do. I do, however, have a few problems getting it to work with my column names, even though they work with yours.

                         

                        I have been trying too workout how your code was written, but cannot find any examples to help me understand the code. Is it possible that you could explain it? Im trying to understand the CSTR function, as well as "[lookup$] l" at the end of line 20. When I try to replicate the code, using my own column names, I get this error:

                         

                        "Database error 0x80040E10: No value given for one or more required parameters."

                        or, this

                        "Database error 0x80040E14: Syntax error in FROM clause."

                        or this

                        "Database error 0x80040E14: Syntax error in query expression 'CStr("

                         

                        I look up the code, and found this Troubleshooting 80040e10 errors - adOpenStatic.com - which was very helpful, and would be even more so, If I have a better understanding of your code . I thought it might have something to do with illigal characters, but when I try to remove any characters that is not in the alphabet I still get an error, I thought it might be because my column names were to long - but I cant find info on that either.

                         

                        If you have the time - I would be very greatful you could explain your code

                         

                        Thank you for your time and help.

                        Best regards

                        Phil

                        • 9. Re: Compare two data-sets for changes
                          kettan

                          DEBUGGING CODE

                          It is difficult to correct code without seeing it. If you could share some dummy data and the code that doesn't work, it will probably be quick for me to correct it. The best would be a packaged workbook, because it includes both data (Excel) and code.

                           

                          EXPLANATIONS

                          CStr changes various data types in the IIF to one common data type, namely text. This is because a column can only have one data type and text is the only one all (text & dates & numbers) have in common.

                           

                          [lookup$] is a table (a separate sheet in the same Excel workbook) with all column names (except primary key ID) and is what is used to create one row for each column. See more about this in tip 3 in The Cross Join Collection.  I would gladly explain this more fully, but have not enough free time to do that now.

                          • 10. Re: Compare two data-sets for changes
                            Philip Hoyos

                            Thank you very much! That would be very helpful. I will make some dummy data, so that you can see my problem

                            Thanks a lot for the explaination too.

                            • 11. Re: Compare two data-sets for changes
                              Philip Hoyos

                              kettan

                              I've just been through your work again and I'm very very thankful for your help! It's amazing! Thank you so much! I ended up using the FULL JOIN SQL as I could not get the CROSS JOIN SQL to work in my SQL. Is there any way to add a lookup table to the full join, as with the cross join?

                               

                              Thank you again for your time and help! Its been absolut great!

                              • 12. Re: Re: Compare two data-sets for changes
                                kettan

                                Hi Philip,

                                 

                                I just clicked on a reference to Stack Overflow in  Re: Helper Evolution.... and saw a link to following question at the top:  sql server - Compare multiple columns in new table with only three columns

                                 

                                I have modified the cross join query so it works in SQL Server. (SQL Server is my prime SQL tool, because I use it for a data warehouse I am developing. Jet SQL is something I recently learned myself as a necessity to help with data issues on the Tableau forum.)

                                 

                                Hopefully this is a query that works in your work situation:

                                .

                                SELECT org.[ID]
                                , l.[Column Name]
                                , CASE l.[Column Name]
                                  WHEN 'Pos'            THEN CAST(org.[Pos] AS VARCHAR(11))
                                  WHEN 'Remarks'        THEN org.[Remarks]
                                  WHEN 'Candidate name' THEN org.[Candidate name]
                                  END AS [Q2 Value]
                                , CASE l.[Column Name]
                                  WHEN 'Pos'            THEN CAST(new.[Pos] AS VARCHAR(11))
                                  WHEN 'Remarks'        THEN new.[Remarks]
                                  WHEN 'Candidate name' THEN new.[Candidate name]
                                  END AS [Q4 Value]
                                FROM [DataOutput_org] org
                                INNER JOIN [DataOutput_new] new ON org.ID = new.ID
                                CROSS JOIN (
                                  SELECT 'Pos' AS [Column Name] UNION ALL
                                  SELECT 'Remarks' UNION ALL
                                  SELECT 'Candidate name'
                                  ) l
                                
                                
                                
                                
                                
                                
                                
                                
                                
                                
                                
                                
                                

                                .

                                The CAST([Pos] AS VARCHAR(11)) is done because I defined [Pos] as INT in SQL Server. You can the same write VARCHAR(100) and thus be on the safe side

                                 

                                I used a table-less lookup table because it was easier to document its content in this way. You can of course change it to a permanent table to shorten the number of code lines and perhaps also slightly better performance.

                                1 of 1 people found this helpful
                                • 13. Re: Compare two data-sets for changes
                                  Philip Hoyos

                                  Hi kettan

                                   

                                  Thank you so much for your help. I tried Stackoverflow - as you say the answer is a SQL problem and I thought it would be better to ask there then. Your solution is perfect! It was just what I was looking for. Thank you so much for your explanation, it makes it so much easier to understand the answer and be able to apply the new knowledge elsewhere

                                  • 14. Re: Compare two data-sets for changes
                                    kettan

                                    You are welcome. It was a pleasure to solve this attractive task. There might come a time when I need the solution myself. Without your quality question I would not be ready. Now I am 

                                     

                                    Thank you so much for your explanation, it makes it so much easier to understand the answer and be able to apply the new knowledge elsewhere

                                    As for using cross joins in different scenarios, I recommend reading  The Cross Join Collection.

                                    1 2 Previous Next