3 Replies Latest reply on Aug 9, 2019 5:10 AM by Jonathan Drummey

    Data Integrity Checking

    Richard Finney

      Hi,

       

      I am looking for a tool to help discern if 2 schematically different but semantically identical databases are in data alignment according to a set of business rules.

      The databases are large and there is much potential for errors which need to be identified.

      I am currently exploring the possibility of using Tableau to help automate this, specifically I can use Tableau Prep to for example run a query across databases to spot any missing values across 2 tables by using a full outer join. I can also create a computed field to ensure that values match from a formatting perspective. So I am having some success.

       

      But my question is am I really using the correct tool, I like that the flows are visual and easier to maintain than just a suite of SQL scripts, but am I using the right part of Tableau? And is Tableau really built to address this kind of problem?

       

      Many thanks for any guidance, Richard.

        • 1. Re: Data Integrity Checking
          Jonathan Drummey

          Hi,

           

          In your description you haven't said anything about the number of tables, number of records, or number of business rules so I don't think I can give you a definite recommendation. Also I'm not clear on what you want the results to look like.

           

          Personally I'd only use Prep if the number of tables and rules was in the low single digits and I'd done some tests to make sure Prep could handle the joins...anything more than that and I'd be doing a lot of clicking & dragging to manually set things up, parse out the results, and generate output. If the number of tables/rules/joins is larger then I'd turn to an ETL tool or scripting so I could get more automation.

           

          Jonathan

          • 2. Re: Data Integrity Checking
            Richard Finney

            Thanks for your reply, I was hoping I could use Tableau as such an ETL tool, to be honest I am looking to recruit someone to develop the automation but in a catch 22 situation I need to find an appropriate technology to advertise for before I can seek expertise in it. We currently have some tests scripted in SQL but I was hoping for a more maintainable solution than just SQL scripts and quite liked the flow diagram nature of Tableau. We have a couple of hundred tables and half a million records in some of them so the scale of the problem really needs some kind of automation. Regards.

            • 3. Re: Data Integrity Checking
              Jonathan Drummey

              Hiring for technology you don’t have yet is tough!

               

              I also very much prefer flow-based ETL tools to scripts. I’ve been closely following Prep’s growth and evaluating new features vis-a-vis our use cases because I think it’s got the best ease of use of any ETL tool out there (which is really important to us), and it’s free with Tableau Creator licenses. Prep’s current design has been very much around the notion of enabling a conversation with the data in the data preparation stage, not so much the kinds of wide-ranging data automation tasks like your scenario. For that I can personally recommend Alteryx, though it can seem more expensive on a per-seat basis the ROI is tremendous. Alteryx is a little more old school in that the user is configuring a flow and then running a flow, however Alteryx also has a ton more of the “meta” features around building macros, managing & manipulating metadata, parameterizing inputs & outputs, running in-database vs. local, etc. that make it more appropriate for your scenario. I’ve also heard good things about Trifacta and Talent, and I’ve started to kick the tires on KNIME, it’s a free/open source tool that comes out well in rankings.

               

              Good luck!

               

              Jonathan