2 Replies Latest reply on Feb 12, 2020 1:07 PM by Jonathan Drummey

    Tableau Prep joins for big databases

    Li Jun Poh

      Hi All,

       

      I'd like to ask a question about joins in Tableau Prep when the input databases are sampled. Is the join applied to the whole database, or only to the rows that were sampled from it? Because of the sampling, I am unable to confirm that the databases are joined completely. I would appreciate your advice.

       

      Thanks

       

      Li Jun

        • 1. Re: Tableau Prep joins for big databases
          Rohan Malusare

          Hi Li,

           

          When you join data sources in Tableau Prep, the Join step shows a join summary that you can use for validation. It helps you understand whether your join is working properly.

           

          Please also go through this link, The Join Step, which might help you.

           

           

           

          Regards,

          Rohan Malusare

          • 2. Re: Tableau Prep joins for big databases
            Jonathan Drummey

            Hi,

             

            There are two modes for running Tableau Prep: interactive mode and run mode. When we click the Run button (or run flows from the command line or Tableau Prep Conductor), Prep assumes that we want to generate output, and no sampling is done. In interactive mode (when editing a flow), we get the sampling options on the Input step. It's also important to know that Prep will automatically sample for some tools, including the Join step; see Configure your Data Set - Tableau for details.
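            To see why a join over sampled inputs can look incomplete, here is a toy Python sketch. It is not how Prep samples internally; it simply assumes a sample is the first N rows of each input, and the table names, row counts, and SAMPLE_SIZE are made up for illustration.

```python
def inner_join_keys(left_rows, right_rows):
    """Return the sorted keys present in both inputs (an inner join on key)."""
    right_keys = {key for key, _ in right_rows}
    return sorted({key for key, _ in left_rows if key in right_keys})


# Hypothetical data: both tables hold ids 0..999, but the customers
# table happens to be stored in the reverse order.
orders = [(i, f"order-{i}") for i in range(1000)]
customers = [(i, f"customer-{i}") for i in reversed(range(1000))]

SAMPLE_SIZE = 100  # assumed stand-in for a row limit applied while editing

# Join over all rows versus join over the first SAMPLE_SIZE rows of each table.
full_matches = inner_join_keys(orders, customers)
sampled_matches = inner_join_keys(orders[:SAMPLE_SIZE], customers[:SAMPLE_SIZE])

print(len(full_matches))     # 1000: every id matches when all rows are used
print(len(sampled_matches))  # 0: the two samples share no ids at all
```

            In this contrived case the sampled join finds nothing even though the full join matches every row, which is why a join preview built on samples cannot confirm the full result.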

             

            When Prep is running, the Input steps trigger downloads of data from the source system(s) into local Hyper files (stored in temp folders), and Prep then works with those local files for downstream steps (such as Joins). In run mode that download is all the data; when the Input tool is sampling, it's just the sampled data; and if sampling occurs in downstream tools, then the sample is all the data that you'll see there. So if you are trying to validate join results on really large data, the only way to guarantee that at this time (as of Prep 2020.1) is to add a test Output tool and run the flow, then open the results of the test output to validate.
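            One possible way to check the test output outside of Prep is sketched below in Python. It assumes the test Output step writes a CSV, that both inputs can also be exported as CSV, and that the join is an inner join on a single key column. The file names and the customer_id column in the comments are hypothetical, and the key sets are held in memory, so this only suits keys that fit in RAM.

```python
import csv


def read_keys(path, key_column):
    """Collect the distinct values of key_column from a CSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        return {row[key_column] for row in csv.DictReader(handle)}


def join_coverage(left_keys, right_keys, output_keys):
    """Compare keys in the join output with those an inner join should produce."""
    expected = left_keys & right_keys
    return {
        "expected": len(expected),
        "found": len(expected & output_keys),
        "missing": sorted(expected - output_keys),
        "unexpected": sorted(output_keys - expected),
    }


# With real exports the key sets would come from read_keys, for example
# read_keys("orders.csv", "customer_id") (hypothetical file and column).
# Small in-memory sets are used here so the sketch runs on its own.
report = join_coverage(
    left_keys={"1", "2", "3"},
    right_keys={"2", "3", "4"},
    output_keys={"2"},
)
print(report)
```

            An empty "missing" list and an empty "unexpected" list would suggest the output contains exactly the keys an inner join should produce. It does not check row counts for duplicated keys.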

             

            https://community.tableau.com/ideas/10257 is a feature request to turn off all sampling so we can work with the data as is. Please vote it up if you'd like.

             

            Jonathan