7 replies. Latest reply on Nov 9, 2018, 9:12 AM by Daniel Archuleta

    Tableau Prep Sampling

    Misaki Nozawa

      I have a flow that I want to run on ALL of my data, not just a sampled portion of it. I have set the sampling option on my inputs to 'Use all data'; however, further down the flow, the orange 'Sampled' badge still appears. Does this mean that the data is being sampled? Or is this just a performance feature for building the flow, with ALL of the data processed when the flow outputs a .hyper extract?

        • 1. Re: Tableau Prep Sampling
          Ken Flerlage

          Regardless of your sampling settings, a Prep flow will always output all of your data (unless you add filters, of course). Sampling only reduces the amount of data loaded while you build your flow interactively. This is done so that Tableau Prep doesn't have to load and display all of your data while you're building, which would cause performance issues. For more, see the Tableau article "Understanding and adjusting data sampling in Tableau Prep".

          • 2. Re: Tableau Prep Sampling
            Daniel Archuleta

            This is also happening to me. In the input I selected "Use all data". However, after I pivot the data, the orange "Sampled" badge automatically appears beside the row count and remains in all subsequent steps. Unfortunately, the data contains anomalies, and I need to see all of it to clean it fully. I can't find a way to disable the sampling. What is the solution?


            • 3. Re: Tableau Prep Sampling
              Ken Flerlage

              Did you see my note above? It will always process all data.

              • 4. Re: Tableau Prep Sampling
                Daniel Archuleta

                Yes, I understand that it processes all data. However, as far as I have read, and as far as I can see in the sample, not all values are included for every data element. To clean the data fully, I need to see all of the values. Is my understanding wrong? Does the sample include all values for all data elements? If my understanding is correct, how do I disable the automatic sampling and get back to all of the data?

                • 5. Re: Tableau Prep Sampling
                  Ken Flerlage

                  Ah, I see what you're saying. I'd have thought that switching to "Use all data" would address that, so unfortunately I don't have a good answer. We'll have to leave this open to see if anyone else knows. You might also consider reaching out to Tableau Support.

                  • 6. Re: Tableau Prep Sampling
                    Daniel Archuleta

                    Thanks, Ken. I have reached out to Support. They said that automatic sampling after pivoting is expected behavior, but after I explained that I need to see all values to be 100% confident that all of the data is clean, they escalated the case. I will post their solution on this thread. Thanks!

                    • 7. Re: Tableau Prep Sampling
                      Daniel Archuleta

                      From Tableau Support:

                      "Since we're using text files as our data source, this pivot is hitting the limit of what the data source itself can handle. This would also explain why creating an extract first works fine. Since Tableau's extracts are designed to handle massive amounts of data, the pivot calculation doesn't run into any limit that the extract has. The following article goes over the limitations of jet-based data sources, which includes text files, Excel, and Access: Limitations to Data and File Sizes with Jet-based Data Sources

                      Since this is a limitation with the data source itself, our best solution would be to convert to the extract before pivoting."

                      Unfortunately, the article title wasn't hyperlinked. I had to use a two-step process with two Prep flow files. First, I pivoted my data and output it as a .hyper extract. Second, I connected to that extract in a new Prep flow, selected "Use all data", and started cleaning. There is significant lag with every edit, but that is because a 50,000-record flat file became a 7,000,000-record normalized file after pivoting. The support agent explained that the lag is due to my system's limitations: I need to upgrade from 16 GB to 32 GB of RAM.
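                      To show why the row count grows so much, here is a minimal pure-Python sketch of a columns-to-rows pivot. It is an illustration of the general technique, not Tableau Prep's actual implementation. Each input row produces one output row per pivoted column, so the output has (rows × pivoted columns) records. Assuming every row is pivoted across the same set of columns and no rows are filtered out, going from 50,000 to 7,000,000 records implies about 140 pivoted columns. The function names and the "Pivot Names"/"Pivot Values" field names are hypothetical, chosen for illustration.

```python
def pivot_columns_to_rows(rows, id_fields, pivot_fields):
    """Columns-to-rows pivot: one output row per (input row, pivoted column) pair."""
    out = []
    for row in rows:
        for field in pivot_fields:
            # Keep the identifying fields and turn the column header into a value.
            record = {k: row[k] for k in id_fields}
            record["Pivot Names"] = field       # hypothetical field name
            record["Pivot Values"] = row[field]  # hypothetical field name
            out.append(record)
    return out


def implied_pivot_columns(rows_before, rows_after):
    """Pivoted-column count implied by row counts (assumes no rows were dropped)."""
    if rows_after % rows_before != 0:
        raise ValueError("row counts are not an exact multiple")
    return rows_after // rows_before


# Tiny example: 3 rows x 4 pivoted columns -> 12 rows.
wide = [{"id": i, "Q1": 1, "Q2": 2, "Q3": 3, "Q4": 4} for i in range(3)]
long_rows = pivot_columns_to_rows(wide, ["id"], ["Q1", "Q2", "Q3", "Q4"])
print(len(long_rows))  # 12
print(long_rows[0])    # {'id': 0, 'Pivot Names': 'Q1', 'Pivot Values': 1}

# The numbers from this thread: 50,000 rows in, 7,000,000 rows out.
print(implied_pivot_columns(50_000, 7_000_000))  # 140
```

                      The same multiplication applies to memory use in the flow, which is consistent with the lag described above.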

                      I hope this helps.