3 Replies Latest reply on Oct 10, 2018 5:30 AM by Michael Gillespie

    Tableau Prep/Hyper vs SQL Server (or other DB)


      I was under the impression that Prep ingests the data into a Tableau data extract in Hyper.

      Hyper is meant to be:

      • Hyper is an in-memory data engine designed to add new fields to visualizations, apply filters faster, and deliver complex dashboards within Tableau.
      • The database leverages multi-core processing and uses parallelization techniques to optimize queries.
      • Tableau may be able to entice customers to leverage Tableau's Data Engine and move traditional database workloads.
      • Hyper has had a seven-month pre-release program and nightly performance testing with 62,000 workbooks.
      • Customers won't have to migrate data to use Hyper in Tableau 10.5.


      How does Hyper handle indexing? The performance of my queries is really slow. If I connect to SQL Server directly instead, will I get more control over the schema to optimize further?


      If my queries are slow in the TDE, is there anything more I can do to optimize the queries?

        • 1. Re: Tableau Prep/Hyper vs SQL Server (or other DB)
          Michael Gillespie

          Well, sort of.


          Prep CAN output a .hyper format extract file if you want it to - it is not mandatory.


          TDE and Hyper are two completely different things.  If you're using Hyper you are by definition NOT using TDE.  Hyper replaces TDE.


          Whether or not your .hyper extract is slow depends on several things.  Is your source database schema optimized for Tableau?  Hyper will make many things much faster than a live connection to a database, but if the source structure is not optimized, then the .hyper extract won't be optimized either.


          There is no "index" capability that you can control or influence in Hyper.  The internal structure of a Hyper database is completely different from the source data (first of all, it's not a relational data store).  Hyper builds its internal representation of the source data to optimize it for analysis according to its own algorithms.  Again, if you present an un-optimzed schema to Hyper, you will see less-than-optimal results.


          Much depends on the raw volume of data as well.  And on the performance characteristics of the workstation you are running Prep/Tableau Desktop on.  And on other things too!


          So, when you say "performance of the queries is really slow" - what do you mean?

          1 of 1 people found this helpful
          • 2. Re: Tableau Prep/Hyper vs SQL Server (or other DB)

            With around 10 million records and 5 filters (a normal use case: date selectors and several facts/products), two visualisations are taking over 10 seconds each time a filter is changed. I wanted to see why this is so slow; the performance analyser is showing that queries are taking a long time.

            What actually happens, data-wise and query-wise, when I update a filter? For example, on an embedded Tableau viz on a webpage, if I change the year then I am assuming a new query is run in the TDE or Hyper, then the new recordset/values are sent back to the client and rendered. Rendering is slowed down by multiple filters (Tableau recommends as few filters as possible), but how does this impact speed, specifically in terms of data transfer between server and desktop? What would be the best 'brute force' method to make the pages faster with the filters we have, if the underlying data model has been optimized?

            We are bringing data in with Tab Prep from CSV files, so I assumed indexes and other tuning options that most databases can take advantage of would be able to be tweaked if everything is in Tableau. Can you make a recommendation on how you would do that?



            • 3. Re: Tableau Prep/Hyper vs SQL Server (or other DB)
              Michael Gillespie

              First things first: there is no "tweaking" possible when creating an extract from a CSV.  The Hyper engine does what it does and I don't know of a way to influence that.  Remember that Hyper is a non-relational columnar data store, so your indexing issues will be different from a relational database.
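              To make the "no user-managed index" point concrete: in a columnar store, an aggregation scans the one or two columns it needs as contiguous arrays, rather than probing a b-tree the way a relational database does. Here's a deliberately toy Python sketch of the idea (this is purely illustrative, not Hyper's actual internals):

              ```python
              # Row store: one record per entry; a query must touch every field
              # of every row it visits.
              rows = [
                  {"year": 2017, "product": "A", "sales": 5},
                  {"year": 2018, "product": "B", "sales": 10},
                  {"year": 2018, "product": "C", "sales": 7},
              ]

              # Column store: one array per field; an aggregation scans only
              # the columns it needs, which is fast even with no index at all.
              columns = {
                  "year":    [2017, 2018, 2018],
                  "product": ["A", "B", "C"],
                  "sales":   [5, 10, 7],
              }

              # Sum of sales for 2018: a single pass over two columns.
              total_2018 = sum(
                  s for y, s in zip(columns["year"], columns["sales"]) if y == 2018
              )
              print(total_2018)  # 17
              ```

              That's why "add an index" isn't a tuning lever here: the engine's answer to a filter is a column scan, and you influence its cost mainly by how much data you hand it.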


              We need to understand your architecture a bit more to be sure of the answers to your questions.  But since you mention embedded vizzes, we'll assume a standard Server infrastructure for the following.


              Assumptions: Data is in the form of an Extract, stored on Tableau Server.  Dashboards are being consumed in a web browser.  Data workflow is CSV-->Prep-->.hyper extract file, published to Tableau Server.


              In that scenario, some rendering is happening in the client web browser and some on the server, depending on the threshold value you set via a command-line configuration setting.  Read this article and decide if you want to adjust the threshold(s).


              About Client-Side Rendering


              When you change the value in a filter, Tableau has to recalculate the entire dashboard to reflect the new filter value.  If there are dependent filters on that same dashboard (Related values setting), then Tableau repopulates the domain of each of those filters, sequentially.  So, if there is one main date filter, and 2 other filters that change depending on the date selected, that is three round trips to the data source, and the 2 other filters require a complete read of every row in the data source to properly populate the filter values.  That has the potential to be slow if you have a large data set.
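              The dependent-filter behaviour above can be sketched in Python. The column and filter names here are hypothetical, but the pattern is what matters: each "Related values" filter re-derives its list of valid choices from the rows that survive the filters upstream of it, one pass per filter:

              ```python
              import pandas as pd

              # Hypothetical sample data standing in for the extract.
              df = pd.DataFrame({
                  "year":    [2016, 2017, 2017, 2018, 2018],
                  "region":  ["East", "East", "West", "West", "North"],
                  "product": ["A", "B", "B", "C", "C"],
              })

              # The user picks a new value in the main date filter...
              selected_year = 2018
              in_scope = df[df["year"] == selected_year]

              # ...and each dependent filter's domain is recomputed from the
              # surviving rows -- effectively a full read per dependent filter.
              region_domain = sorted(in_scope["region"].unique())
              product_domain = sorted(in_scope["product"].unique())
              print(region_domain)   # ['North', 'West']
              print(product_domain)  # ['C']
              ```

              At 5 rows this is instant; at 10 million rows with several dependent filters, those repeated passes are exactly where the time goes.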


              I'm going to guess that the long-running queries are related to dependent filters on those dashboards.


              Normal resolutions to that kind of issue are to reduce the number of dependent filters, to rearchitect the source data to eliminate some rows from the result set, or to add explicit higher-level aggregations (if applicable) to your source data so that Tableau does not have to perform calculations in memory.  You might also consider whether you have any calculated fields in your workbook that would be better put into the data source (via Prep, perhaps) or into the CSV/root data source/database.
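              As an example of the "explicit higher-level aggregation" idea: if the dashboards only ever show monthly totals, you can roll the rows up before they reach the extract (or do the equivalent inside Prep with an Aggregate step). The column names and values below are hypothetical; this is just a pandas sketch of the roll-up:

              ```python
              import pandas as pd

              # Hypothetical row-level data as it might arrive from the CSV.
              daily = pd.DataFrame({
                  "order_date": pd.to_datetime(
                      ["2018-01-03", "2018-01-20", "2018-02-05", "2018-02-28"]),
                  "product": ["A", "A", "A", "B"],
                  "sales":   [100.0, 50.0, 75.0, 25.0],
              })

              # Roll daily rows up to one row per month and product, so the
              # extract holds far fewer rows for each filter change to scan.
              monthly = (
                  daily.assign(month=daily["order_date"].dt.to_period("M").astype(str))
                       .groupby(["month", "product"], as_index=False)
                       .agg(total_sales=("sales", "sum"),
                            order_count=("sales", "size"))
              )
              print(monthly)
              ```

              Four daily rows collapse to three monthly rows here; on real data the reduction is usually orders of magnitude, which shrinks both the extract and every filter-driven query against it.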

              1 of 1 people found this helpful