6 Replies Latest reply on Sep 15, 2016 7:31 PM by Toby Douglass

    a few questions about Tableau Server networking, threading and database internals

    Toby Douglass

      Hi.

       

      I have one or two questions regarding the internal implementation of certain Tableau functionality.

       

      1. Extracts; I understand extracts are duplicated on all nodes which are running a Data Engine.  Now, when a VizQL Server begins to create a view, and connects to a Data Engine to obtain data, and that data is in an extract, is the Data Engine using *multiple threads concurrently to service that one request*, or is it using a single thread to service that request?

       

      2. Extracts again - Tableau has an internal column-store database.  Extracts are duplicated across all nodes with a Data Engine.  Does this mean that when the internal column-store database is accessed, only the single node performing the access is involved?  i.e. unlike Vertica, where all nodes are involved, in Tableau, the column-store database is one-node only, like Vectorwise.

       

      3. With live data, when the Data Server (I think it's the Data Server, rather than the Data Engine) fetches data, is the transfer over the network from the remote database performed entirely over a single socket from a single Tableau node, or can multiple Tableau nodes each open a socket to the remote database and parallelize the transfer?  (I think the former is certainly the case, but I am just making absolutely sure.)

       

      4. The VizQL server - this is multi-threaded, but when a request is issued to a VizQL server, will it use multiple threads concurrently to generate the view, or will it use one thread only to service the request?

       

      Thank you in advance for any answers!

        • 1. Re: a few questions about Tableau Server networking, threading and database internals
          Jeff Strauss

          I'm tempted to take some educated guesses on this, but it's probably a better course of action to either dig into the logs and performance stats to see what is actually happening, or work through your account rep to engage Tableau engineering.  In terms of threading, some processes are multi-threaded.  See the "Tableau Server Processes" page in the docs.

          • 2. Re: a few questions about Tableau Server networking, threading and database internals
            Toby Douglass

            Thanks, Jeff.

             

            I've read through that page in the docs, and Googled quite a lot - either I'm no good at Googling, the questions I'm asking are hard to Google or they're not much asked.

            • 3. Re: a few questions about Tableau Server networking, threading and database internals
              Russell Christopher

              1. Don't know for sure, but since the Data Engine can/will use multiple cores to do some sorts of aggregation work, I'd assume the answer is "it can use multiple threads, but it depends on the question being asked".

               

              2. Yes. The data engine isn't MPP. Think of the data engine as more of an acceleration layer rather than as a database replacement. 

               

              3. The former.

               

              4. Single thread. Server is generally about increasing capacity (# of people who can consume), not about trying to speed up rendering time by throwing more resources at the problem. A slow viz on Desktop will be a slow viz on Server.

               

              I don't see questions like this very often...What are you trying to figure out with this information? Just curious.

              • 4. Re: a few questions about Tableau Server networking, threading and database internals
                Toby Douglass

                Russell Christopher wrote:

                > 1. Don't know for sure, but since the Data Engine can/will use multiple cores to do some sorts of aggregation work, I'd assume the answer is

                > "it can use multiple threads, but it depends on the question being asked".

                 

                Interesting.  However, taking into account question #2, the database is limited to the cores in a single node, so it's not horizontally scalable across nodes; in the end, it doesn't scale beyond one machine.

                 

                > 2. Yes. The data engine isn't MPP. Think of the data engine as more of an acceleration layer rather than as a database replacement.

                 

                Right.

                 

                One other aspect of the database which I'm curious about - would I be right in saying an extract ends up being loaded to a single table?

                 

                > 3. The former.

                 

                Yeah.  I think there's a possibility here to allow better scaling.  Given careful use of SELECT, avoiding certain functionality which mandates collation on the remote database, it should be possible for each node in a remote database to emit its own results, and so then for a (say matching) node in Tableau to receive those results, i.e. network connectivity can begin to scale.

                 

                > 4. Single thread. Server is generally about increasing capacity (# of people who can consume) , not about trying to speed up rendering time by

                > throwing more resources at the problem. A slow viz on Desktop will be a slow viz on Server.

                 

                Yes.  This was my conclusion; Tableau scales in terms of number of users, but it does not scale in terms of the amount of data being used in a single view.

                 

                > I don't see questions like this very often...What are you trying to figure out with this information? Just curious.

                 

                We're using Tableau, but with big data (Vertica).  It's become clear this means we need to perform the heavy lifting in Vertica.  This is not possible using the Tableau Data Source GUI, because the SELECT functionality is limited.  In particular, we want to use set theory, so we need EXCEPT/INTERSECT/UNION.  We could do this in Tableau, but only *after* the Data Source, so we end up trying to load big sets into Tableau over a single socket, with (I think) a single thread on the Tableau side trying to process the data, i.e. it doesn't work.  It is possible with custom SQL.  Originally we had a request to avoid custom SQL, but this concern went away yesterday: I had thought it was a maintainability concern, but it was actually about the SQL wrapper Tableau puts around the custom SQL query, which was thought to incur a significant performance hit.  So we're okay.
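
To illustrate the difference between doing the set operation in the database and doing it after the data source, here is a minimal sketch.  It uses SQLite in place of Vertica purely so it runs anywhere; the table names `visitors_jan` and `visitors_feb` are hypothetical, and the row counts stand in for "data sent over the socket".

```python
import sqlite3

# In-memory SQLite stands in for the remote database (assumption: the real
# system here is Vertica; the tables below are made up for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visitors_jan (user_id INTEGER);
    CREATE TABLE visitors_feb (user_id INTEGER);
""")
conn.executemany("INSERT INTO visitors_jan VALUES (?)",
                 [(i,) for i in range(10_000)])
conn.executemany("INSERT INTO visitors_feb VALUES (?)",
                 [(i,) for i in range(5, 10_000)])

# Option A: the set operation runs in the database; only the result is fetched.
pushed_down = conn.execute(
    "SELECT user_id FROM visitors_jan EXCEPT SELECT user_id FROM visitors_feb"
).fetchall()

# Option B: both full sets are fetched, and the client does the subtraction.
jan = conn.execute("SELECT user_id FROM visitors_jan").fetchall()
feb = conn.execute("SELECT user_id FROM visitors_feb").fetchall()
client_side = sorted(set(jan) - set(feb))

rows_moved_a = len(pushed_down)      # rows fetched when the database does the work
rows_moved_b = len(jan) + len(feb)   # rows fetched when the client does the work

print(rows_moved_a, rows_moved_b)
```

Both options give the same answer; the difference is how many rows have to travel to the client before the answer exists.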


                My questions were about confirming what I thought was going on - if I'm going to make assertions to my colleagues and say this-and-this, and so we must do that-and-that, I need to be certain of the facts.

                 

                Thanks for your time and answers, Russell.

                • 5. Re: a few questions about Tableau Server networking, threading and database internals
                  Russell Christopher

                  2a. Yes. A single flat table.

                   

                  Yeah, you don't want to send tons of data across the wire (for obvious reasons), and furthermore you generally don't want Tableau to have to churn through it when you're dealing with "medium+" data.

                   

                  Custom SQL is generally not a good option, because we take your beautiful, performant query and make it part of a sub-select with all sorts of other stuff layered on it. Custom SQL also prevents us from "culling" joins to tables we don't need to answer a question, so we ask the same "big" question (fact table with 10 joined dimensions, for example) when all we really needed to do was hit a single dimension to get a list of values to show in a filter.
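
A rough sketch of the join-culling point, using SQLite and hypothetical tables (`fact_sales`, `dim_region`).  The exact SQL Tableau generates is not shown in this thread, so the wrapped form below is only an assumed approximation of "custom SQL as a sub-select"; the culled form is what could be asked when the tables are known individually.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (region_id INTEGER, amount REAL);
    INSERT INTO dim_region VALUES (1, 'north'), (2, 'south');
    INSERT INTO fact_sales VALUES (1, 10.0), (1, 5.0), (2, 20.0);
""")

# The user's custom SQL: fact table joined to a dimension.
custom_sql = """
    SELECT d.name AS region, f.amount
    FROM fact_sales f
    JOIN dim_region d ON d.region_id = f.region_id
"""

# Filter list built on top of custom SQL: the whole join runs as a sub-select,
# because the custom query is treated as an opaque unit.
wrapped = f"SELECT DISTINCT region FROM ({custom_sql}) AS t ORDER BY region"

# Filter list with the join culled: only the dimension table is touched.
# (Equivalent here only because every region has at least one fact row.)
culled = "SELECT DISTINCT name FROM dim_region ORDER BY name"

wrapped_values = [row[0] for row in conn.execute(wrapped)]
culled_values = [row[0] for row in conn.execute(culled)]

print(wrapped_values, culled_values)
```

With ten joined dimensions and a large fact table, the wrapped form repeats all of that work just to populate one filter.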

                   

                  Custom SQL is fine if you use it to "power" an extract which is processed at night, however.  At worst, it'll run slower during your batch window... who cares.

                  • 6. Re: a few questions about Tableau Server networking, threading and database internals
                    Toby Douglass

                    > 2a. Yes. A single flat table.

                     

                    Ta.  That explains one or two things which were puzzling me.

                     

                    > Yeah, you don't want to send tons of data across the wire (for obvious reasons), and furthermore you generally

                    > don't want Tableau to have to churn through it when you're dealing with "medium+" data.

                     

                    Right.

                     

                    > Custom SQL is generally not a good option, because we take your beautiful, performant query and make it part

                    > of a sub-select with all sorts of other stuff layered on it.

                     

                    Hmm.  I've not noticed this to such an extent as you describe it; all I've seen so far is that the custom query was wrapped like so; "SELECT * FROM ( [my query] )".  It has no effect on performance - I'd expect it to be optimized away - but I have here omitted a WHERE clause, since I've seen this be WHERE(0=1) but I still receive results, so I've not looked at it enough yet to fully understand what queries it's issuing.
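
For reference, a minimal sketch of what a wrapper with an always-false WHERE clause does, using SQLite and a hypothetical `sales` table.  One plausible reading (an assumption, not something confirmed in this thread) is that such a query is a metadata probe: it returns no rows but still exposes the column names of the custom query, with the real rows coming from a separate query that has no such clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 10.0), ("south", 20.0)])

# Stand-in for the user's custom SQL.
custom_sql = "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"

# Wrapper with an always-false predicate: zero rows, but column metadata
# is still available from the cursor.
probe = conn.execute(f"SELECT * FROM ({custom_sql}) AS t WHERE 0 = 1")
probe_rows = probe.fetchall()
column_names = [col[0] for col in probe.description]

# Wrapper without the predicate: the actual data.
data_rows = conn.execute(f"SELECT * FROM ({custom_sql}) AS t").fetchall()

print(probe_rows, column_names, data_rows)
```

If that reading is right, seeing both WHERE(0=1) and real results is consistent: they would come from two different queries over the same wrapped custom SQL.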

                     

                    Note that Vertica specifically claims the optimizer deals with this particular Tableau behaviour (in their working-with-Tableau PDF).  I may be wrong, but I think they know about this and they've specifically acted upon it.

                     

                    > Custom SQL also prevents us from "culling" joins to tables we don't need to answer a question, so we ask the

                    > same "big" question (fact table with 10 joined dimensions, for example) when all we really needed to do was hit

                    > a single dimension to get a list of values to show in a filter.

                     

                    Ya.  Not a big deal though I think, if we really need to we can just make another data source.