1 Reply Latest reply on Mar 14, 2018 3:54 PM by Patrick A Van Der Hyde

    Async/Parallel Remote Exec Requests (TabPy)

    Aaron Freeland

      I am working with TabPy and need to crank through as many rows as possible, quickly. We currently use a Docker image of TabPy and were going to incorporate it into our Kubernetes cluster so that we could scale out to handle the load. When I started testing the performance of Tableau's remote execution, it appeared that Tableau sends a row (or however the table calculation is configured), waits for a response, and then sends the next row, in a synchronous manner. I haven't yet run across a scenario where any of our row data builds on data from the row before it, so it would be nice to have an option to allow asynchronous requests. As it stands, this seems to make it impossible to get good performance out of most of our datasets, since we deal with larger volumes of data. The issue is also frustrating because the performance hit will happen every time a filter is changed, and I don't believe there is any caching option available within Tableau.
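To make the difference concrete, here is a minimal sketch of the two request patterns. It is only a simulation: `fake_remote_exec` is a hypothetical stand-in that sleeps for a fixed latency, not a TabPy or Tableau API, and the 10 ms latency is an arbitrary assumption. It says nothing about how Tableau actually dispatches requests.

```python
import asyncio
import time

# Hypothetical stand-in for one remote-exec round trip; not a TabPy API.
async def fake_remote_exec(row, latency_s=0.01):
    await asyncio.sleep(latency_s)  # simulated network + compute time
    return row * 2

async def run_sequential(rows):
    # One request at a time: each call waits for the previous response.
    return [await fake_remote_exec(r) for r in rows]

async def run_concurrent(rows):
    # All requests in flight at once; only valid if rows are independent.
    return await asyncio.gather(*(fake_remote_exec(r) for r in rows))

def timed(coro_fn, rows):
    # Run one pattern to completion and return (results, elapsed seconds).
    start = time.perf_counter()
    result = asyncio.run(coro_fn(rows))
    return result, time.perf_counter() - start
```

Calling `timed(run_sequential, rows)` and `timed(run_concurrent, rows)` on the same list should return the same results, with the sequential run taking roughly the row count times the latency and the concurrent run taking roughly one latency.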

       

      For more information, I originally started the quest for async requests in the TabPy GitHub issues: https://github.com/tableau/TabPy/issues/87


       

      Perhaps I am missing something, or there is a configurable option, but I was hoping someone out there may have a solution. I would love to hear the technical reason why the remote execs are processed that way.

       

      Here is a quick Excel document that takes our quickest and average development response times in milliseconds and maps them to seconds/minutes based on the number of rows. I believe these performance hits would also occur again every time a filter is adjusted. Please let me know whether I am just implementing this incorrectly or whether remote exec generally operates this way.
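The mapping in that document is simple arithmetic: if requests are handled one after another, total time is roughly the row count times the per-request response time. A minimal sketch of that calculation follows; the function names are made up for illustration, and the 50 ms figure in the usage note is a placeholder, not one of the measured values.

```python
def estimate_total_seconds(row_count, latency_ms):
    # Sequential requests: total time grows linearly with the row count.
    return row_count * latency_ms / 1000.0

def format_duration(seconds):
    # Split a duration in seconds into whole minutes and leftover seconds.
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes)} min {secs:.1f} s"
```

For example, with an assumed 50 ms per request, `format_duration(estimate_total_seconds(10000, 50))` gives `8 min 20.0 s` for 10,000 rows.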