
    Multithreading/Multiprocessing in Python Tableau SDK

    Partha Dutta

      I'm working on a program that creates a TDE extract from multiple CSV files. My goal is to read the files in parallel and then write to a single TDE, primarily to improve the performance of the overall process. However, I keep running into threading issues. Even if I put locks around the critical pieces of code that write to the extract, I keep getting errors like the following:

      Exception in thread Thread-92:
      Traceback (most recent call last):
        File "/usr/lib64/python2.7/threading.py", line 813, in __bootstrap_inner
          self.run()
        File "mkextract.py", line 56, in run
          table.insert(row_arr[r])
        File "/usr/local/lib/python2.7/site-packages/tableausdk/Extract.py", line 523, in insert
          raise Exceptions.TableauException(result, Exceptions.GetLastErrorMessage())
      TableauException: TableauException (304): Invalid Table handle

       

      One of the threads actually succeeds in writing to the TDE, but the other one fails. Any advice?
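
      For context, the failing pattern looks roughly like this (a simplified sketch, not my actual mkextract.py; the single column, the sample data, and the lock granularity are just stand-ins):

      import threading

      from tableausdk.Extract import Extract, TableDefinition, Row
      from tableausdk.Types import Type

      lock = threading.Lock()

      def worker(table, schema, values):
          # Each thread builds rows and inserts them under a lock, but the
          # Table handle still can't be shared across threads.
          for value in values:
              row = Row(schema)
              row.setCharString(0, value)
              with lock:
                  table.insert(row)   # this is where "Invalid Table handle" shows up

      extract = Extract('combined.tde')
      schema = TableDefinition()
      schema.addColumn('value', Type.CHAR_STRING)
      table = extract.addTable('Extract', schema)

      threads = [threading.Thread(target=worker, args=(table, schema, chunk))
                 for chunk in (['a', 'b'], ['c', 'd'])]
      for t in threads:
          t.start()
      for t in threads:
          t.join()
      extract.close()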

       

      Thanks, Partha

        • 1. Re: Multithreading/Multiprocessing in Python Tableau SDK
          Jeff D

          Hi Partha, multithreading is not supported in the SDK.  If performance is critical, I would use performance monitoring tools to identify the bottleneck.  If the bottleneck is not the SDK, then you could use multiple producer threads feeding into a single consumer thread (the one thread that calls the Tableau SDK).  If the bottleneck is the SDK (the more likely scenario), then consider a hardware upgrade (e.g., a faster CPU or a higher-performance disk subsystem, depending on what you see in the performance profiles).
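
          Roughly the shape I have in mind, as a minimal sketch (Python 2.7 to match your traceback; the two-column schema, the queue bound, and csv_paths are assumptions, not your actual data):

          import csv
          import threading
          import Queue

          from tableausdk.Extract import Extract, ExtractAPI, TableDefinition, Row
          from tableausdk.Types import Type

          SENTINEL = object()

          def producer(path, q):
              # CSV parsing only -- no Tableau SDK calls in these threads.
              with open(path, 'rb') as f:
                  for rec in csv.reader(f):
                      q.put((int(rec[0]), rec[1]))
              q.put(SENTINEL)   # tell the consumer this file is finished

          def consume(q, n_producers, tde_path):
              # The one and only thread (here, the main thread) that touches SDK objects.
              extract = Extract(tde_path)
              schema = TableDefinition()
              schema.addColumn('id', Type.INTEGER)
              schema.addColumn('value', Type.CHAR_STRING)
              table = extract.addTable('Extract', schema)   # extract tables are named 'Extract'
              finished = 0
              while finished < n_producers:
                  item = q.get()
                  if item is SENTINEL:
                      finished += 1
                      continue
                  row = Row(schema)
                  row.setInteger(0, item[0])
                  row.setCharString(1, item[1])
                  table.insert(row)
              extract.close()

          ExtractAPI.initialize()                    # the SDK samples call initialize/cleanup
          csv_paths = ['file1.csv', 'file2.csv']     # placeholder inputs
          q = Queue.Queue(maxsize=10000)             # bounded so producers can't outrun the writer
          producers = [threading.Thread(target=producer, args=(p, q)) for p in csv_paths]
          for t in producers:
              t.start()
          consume(q, len(csv_paths), 'combined.tde')
          for t in producers:
              t.join()
          ExtractAPI.cleanup()

          That keeps every Extract/Table call on a single thread, which avoids the "Invalid Table handle" errors; whether it improves throughput depends on whether CSV parsing or the SDK itself is the bottleneck.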

          • 2. Re: Multithreading/Multiprocessing in Python Tableau SDK
            Partha Dutta

            Jeff,

             

            I have re-written my app to use the multiple-producer/single-consumer model and I'm still hitting the bottleneck. I've also tried a RAM disk for the disk subsystem and still am not getting the performance I need. Is there no way to do a large batch of inserts with the SDK? Is that even supported by the tdeserver?