4 Replies Latest reply on Feb 4, 2014 10:31 PM by Cristian Vasile

    Blending and Performance

    Marco Kundert

      Hi all,

       

      We have the following situation. After trying a lot and reading all the docs and forums, your help is very welcome.

       

      Data set in an Oracle database, in two tables:

      Contracts: 10 million rows

      Deliveries: 1,900 million rows

       

      Current Solution

      At the moment we do a join to generate a flat extract; every row contains all the information from Contracts (a lot of strings) and Deliveries (a lot of floats). We also have a WHERE clause, which reduces the 1,900 million rows to around 100 million. The extract takes around 2 hours to generate.
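      To make the discussion concrete, here is a minimal sketch of that "join + WHERE → flat extract" step, using in-memory SQLite in Python. All table and column names here are made up for illustration; the real Oracle schema is not shown in this thread.

```python
import sqlite3

# In-memory stand-in for the Oracle source; all names are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE contracts  (contract_id INTEGER, contract_date TEXT, customer TEXT);
    CREATE TABLE deliveries (contract_id INTEGER, contract_date TEXT, qty REAL);
""")
con.executemany("INSERT INTO contracts VALUES (?, ?, ?)",
                [(1, "2014-01-01", "ACME"), (2, "2013-06-01", "Globex")])
con.executemany("INSERT INTO deliveries VALUES (?, ?, ?)",
                [(1, "2014-01-01", 10.0), (1, "2014-01-01", 5.0),
                 (2, "2013-06-01", 7.0)])  # old row, removed by the WHERE below

# Join + WHERE, as in the current 2-hour extract job: every result row
# repeats the contract strings next to the delivery measures.
rows = con.execute("""
    SELECT c.contract_id, c.contract_date, c.customer, d.qty
    FROM contracts c
    JOIN deliveries d
      ON d.contract_id = c.contract_id AND d.contract_date = c.contract_date
    WHERE d.contract_date >= '2014-01-01'
""").fetchall()
print(rows)  # contract 1 appears twice - once per matching delivery row
```

      Note how the string columns from Contracts are duplicated on every delivery row; at 100 million rows this duplication is where most of the flat extract's size comes from.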

      Workbooks are quite fast and ok (after some performance tuning).

       

      New solution?: Blending

      We are thinking about making one extract for only Contracts and several different extracts for Deliveries; each delivery extract would contain between 1 and 100 million rows, depending on the use case.

      The join for blending is based on a date (these 10 million contracts contain 700 distinct dates and 10,000 distinct contract IDs; the two fields together form the unique key in Contracts). All extracts are on Tableau Server 8 with enough RAM (42 GB).

      The Contracts extract contains mostly the dimensions, and the Deliveries extracts the measures.

      The extracts are less than 1GB in size.
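      For reference, here is a rough sketch in plain Python of what blending conceptually does with linking fields like ours (contract ID + date): the secondary source is aggregated per combination of the linking fields, and only those aggregates are joined to the primary. This is a simplified model for discussion, not Tableau's actual implementation, and the data is invented.

```python
from collections import defaultdict

# Hypothetical miniature data; linking fields are (contract_id, date).
contracts = [
    {"contract_id": 1, "date": "2014-01-01", "customer": "ACME"},
    {"contract_id": 2, "date": "2014-01-02", "customer": "Globex"},
]
deliveries = [
    {"contract_id": 1, "date": "2014-01-01", "qty": 10.0},
    {"contract_id": 1, "date": "2014-01-01", "qty": 5.0},
    {"contract_id": 2, "date": "2014-01-02", "qty": 7.0},
]

# Step 1: aggregate the secondary source per linking-field combination.
agg = defaultdict(float)
for d in deliveries:
    agg[(d["contract_id"], d["date"])] += d["qty"]

# Step 2: left-join the aggregated values onto the primary source.
blended = [
    {**c, "sum_qty": agg.get((c["contract_id"], c["date"]), 0.0)}
    for c in contracts
]
print(blended)
```

      The practical consequence: with 10,000 contract IDs times 700 dates, the aggregation step alone has to be evaluated over up to 100 million secondary rows per linking-key combination before anything is displayed, which may be part of why the workbooks never return.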

       

      The problem

      Regardless of which one we use as the primary source, even the simplest workbook runs endlessly (no result at all).
      So, is this just too much data for blending? Or is blending the wrong tool for this use case anyway?

      We use filters on the Contract fields and (a lot) aggregations and table calculations on the Deliveries fields.

        • 1. Re: Blending and Performance
          Cristian Vasile

          Marco,

           

          The blending capability, in general, should be used when a customer operates two different databases, e.g. MS SQL Server and Oracle.

           

          A nice white paper on the blending feature can be found here; please read it:

          Data Blending: Dynamic Workload Driven Data Integration in Tableau

           

          You could try a few things:

          o refine the SQL join phrase

          o buy a PCI-based SSD drive and use it on the Oracle server to "accelerate" the database

          o create a view on Oracle and download data from that view

          o re-tune the Oracle database (increase SGA buffers, verify the effectiveness of indexes, check the temporary tablespace and redo logs, analyze the storage architecture)

          o on the machine with 42 GB of RAM, buy enterprise RAM disk software, allocate 8 to 10 GB of RAM to that disk, and store your extracts there

          o buy an SSD drive and use it on the Tableau server to dramatically reduce the I/O access time to the extracts

           

          The RAM disk solution was implemented by Allan Walker (on a 16 GB desktop-class machine), and you can read how things improved here:

          Re: Hardware for the best Tableau performance

           

          Regards,

          Cristian.

          • 2. Re: Blending and Performance
            Marco Kundert

            Thanks for your tips.

             

            The question is whether this statement is true:

            "The blending capability should be used if a customer operates two different databases like MS SQL and Oracle."

            Our intention was not to multiply all the string fields by ten, and thus to have less data (master + secondary extract < one flat joined extract), and to re-use the master extract to blend with other secondary extracts.

            But if you are right, data blending is not meant to be used like that...
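            Some back-of-the-envelope arithmetic behind that intention (the per-row byte figures are pure assumptions, chosen only to illustrate the ratio; the row counts are from the original post):

```python
# Hypothetical average row widths, just to illustrate the ratio.
string_bytes_per_contract = 200   # "a lot of strings"
float_bytes_per_delivery = 40     # "a lot of floats"

contracts = 10_000_000
deliveries = 100_000_000          # rows left after the WHERE filter

# Flat joined extract: the contract strings repeat on every delivery row.
flat = deliveries * (string_bytes_per_contract + float_bytes_per_delivery)

# Split extracts: strings stored once per contract, measures once per delivery.
split = (contracts * string_bytes_per_contract
         + deliveries * float_bytes_per_delivery)

print(flat // split)  # with these assumed widths, the flat extract is 4x larger
```

            Under these assumptions the split layout is several times smaller, which was the whole motivation for trying blending in the first place.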

             

            Here are some thoughts on your tips:

            o refine the join sql phrase

            This is not possible because of the data structure.

            o buy a PCI based SSD drive and use it on Oracle server to "accelerate" the database

            We do have this.

            o create a view on Oracle and download data from that view

            I don't think this has an impact (a view vs. the same SQL query = the same result).

            o on the machine with 42 GB of RAM, buy enterprise RAM disk software, allocate 8 to 10 GB of RAM to that disk, and store your extracts there

            As far as I understand Tableau Server, the extracts in use are kept in RAM all the time anyway, so a RAM disk would only help if you are using Tableau Desktop, and only for the first opening.

            • 3. Re: Blending and Performance
              Cristian Vasile

              Marco,

               

              I wouldn't discount the RAM disk approach just because you assume that the entire extract is loaded into memory by Tableau; I wouldn't put too much money on that bet.

              If a customer's server has 8 GB of RAM and the extract is 20 GB in size, should the application stop working?

              Also, a RAM disk is incredibly fast at both fundamental operations, read and write, no matter which application initiates the I/O.

               

              Tableau has some partners; one of them is Alteryx, which has developed a nice graphical product (drag-and-drop widgets on screen) that can join and blend data and save the final result as a Tableau extract.

               

              Alteryx & Tableau | Data Blending and Advanced Analytics | Alteryx

              Data Blending - 2 Minute Demo from Alteryx - YouTube

              Maybe the Alteryx product could bring more speed to your ETL process and shrink the out-of-business-hours window.

               

              I hope this helps you.

               

              Regards,

              Cristian.

              • 4. Re: Blending and Performance
                Cristian Vasile

                Marco,

                 

                I was thinking about some out-of-the-box solutions for you; let me lay out my ideas below.

                 

                a. Move your corporate data into a real cloud warehouse, like the one provided by 1010data (www.1010data.com); they can ensure Tableau connectivity.

                 

                b. Try a hybrid storage approach for Tableau Server: an SSD drive coupled with a large cache buffer in RAM. Such a solution is provided by VeloBit (www.velobit.com), which was acquired by Western Digital last year.

                 

                Regards,

                Cristian.