1 2 Previous Next 16 Replies Latest reply on Feb 17, 2014 9:54 AM by Matt Lutton

    Performance difference between  published datasource dashboard compared to packaged workbook (twbx)

    Alvaro Echeverria

      Hi,

       

      Our previous configuration in Tableau was : each workbook with its own extract and it was published as a packaged workbook being refresh each day based in a schedule. Our problem was managing the calculations: to put a change was difficult because we had to replicate that in each dashboard.

       

      We decided to migrate everything to only one published data source with all the calculations and each dashboard is connected to this unique datasource. It is more issue to administrate but it created a huge problem: now our dashboards are extremely slow to show information:

       

      Measuring the performance of one of them, in the old situation (distributed extracts), the computation view was near to 10 seconds in average with a maximum of 15 seconds. Now, the new situation (unique datasource), in the first view it takes near 200 seconds in average with a maximum of 300 seconds. I know there is a cache associated to it, the second time that I open the same datasource, it opens in 10 seconds but every action that we do in the dashboard, if it is not cached, it takes around 200 seconds.

       

      Anyone know how to improve this times? I followed this tutorial: http://kb.tableausoftware.com/articles/knowledgebase/optimizing-tableau-server-performance but it didn't improve the performance.

       

      Any help will be really appreciate,

       

      Regards,

        • 1. Re: Performance difference between  published datasource dashboard compared to packaged workbook (twbx)
          Matt Lutton

          I have also encountered this extremely frustrating issue.  The problem, as I see it, is that the published data source acts like and is a "live connection", rather than a connection to an extract.

           

          I'm not sure what to suggest, as I struggle with this as well and continue to update my workbooks manually because of the performance issues I've encountered with Published Data Sources.

           

          I've encountered several other issues with Published Data Sources, as well--one of the main limitations is that we cannot impersonate Server users when connecting to them.  So, it can be difficult to test user filters on Desktop using these data sources.

           

          Please do let us know if you figure anything else out, or if you find ways to make this work in your situation.

          • 2. Re: Performance difference between  published datasource dashboard compared to packaged workbook (twbx)
            Russell Christopher

            I assume the published data source uses an extract, as well?  It almost sounds like you are using a live data source by in the published data source and don't realize it.

             

            In your shoes, I'd attempt to troublehshoot this in Tableau Desktop - Open both the "old" (packaged) and "new" (Tableau Data Source) workbook while recording performance.

             

            You'll be able to see which part of the rendering process is taking a long time (most likely the query) and will help narrow things down further.

             

            FYI, you can do the same thing on the Server itself with these instructions, but I personally find that working in Desktop allows me to narrow down the problem more quickly.

             

            If you *'really* want to go all out, you can look in the logs to see what's happening. For example, If I open and run a viz with a Data Server Data Source and execute a query which contains the field called "DFA_AD_Name", I'll see a reference to that field in the dataserver_timestamp.txt log found in: C:\ProgramData\Tableau\Tableau Server\data\tabsvc\vizqlserver\Logs\ (your location may vary)

             

            dataserver_9700_2014_02_17_10_45_27.txt:

             

            2014-02-17 10:47:44.053 (Default,russch,317BA91B62734C1384480D8B8271651F-0:0,UwIvH8Coyx0AAAyMi1gAAACR) 0900: Column: [DFA_Ad_Name] str(500,1) NOT NULL COLLATE en_US_CI -- cardinality=2720

            DFA

            <other stuff deleted>

             

            2014-02-17 10:47:44.053 (Default,russch,317BA91B62734C1384480D8B8271651F-0:0,UwIvH8Coyx0AAAyMi1gAAACR) 0900: <dataengine-connection class='dataengine' create-extracts-locally='true' dbname='folder_8088\upload_{53A8685D-10F8-4613-9D91-E55C1C431379}\DimAd _WWReach_.tde' extract-engine='true' management-tablespace='extracts' name='repoItem4185058414036549565' port='27042' schema='Extract' server='localhost' tablename='Extract' update-time='11/19/2013 3:59:58 PM'>

            2014-02-17 10:47:44.053 (Default,russch,317BA91B62734C1384480D8B8271651F-0:0,UwIvH8Coyx0AAAyMi1gAAACR) 0900: </dataengine-connection>

             

            Note how you can clearly see that the extract engine is being used? In fact, at the same time, if you look in the tdeserver log in C:\ProgramData\Tableau\Tableau Server\data\tabsvc\logs\dataengine\ (your location my vary) you can SEE the query being executed with the data engine:

             

            tdeserver0_2014_02_16_21_00_24.log:

             

            2014-02-17 10:47:44.022 (3644): Session542: QueryExecute:

            2014-02-17 10:47:44.022 (3644): (database "folder_8088\\upload_{53A8685D-10F8-4613-9D91-E55C1C431379}\\DimAd _WWReach_.tde")

            2014-02-17 10:47:44.022 (3644): Session542: QueryExecute: OK

            2014-02-17 10:47:44.022 (6260): tdeserver: open session, connection=::ffff:127.0.0.1:27042->::ffff:127.0.0.1:49398, sess_guid=543

            2014-02-17 10:47:44.022 (3648): Session542: QueryExecute:

            2014-02-17 10:47:44.022 (3648): (show schemas)

            2014-02-17 10:47:44.038 (3648): Elapsed time:0.002 sec

            2014-02-17 10:47:44.038 (3648): Compilation time:0.000 sec

            2014-02-17 10:47:44.038 (3648): Execution time:0.002 sec

            2014-02-17 10:47:44.038 (3648): Session542: QueryExecute: OK rowset_guid=1 n_columns=1

            2014-02-17 10:47:44.038 (3648): Session542: QueryExecute:

            2014-02-17 10:47:44.038 (3648): (describe [Extract].[Extract])

            2014-02-17 10:47:44.053 (3648): Elapsed time:0.019 sec

            2014-02-17 10:47:44.053 (3648): Compilation time:0.019 sec

            2014-02-17 10:47:44.053 (3648): Execution time:0.000 sec

            2014-02-17 10:47:44.053 (3648): Session542: QueryExecute: OK rowset_guid=2 n_columns=11

            2014-02-17 10:47:54.742 (3648): Session542: StatementPrepare: sess_guid=542 stmt_guid=3

            2014-02-17 10:47:54.742 (3648): (restrict (aggregate (project (table [Extract].[Extract]) (([none:DFA_Ad_Name:nk] [DFA_Ad_Name]))) (([none:DFA_Ad_Name:nk] [none:DFA_Ad_Name:nk])) ()) ([none:DFA_Ad_Name:nk]))

            2014-02-17 10:47:54.758 (3648): Session542: StatementPrepare: OK

            2014-02-17 10:47:54.758 (3644): Session542: StatementExecute: sess_guid=542 stmt_guid=3

            2014-02-17 10:47:54.773 (3644): Session542: StatementExecute: OK rowset_guid=4 n_columns=1

             

            By comparison, if you do the same thing with a data source which is "live" against SQL Server and look fora  field called "ModifiedDate", you see something different happen. Here's the DataServer log...note how we're connecting to SQL Server this time:

             

            2014-02-17 10:57:25.138 (Default,russch,D434A9D4BBD3416FA205F4A645BD0580-0:0,UwIxZcCoyx0AAAyMjJYAAAC5) 0900: ACTION: Instance datasource from XML for 'repoItem7356938701866467050'.

            2014-02-17 10:57:25.138 (Default,russch,D434A9D4BBD3416FA205F4A645BD0580-0:0,UwIxZcCoyx0AAAyMjJYAAAC5) 0900: <validate-connection>

            2014-02-17 10:57:25.138 (Default,russch,D434A9D4BBD3416FA205F4A645BD0580-0:0,UwIxZcCoyx0AAAyMjJYAAAC5) 0900:   <connection authentication='sqlserver' class='sqlserver' dbname='AdventureWorks2012' name='repoItem7356938701866467050' odbc-native-protocol='yes' server='simpleprime' username='sa' workgroup-auth-mode='prompt' />

            2014-02-17 10:57:25.138 (Default,russch,D434A9D4BBD3416FA205F4A645BD0580-0:0,UwIxZcCoyx0AAAyMjJYAAAC5) 0900:   <matches>

            2014-02-17 10:57:25.138 (Default,russch,D434A9D4BBD3416FA205F4A645BD0580-0:0,UwIxZcCoyx0AAAyMjJYAAAC5) 0900:     <connection authentication='sqlserver' class='sqlserver' dbname='AdventureWorks2012' name='repoItem7356938701866467050' odbc-native-protocol='yes' password='********' server='simpleprime' username='sa' workgroup-auth-mode='prompt' />

            2014-02-17 10:57:25.138 (Default,russch,D434A9D4BBD3416FA205F4A645BD0580-0:0,UwIxZcCoyx0AAAyMjJYAAAC5) 0900:   </matches>

            2014-02-17 10:57:25.138 (Default,russch,D434A9D4BBD3416FA205F4A645BD0580-0:0,UwIxZcCoyx0AAAyMjJYAAAC5) 0900: </validate-connection>

            2014-02-17 10:57:25.138 (Default,russch,D434A9D4BBD3416FA205F4A645BD0580-0:0,UwIxZcCoyx0AAAyMjJYAAAC5) 0900: ACTION: Datasource connection for 'repoItem7356938701866467050'.

            2014-02-17 10:57:25.138 (Default,russch,D434A9D4BBD3416FA205F4A645BD0580-0:0,UwIxZcCoyx0AAAyMjJYAAAC5) 0900: workgroup-auth-mode: 'prompt'

            2014-02-17 10:57:25.138 (Default,russch,D434A9D4BBD3416FA205F4A645BD0580-0:0,UwIxZcCoyx0AAAyMjJYAAAC5) 0900: authentication: 'sqlserver'

            2014-02-17 10:57:25.138 (Default,russch,D434A9D4BBD3416FA205F4A645BD0580-0:0,UwIxZcCoyx0AAAyMjJYAAAC5) 0900: server: 'simpleprime'

            2014-02-17 10:57:25.138 (Default,russch,D434A9D4BBD3416FA205F4A645BD0580-0:0,UwIxZcCoyx0AAAyMjJYAAAC5) 0900: class: 'sqlserver'

            2014-02-17 10:57:25.138 (Default,russch,D434A9D4BBD3416FA205F4A645BD0580-0:0,UwIxZcCoyx0AAAyMjJYAAAC5) 0900: :id: 'LIVEAddressTypeSQLServer'

            2014-02-17 10:57:25.138 (Default,russch,D434A9D4BBD3416FA205F4A645BD0580-0:0,UwIxZcCoyx0AAAyMjJYAAAC5) 0900: username: 'sa'

            2014-02-17 10:57:25.138 (Default,russch,D434A9D4BBD3416FA205F4A645BD0580-0:0,UwIxZcCoyx0AAAyMjJYAAAC5) 0900: dbname: 'AdventureWorks2012'

            2014-02-17 10:57:25.138 (Default,russch,D434A9D4BBD3416FA205F4A645BD0580-0:0,UwIxZcCoyx0AAAyMjJYAAAC5) 0900: odbc-native-protocol: 'yes'

            2014-02-17 10:57:25.138 (Default,russch,D434A9D4BBD3416FA205F4A645BD0580-0:0,UwIxZcCoyx0AAAyMjJYAAAC5) 0900: :locale: '1033'

            2014-02-17 10:57:25.138 (Default,russch,D434A9D4BBD3416FA205F4A645BD0580-0:0,UwIxZcCoyx0AAAyMjJYAAAC5) 0900: name: 'repoItem7356938701866467050'

            2014-02-17 10:57:25.138 (Default,russch,D434A9D4BBD3416FA205F4A645BD0580-0:0,UwIxZcCoyx0AAAyMjJYAAAC5) 0900: caption: ''

            2014-02-17 10:57:25.138 (Default,russch,D434A9D4BBD3416FA205F4A645BD0580-0:0,UwIxZcCoyx0AAAyMjJYAAAC5) 0900: password: '********'

             

             

            And now in a log called tabprotosrv.txt we SEE the query being executed:

             

            2014-02-17 10:58:13.663 (-,-,-,-) 1ea4: <QUERY protocol='015b0078'>

            2014-02-17 10:58:13.663 (-,-,-,-) 1ea4: SELECT DATEPART(year,[sqlproxy].[ModifiedDate]) AS [yr:ModifiedDate:ok],

            2014-02-17 10:58:13.663 (-,-,-,-) 1ea4:   SUM(CAST(CAST(1 as bigint) as BIGINT)) AS [sum:Number of Records:ok]

            2014-02-17 10:58:13.663 (-,-,-,-) 1ea4: FROM [Person].[AddressType] [sqlproxy]

            2014-02-17 10:58:13.663 (-,-,-,-) 1ea4: GROUP BY DATEPART(year,[sqlproxy].[ModifiedDate])

            2014-02-17 10:58:13.663 (-,-,-,-) 1ea4: </QUERY>

            • 3. Re: Performance difference between  published datasource dashboard compared to packaged workbook (twbx)
              Matt Lutton

              All published data sources are "live", correct?  There is no way to make them perform like an Extract, even when they are published from an Extract, without making them a local extract to the workbook--right?

              • 4. Re: Performance difference between  published datasource dashboard compared to packaged workbook (twbx)
                Alvaro Echeverria

                Thanks Matthew and Russell for your replies. I'll try to record the performance in Tableau Desktop and try to get some insights based on the results.

                I understand that published data source are also extracts, at least the one that I published I have a schedule to update it in daily basis. I think because the workbook has to do a live connection to the published datasource that is slower than having that inside of the package as in a twbx file.

                • 5. Re: Performance difference between  published datasource dashboard compared to packaged workbook (twbx)
                  Russell Christopher

                  Matthew -

                   

                  If you are seeing this behavior, you should open up a support case. If a Data Source includes an extract, then the published Data Source will use the extract, not go "live".

                   

                  Just tested this on my box to be sure, and it works.

                  • 6. Re: Performance difference between  published datasource dashboard compared to packaged workbook (twbx)
                    Russell Christopher

                    No. If a published data source is "backed" by an extract, the extract will be leveraged.

                    • 7. Re: Performance difference between  published datasource dashboard compared to packaged workbook (twbx)
                      Matt Lutton

                      Any time I've published a data source as an extract, when I connect to the published data source, it behaves like a live connection.  I've reported this and was told this is expected behavior--was a change made in 8.1?

                      • 8. Re: Performance difference between  published datasource dashboard compared to packaged workbook (twbx)
                        Matt Lutton

                        http://tableau-ext.hosted.jivesoftware.com/thread/130273?tstart=0

                         

                        See this thread and comments from Robert Morton.  I have been avoiding Published Data Sources for this reason, so if they truly behave like Extracts, that would be awesome--but they never have worked that way in the past, and Desktop shows a live connection, "Extract Data" option is selectable in Desktop, etc.

                         

                        When testing, the workbooks published with localized extracts have always outperformed the published data source versions by a long shot (in both Desktop and Server)

                         

                        After re-reading the comments in that thread, I may need to do more testing on Tableau Server now that our Server has been upgraded.  So, even if performance is poor on Desktop, I may see a difference in Server?  That is what I'm gathering after re-reading the comments at the link above.

                         

                        I will do some more testing now that we have a more robust Server--some of my issues may have been a result of having a poor Server configuration.

                         

                        Another instance where some clear documentation would help.

                        • 9. Re: Performance difference between  published datasource dashboard compared to packaged workbook (twbx)
                          Russell Christopher

                          Nope, it's always worked this way. If published data sources could only be "live", they wouldn't be very useful, would they?

                           

                          Which comment of Robert's leads you to beleive this? I just read through the thread you pointed me to and I don't see any of what he wrote indicating that extracts associated with data sources would not be used...

                          • 10. Re: Performance difference between  published datasource dashboard compared to packaged workbook (twbx)
                            Matt Lutton

                            Confusing, for certain--perhaps I misunderstood what he was saying.  It doesn't really matter, as when testing, the performance of the published data sources was always insanely slow.  We've upgraded our Server, so perhaps I will see improvements now.

                             

                            My understanding was that their "usefulness" was in data modeling for other Desktop users, etc. 

                            • 11. Re: Performance difference between  published datasource dashboard compared to packaged workbook (twbx)
                              Russell Christopher

                              When using Data Server I've seen a ~1 second or less lag vs. a "embedded" extract - We're hooking in an extra "layer" of functionality, and needing to connect to data server, then have data server connect to the "real" source (extract or other).

                               

                              There were also some bugs that got squashed before 8.1 that could have impacted data server perf - but you shouldn't have seen such a big difference....

                              • 12. Re: Performance difference between  published datasource dashboard compared to packaged workbook (twbx)
                                Matt Lutton

                                I did see huge differences.  Sometimes, several minutes worth of waiting.  But our Server was very poor at the time.

                                 

                                I suppose the main comment of Robert's I was adhering to was:

                                "You may wish to keep with your prior approach of having many separate data sources with their own extracts, so long as..."

                                 

                                When testing, there was just no way we could live with the performance of the published data sources.  Again, I will do more testing but it will be a pain to revise all my workbooks at this point. Particularly since we cannot impersonate Server users for testing.

                                • 13. Re: Performance difference between  published datasource dashboard compared to packaged workbook (twbx)
                                  Jonathan Drummey

                                  Hi Matthew,

                                   

                                  You wrote:


                                  > See this thread and comments from Robert Morton.  I have been avoiding Published Data Sources for this reason, so if they truly behave like Extracts, that would be awesome--but they never have worked that way in the past, and Desktop shows a live connection, "Extract Data" option is selectable in Desktop, etc.

                                   

                                  I suspect you were a bit confused by how Tableau has implemented Published Data Sources. They appear as a live connection in Desktop because you have a live connection to the Published Data Source (that happens to be an extract). So the live part isn't the extract, it's that you have a live connection to the Tableau server. You can locally create an extract of any Published Data Source regardless of it's source (so far as I've seen). So yes, you can create an extract of an extract. Weee!

                                   

                                  My own (limited) testing of Published Data Sources matches Russell's experience. However, I haven't used them in production yet because of the following issues:

                                   

                                  - Varying results for functions. The showstopper for me was that as of 8.0 ATTR() wasn't implemented for Published Data Sources as of 8.0, I'd also found a few other bugs at that time with other functions that returned different results depending on whether they were in a Published Data Source or not. This made me concerned enough about Published Data Sources that I haven't done anything with them in 8.1.

                                  - Metadata management. I'm a big user of calculated fields and haven't figured out a workflow that works for me to manage updates to data sources and calculated fields.

                                  - Performance. I found that on our server, the slower performance of Published Data Sources was just enough to make them not so useful.

                                   

                                  Here's another recent thread on Published Data Source performance and data blends, where performance is upwards of 10x slower using Published Data Sources: Poor Performance using Tableau Server Published Data Sources using Data Blends.

                                   

                                  Jonathan

                                  • 14. Re: Performance difference between  published datasource dashboard compared to packaged workbook (twbx)
                                    Matt Lutton

                                    I was definitely confused, and being fairly new when I originally asked the question probably had an impact--but I believe the biggest factor was our extremely Poor Server at the time.

                                     

                                    However, I too have experienced all kinds of issues when trying to use published data sources -- re-building quick filters (for some reason, if you select every worksheet before publishing the data source to Server, the quick filters will magically stay in place and not require rebuilding when replacing the original extract with a Published Data Source--however, I have seen filters change from "Only Relevant Values" to "All Values in Database" and other odd changes I cannot explain after switching to a Published Data Source).  If these data sources would truly behave like the extracts they're created from, I'd love to use them and it would certainly make my work easier to manage.

                                     

                                    I do believe ATTR() is supported by published data sources now.

                                    1 2 Previous Next