3 Replies Latest reply on Jun 1, 2016 6:59 AM by Alastair Young

    Performance Issue With Large Data Set

    John Bloch

      Good morning,


      I have been trying to tackle this issue the last couple of weeks and wanted to post to see if anyone had similar issues or better yet, a solution.


A little background on the situation: I am trying to do a join between two tables that live on our company's server and build a dashboard in Tableau Desktop. The issue I have run into is that since I am dealing with millions of records, whenever I try to filter on anything, it takes quite some time to execute the query. I have also tried publishing a data extract to the server, but I always get an error.


      The end goal is to publish the dashboard on our Tableau server for business owners to use and filter on whatever view they desire without having to wait several minutes for one filter to load.


      I have read some things about creating a data extract with a smaller number of records, publishing it to the server, and then, once the dashboard is on the Tableau server, refreshing the data extract without a limit on the number of records. Is this the best way to go about the issue, or does anyone have any other ideas?




        • 1. Re: Performance Issue With Large Data Set
          Alastair Young



          If at all possible, it would be worth doing the join on the database server by creating a view, allowing the database to take the load.
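
          As a rough sketch of that suggestion, a view that performs the join once on the database side might look something like the following. Note that the table and column names here are hypothetical placeholders, not from the original post; the actual join keys and start date would come from your own schema:

          ```sql
          -- Hypothetical example: pre-join the two tables in a server-side view
          -- so Tableau queries one object and the database does the join work.
          CREATE VIEW sales_joined AS
          SELECT o.order_id,
                 o.order_date,
                 o.amount,
                 c.customer_name,
                 c.region
          FROM   orders o
          JOIN   customers c
            ON   c.customer_id = o.customer_id
          WHERE  o.order_date >= '2015-07-01';  -- restrict to the date range you actually need
          ```

          Tableau can then connect directly to the view instead of defining the join in the data source, which usually lets the database's own indexes and query planner do the heavy lifting.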


          The method you have suggested is certainly possible: create a filter calculation and apply it to the data set, for example:


          IF NOW() > DATETIME("June 1, 2016 17:01:08") THEN 'true' ELSE 'false' END


          This would create a blank data set. If you publish this to the server and run a full extract refresh after the designated date and time, all records would be returned.



          • 2. Re: Performance Issue With Large Data Set
            John Bloch

            Thanks, Alastair, for responding; I think I am following.

            So let's say I create a blank data set and publish it to the server. I will then need to refresh it, pulling back all data (dating back to 7/1/15). Is the best way to do this to set a limit on the number of records in the initial extract, publish it, and then refresh the data on the server, pulling back everything I am looking to bring in? Or should I do what you said: create a blank extract and then just change the criteria when doing the refresh on the server?



            • 3. Re: Performance Issue With Large Data Set
              Alastair Young

              By creating the filter, none of your records will ever be returned until the server's clock passes the date and time in the calculation. At that point the filter returns 'true' and all the data becomes visible. You're essentially publishing an empty extract container.


              You obviously must make sure that you give yourself enough time to apply the filter to the data set and publish it onto the server. I would also advise checking the server time, as this is what determines the NOW() value.
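
              One quick way to check that (a sketch, using Tableau's standard calculation syntax; the field name is just a suggestion) is a throwaway calculated field containing nothing but the clock function, dropped onto a sheet and viewed on the server:

              ```
              // Calculated field, e.g. named "Server Time Check"
              NOW()
              ```

              When the published view renders, this shows the timestamp the server will use, so you can confirm how much lead time you actually have before the filter flips to 'true'.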


              The best way to apply the filter is to first pause auto-updates, then apply the data source filter by selecting the calculated field. In the General tab, check 'Use all'; then on the Wildcard tab, add the word true (whichever value is in the quote marks of your calc), check 'Exactly matches', and check 'Include all values when empty' as well.