1 Reply Latest reply on Jan 6, 2018 1:49 PM by parul bansal

    Avoiding Rerunning same spark-sql query by caching

    Rishi Saraf

      Hello Everyone,

      We are evaluating Tableau for reporting with the spark-sql connector. So far the results are good, but performance is a concern. A Spark job produces 30 billion records daily as Parquet files. Once the data is generated, users want to see aggregated reports on it, and several users may view these reports. When users open the dashboard, we don't want to run the spark-sql queries on the fly; we want the dashboard to load quickly. Our Tableau dashboard has 3 sheets (3 different aggregation queries). Currently, every time any of us opens the dashboard, Tableau runs the same queries again, which takes a good amount of time (about 4 minutes on average). Is it possible to cache this data in Tableau once a day? The idea is that once the data is cached, Tableau will not fire queries at Spark, so the dashboard will load quickly.
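      To make the idea concrete, here is a minimal sketch of "run each query at most once per day" in plain Python. It is not Tableau's caching mechanism; DailyQueryCache, slow_spark_query, and the events table are hypothetical names used only for illustration.

```python
import datetime as dt
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, int]


class DailyQueryCache:
    """Runs each query at most once per calendar day and reuses the result."""

    def __init__(self, run_query: Callable[[str], List[Row]]) -> None:
        self._run_query = run_query
        self._cache: Dict[Tuple[str, dt.date], List[Row]] = {}

    def get(self, sql: str, today: dt.date) -> List[Row]:
        key = (sql, today)
        if key not in self._cache:
            # Drop results cached on earlier days, then run the query once.
            self._cache = {k: v for k, v in self._cache.items() if k[1] == today}
            self._cache[key] = self._run_query(sql)
        return self._cache[key]


calls: List[str] = []


def slow_spark_query(sql: str) -> List[Row]:
    # Hypothetical stand-in for the roughly 4-minute spark-sql aggregation.
    calls.append(sql)
    return [("region_a", 10), ("region_b", 20)]


QUERY = "SELECT region, COUNT(*) FROM events GROUP BY region"
cache = DailyQueryCache(slow_spark_query)
day1 = dt.date(2018, 1, 6)

first = cache.get(QUERY, day1)   # runs the query
second = cache.get(QUERY, day1)  # served from the cache, no second run
```

      With this shape, the second and later dashboard loads on the same day would reuse the stored rows, and the query would run again only when the date changes.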



        • 1. Re: Avoiding Rerunning same spark-sql query by caching
          parul bansal

          1. Use a cube rather than querying the database directly.

          2. Try switching the connection type, from extract to live or from live to extract, to see if there is any change.

          3. Try to extract only the first dashboard's data source and keep the rest as it is if

          4. Use filtered data

          5. Use required columns.

          6. Reduce the number of calculations on the Tableau side, since you are already getting aggregated data from the cube itself.

          7. Look into your server machine's configuration as well, such as RAM.

          8. Reduce the number of background jobs/services running on your server as much as possible.
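          As a rough illustration of points 1, 4, and 5 (pre-aggregated data, filtered rows, only the required columns), the sketch below builds a small daily summary table once so that reports read from it instead of the raw data. It uses Python's built-in sqlite3 as a stand-in for spark-sql; the events table, its columns, and the daily_summary name are assumptions made up for this example.

```python
import sqlite3

# In-memory database standing in for the raw Parquet data (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    "event_date TEXT, region TEXT, user_id INTEGER, amount REAL, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [
        ("2018-01-05", "east", 1, 5.0, "x"),
        ("2018-01-06", "east", 1, 10.0, "x"),
        ("2018-01-06", "east", 2, 20.0, "y"),
        ("2018-01-06", "west", 3, 7.5, "z"),
    ],
)

# Run once per day: keep only the day's rows (filter) and only the
# columns the report needs, already aggregated.
conn.execute(
    """
    CREATE TABLE daily_summary AS
    SELECT event_date, region,
           COUNT(*)    AS event_count,
           SUM(amount) AS total_amount
    FROM events
    WHERE event_date = '2018-01-06'
    GROUP BY event_date, region
    """
)

# The dashboard would then read this small table rather than the raw events.
summary = conn.execute(
    "SELECT region, event_count, total_amount FROM daily_summary ORDER BY region"
).fetchall()
```

          The same pattern should carry over to a scheduled Spark job that writes a small aggregated table, which Tableau can then use as its data source or extract.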


          Let me know if any of the above suggestions helps with performance.