4 Replies Latest reply on Sep 18, 2015 9:13 AM by Guillaume G

    Billions of rows and big performance issues when integrating dimension tables...

    Guillaume G



      I'm using Tableau Software with Spark SQL; my use case is very simple:


      - A big fact table (11 billion rows) with the following fields: id_dim1, id_dim2, id_dim3, value. The table is partitioned on id_dim1.

      - Three dimension tables, each with an id field and a label field


      If I work only with the fact table, I can filter on id_dim1, id_dim2, id_dim3 with very satisfying performance. But I want my labels, so I created a join in Tableau Software between my fact table and my dimension tables: the good news is that I get my labels, but the bad news is that Tableau Software systematically performs the join between the fact table and the dimension tables. With such an amount of data, this leads to ugly response times.
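      To make the pattern concrete, here is a minimal sketch of the kind of query the declared joins produce, with SQLite standing in for Spark SQL. The table and column names come from my schema above; the data is invented purely for illustration:

```python
import sqlite3

# Toy reproduction of the star schema from the post (data invented):
# a fact table keyed by three dimension ids, plus tiny label lookup tables.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE fact (id_dim1 INT, id_dim2 INT, id_dim3 INT, value REAL);
CREATE TABLE dim1 (id INT PRIMARY KEY, label TEXT);
CREATE TABLE dim2 (id INT PRIMARY KEY, label TEXT);
CREATE TABLE dim3 (id INT PRIMARY KEY, label TEXT);
INSERT INTO dim1 VALUES (1, 'France'), (2, 'Germany');
INSERT INTO dim2 VALUES (1, '2015'), (2, '2014');
INSERT INTO dim3 VALUES (1, 'Web'), (2, 'Store');
INSERT INTO fact VALUES (1, 1, 1, 10.0), (1, 1, 2, 5.0), (2, 2, 1, 7.0);
""")

# Once the joins are declared in the data source, every query drags the
# three dimension tables into the plan just to fetch the display labels.
rows = con.execute("""
SELECT d1.label, d2.label, d3.label, SUM(f.value)
FROM fact f
JOIN dim1 d1 ON f.id_dim1 = d1.id
JOIN dim2 d2 ON f.id_dim2 = d2.id
JOIN dim3 d3 ON f.id_dim3 = d3.id
GROUP BY d1.label, d2.label, d3.label
""").fetchall()
print(rows)
```

      Harmless at this toy scale, but against 11 billion fact rows the same plan shape is what hurts.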


      A possible solution would be to put the labels directly in my fact table to avoid the joins, but 1) it's not very clean and 2) WHERE clauses will be less efficient on varchar than on int. So it's a no-go.


      I discovered that it is possible to manually create an alias for each member of a dimension, but it has to be done element by element: is that a joke? Not even a paste from the clipboard is allowed, nor an import from an Excel file? It would be a solution for my scenario: creating an alias for each member of id_dim1, id_dim2, id_dim3.
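      What a bulk alias import would amount to is just an id → label mapping per dimension; a sketch of building one from a CSV export of a dimension table (the file contents and column names here are hypothetical):

```python
import csv
import io

# Hypothetical CSV export of one dimension table: "id,label" pairs.
# In-memory here for illustration; in practice this would be a file on disk.
dim1_csv = io.StringIO("id,label\n1,France\n2,Germany\n")

# Build the id -> label mapping that would otherwise have to be typed
# into Tableau's alias editor element by element.
aliases = {int(row["id"]): row["label"] for row in csv.DictReader(dim1_csv)}
print(aliases)
```

      With thousands of members per dimension, doing this by hand in the UI is not realistic.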


      Am I missing something? For now, Tableau Software is not a viable solution for my users...