2 Replies Latest reply on Oct 25, 2016 10:15 AM by John Sobczak

    Is it better to connect Tableau to serialized or parallelized data?

    Brad Davis

      Hello Tableau'ians,

       

      I have a question about the best way to design a data source: should it report the data in a serial or a parallel format? My goals for this data include displaying total counts and percentage counts split by a variety of different factors. Here are the two example data structures I am considering.

       

       

      Serialized

        

      date      type       result  count
      1/1/2016  cats       TRUE    24
      1/1/2016  cats       FALSE   35
      1/1/2016  forklifts  TRUE    37
      1/1/2016  forklifts  FALSE   44
      1/1/2016  turnips    TRUE    37
      1/1/2016  turnips    FALSE   15
      1/2/2016  cats       TRUE    27
      1/2/2016  cats       FALSE   37
      1/2/2016  forklifts  TRUE    21
      1/2/2016  forklifts  FALSE   18
      1/2/2016  turnips    FALSE   41
      1/2/2016  turnips    TRUE    19
      1/6/2016  cats       TRUE    49
      1/6/2016  forklifts  FALSE   47
      1/6/2016  forklifts  TRUE    32
      1/6/2016  turnips    FALSE   15
      1/6/2016  turnips    TRUE    18
      1/6/2016  cats       FALSE   45

       

      Parallel

       

        

      date      cat true  cat false  forklift true  forklift false  turnip true  turnip false
      1/1/2016  48        28         33             31              42           24
      1/2/2016  12        36         22             50              24           38
      1/6/2016  48        45         23             36              42           35
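
      For what it's worth, the 'parallel' layout can be derived from the 'serial' one with a pivot (conditional aggregation). Below is a minimal sketch, assuming a hypothetical table named `results_serial` and an in-memory SQLite database driven from Python's standard `sqlite3` module; the rows are the 1/1/2016 rows from the serialized example, and the output column names are made up for illustration.

```python
import sqlite3

# 1/1/2016 rows copied from the serialized example.
rows = [
    ("1/1/2016", "cats", "TRUE", 24),
    ("1/1/2016", "cats", "FALSE", 35),
    ("1/1/2016", "forklifts", "TRUE", 37),
    ("1/1/2016", "forklifts", "FALSE", 44),
    ("1/1/2016", "turnips", "TRUE", 37),
    ("1/1/2016", "turnips", "FALSE", 15),
]

# Hypothetical table name; column names are quoted to avoid any clash with SQL names.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE results_serial ("date" TEXT, "type" TEXT, result TEXT, "count" INTEGER)'
)
conn.executemany("INSERT INTO results_serial VALUES (?, ?, ?, ?)", rows)

# One output column per (type, result) pair, one output row per date.
pivot_sql = """
SELECT "date",
       SUM(CASE WHEN "type" = 'cats'      AND result = 'TRUE'  THEN "count" ELSE 0 END) AS cat_true,
       SUM(CASE WHEN "type" = 'cats'      AND result = 'FALSE' THEN "count" ELSE 0 END) AS cat_false,
       SUM(CASE WHEN "type" = 'forklifts' AND result = 'TRUE'  THEN "count" ELSE 0 END) AS forklift_true,
       SUM(CASE WHEN "type" = 'forklifts' AND result = 'FALSE' THEN "count" ELSE 0 END) AS forklift_false,
       SUM(CASE WHEN "type" = 'turnips'   AND result = 'TRUE'  THEN "count" ELSE 0 END) AS turnip_true,
       SUM(CASE WHEN "type" = 'turnips'   AND result = 'FALSE' THEN "count" ELSE 0 END) AS turnip_false
FROM results_serial
GROUP BY "date"
ORDER BY "date"
"""
parallel = conn.execute(pivot_sql).fetchall()
print(parallel)  # [('1/1/2016', 24, 35, 37, 44, 37, 15)]
```

      Note that every new type adds two columns to the pivot query, whereas the serial table just gains rows.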

       

      I can create all the kinds of figures that I want to create with the data in either format (obviously), but I'm not sure if one is generally preferable over the other.

       

      As I see it, the advantage of the 'parallel' format over the 'serial' format is that it reduces the number of calculations Tableau has to do itself by leaving them to the database, although this puts an additional burden on the database. But that difference seems rather trivial. An advantage of 'serial' over 'parallel' is that errors in the ETL for the parallel format take a bit longer to fix, because the affected tables have to be recreated. On the other hand, once the ETL has been fixed, no extra changes are needed in Tableau: the newly fixed data will just show up (if we use a live connection or a daily full refresh from Tableau Online), whereas with 'serial' we would have to go in and fix the error in Tableau. Personally, I find Tableau's interface for making calculated fields more cumbersome than just doing it in SQL, and my guess is that anyone who can fix errors in Tableau's calculated fields could also fix the SQL, so the two options are basically equally maintainable.
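
      To make the "let the database do the calculation" option concrete, here is a minimal sketch of a percentage-TRUE calculation run against the 'serial' layout. It assumes the same hypothetical `results_serial` table in an in-memory SQLite database (via Python's standard `sqlite3` module) and uses the 1/1/2016 rows from the example; `pct_true` is a made-up column name.

```python
import sqlite3

# 1/1/2016 rows copied from the serialized example.
rows = [
    ("1/1/2016", "cats", "TRUE", 24),
    ("1/1/2016", "cats", "FALSE", 35),
    ("1/1/2016", "forklifts", "TRUE", 37),
    ("1/1/2016", "forklifts", "FALSE", 44),
    ("1/1/2016", "turnips", "TRUE", 37),
    ("1/1/2016", "turnips", "FALSE", 15),
]

# Hypothetical table name; column names are quoted to avoid any clash with SQL names.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE results_serial ("date" TEXT, "type" TEXT, result TEXT, "count" INTEGER)'
)
conn.executemany("INSERT INTO results_serial VALUES (?, ?, ?, ?)", rows)

# Share of TRUE results per date and type; 100.0 forces floating-point division.
pct_sql = """
SELECT "date",
       "type",
       100.0 * SUM(CASE WHEN result = 'TRUE' THEN "count" ELSE 0 END) / SUM("count") AS pct_true
FROM results_serial
GROUP BY "date", "type"
ORDER BY "date", "type"
"""
pct_true = {(d, t): p for d, t, p in conn.execute(pct_sql)}

for (d, t), p in pct_true.items():
    print(f"{d} {t}: {p:.1f}% TRUE")
```

      Because the query groups by whatever columns are listed, splitting by another factor only means adding that column to the SELECT and GROUP BY, which is harder to do once the data has been pivoted into the 'parallel' shape.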

       

      So on the whole I think the 'parallel' structure works better for me because it keeps the majority of the work where I'd prefer to keep it (in SQL).  Can anyone think of any other factors I should consider?


      Thanks,

       

       

      Brad