6 Replies Latest reply on May 10, 2016 10:17 AM by Łukasz Majewski

    Is NULL contributing to extract size?

    Łukasz Majewski

      Hi,

       

      Three questions:

       

      1. Say you have a varchar(50) column in your data source and set 50% of the rows to NULL (while initially all contained text). Should that reduce the extract size?

      2. Should removing a column from a select list reduce extract size?

      3. Should removing a (deterministic) calculated field reduce extract size?

       

       

      Here is why I am asking. I created an extract which is around 20mln rows in 61 columns + 1 calculated dimension (text). After publishing to server (9.1) and refreshing its size is 877MB. I then created another one without the calculated field and one other column - so it is 60 columns now. I also replaced most common values in a few fields with NULL - e.g. I have [Country] field and in 5mln records it is 'United States' so I nullified those.

      I expected a significant drop in the new extract size but it is 877MB too. I recall I did similar exercise about a year ago with Tableau 8.3 and I successfully reduced the extract size then. Has something changed in tde design since then? Does actual data source type matter in tde storage? Last year I was working with an extract out of MySQL connection and this time it is Oracle.

       

       

      Regards,

      Łukasz

        • 1. Re: Is NULL contributing to extract size?
          Łukasz Majewski

          I cannot find any relevant information with regards to the above concerns. Can someone point me to the right resources?

          • 2. Re: Is NULL contributing to extract size?
            Paul Sherras

            Hi Lucasz,

             

            Edit - My coworker has just informed me that if NULL values are used within a fixed width text field then it will use up the same space as any other value that would be entered into that field.

             

            With regards to your questions specifically about Tableau, can you just check for me that you have specified "Use Extract"? Also, it's worth mentioning, 5 million cells when you are working with 1,200,000,000 cells (20mil rows * 60 columns) is only around 0.4% of your data so don't expect a huge decrease in size

             

            Kind regards,

            • 3. Re: Is NULL contributing to extract size?
              Łukasz Majewski

              Hi Paul - thanks for your comments!

               

              I believe all the text fields are variable length strings, not fixed. But maybe tde converts them to fixed?

              Yes I used extract which I published and refreshed as I created it in Desktop with  'where rownum<=10000' and then commented this clause in the xml before publishing so that it takes place and expands on the server.

               

              Although it is 0.4% this single field reduction should result in 877x0.4% = 3.5MB change. And I also removed 2 whole fields and applied the same nullification to several other including numeric values. The extracts size is exact the same.

               

              These are the data types I am selecting from a view using custom sql:

               

              Do you know of any documentation on how data is stored in tde?

              • 4. Re: Is NULL contributing to extract size?
                Paul Sherras

                Hi Lukasz,

                 

                Take a read of Understanding Tableau Data Extracts | Tableau Software if you haven't already.

                 

                I've just had a glance at it and from reading this line "several different techniques are used, including dictionary compression (where common column values are replaced with smaller token values)," makes me think that TDEs are indexing each column to compress it (ie replacing United States with 1).

                 

                With this in mind, I think it may be possible that NULL Values behave differently in a TDE extract than we are used to inside SQL databases. I am about to undertake a trial to check this

                 

                Kind regards,

                 

                Paul

                1 of 1 people found this helpful
                • 5. Re: Is NULL contributing to extract size?
                  Paul Sherras

                  Hi Lukasz,

                   

                  I have just done a test and just to confirm - In a tableau TDE extract nulls are indexed like other values as I predicted meaning null values will not save much space for categorised columns. The amount of space saved will be ((varchar data size - integer size) * number of different categories) which will in the case of columns such as continent add up to very little!

                   

                  The only columns I can foresee having a big impact are ones which have a large amount of variability (such as long descriptions. Please see screenshot below to demonstrate.

                   

                  I hope this helps somewhat

                  1 of 1 people found this helpful
                  • 6. Re: Is NULL contributing to extract size?
                    Łukasz Majewski

                    Now it makes perfect sense that the size did not change. Apart from

                    indexing nulls just like other values I also realised that the two columns

                    I removed contained one constant (maybe two) so they would not save much

                    space

                    In the past I did reduce extract size but I replaced many historical values

                    no longer needed. Stupid me:)

                    Thanks again for your effort in educating me.