11 Replies Latest reply on Jan 10, 2017 7:32 AM by Matt Coles

    List extracts with unused columns?

    Jason Scarlett

      Is it possible to get a list of extracts (stand alone or part of a workbook) that have columns that are are not being used ... and while we are on the topic, if it is not asking too much, list the unused columns or estimate the size of them?

      All of this is an attempt to save disk space and target performance improvement.

       

      Jason

        • 1. Re: List extracts with unused columns?
          Matt Coles

          Not that I'm aware of, at this point. The best I've been able to do is export the TWB files from Server; parse through them to detect the hidden/unhidden status of all fields in their data sources; then blend the results with some other data I have such that I get a list of the most recently published, 10MB+ workbooks with a very small number of fields hidden. This generally means that the author is using about 80% more space than they need to, because they didn't use the "Hide Unused Fields" feature. It's still a bit of an art rather than a science, and you can't do it with published Data Sources so much, because the workbooks connecting to them might need those fields, and the author wouldn't even know.

          • 2. Re: List extracts with unused columns?
            Jason Scarlett

            Thanks Matt,

            I'm more than willing to roll up my sleeves and dig around. For now, let's just focus on getting a list of workbooks with unused fields.

            What is the most efficient way to export TWBs from a server (I'm an admin so navigating the server folders is also an option)? I have used tabcmd before but that always exports a TWBX and requires it to be unzipped. Is there a way to get the TWB file straight away?

            Jason

            • 3. Re: List extracts with unused columns?
              Glen Robinson

              Hi Jason

              You might want to take a look at the Data Source Auditing within Powertools for Tableau

              http://powertoolsfortableau.com/tools/desktop/

              This should give you what you need

              If you need to export just TWB files (without the TWBX) take a look at the following

              How to extract individual workbooks from a tsbak

              All the best

              Glen

              1 of 1 people found this helpful
              • 4. Re: List extracts with unused columns?
                Matt Coles

                We use something close to the below, run under the tblwgadmin account (not supported, use at your own risk, may not work at all, etc etc):

                 

                 

                  select 
                    lo_export(oid, 'c://outputdir//' || repo_temp.id || '_' || repo_temp.repository_url ||  lo_temp.filetype)
                  from 
                    (
                    select 
                      w.id,
                      w.repository_url, 
                      rd.content as "oid" 
                    from workbooks w inner join repository_data rd 
                      on COALESCE(w.data_id, w.reduced_data_id) = rd.tracking_id
                    ) as "repo_temp"
                  left join
                    (
                    select
                      loid,
                      case when substring(data from 1 for 4) = 'PK\003\004' then '.twbx' else '.twb' end as "filetype"
                    from pg_largeobject 
                    where pg_largeobject.pageno = 0
                    ) as "lo_temp"
                  on repo_temp.oid = lo_temp.loid;
                
                

                 

                Note that this method will be going away in the later versions of Tableau Server, as workbooks will no longer be stored in the PGSQL repository database. You'll need to rely on the REST API (note that it will include a feature to allow extraction of just the TWB xml, without the fat extract data!).

                2 of 2 people found this helpful
                • 5. Re: List extracts with unused columns?
                  Jason Scarlett

                  Thanks Glen,

                  I tried Powertools but it failed due to the large size of our content (8000+ dashboards). I may reach out to them to provide some feedback if this doesn't pan out.

                  I never knew about the largeobject table, will look into that with the code MATT provided below.

                  Jason

                  • 6. Re: List extracts with unused columns?
                    Jason Scarlett

                    Matt,

                    I looked at our pg_largeobject table from the tblwgadmin account but it is empty. We are running Server v10.1.0.

                    I'll read up on this table and figure make sure I am accessing it correctly, then post back any troubles/questions.

                    Jason

                     

                    EDIT: I found it, i was looking in the wrong schema..

                    • 7. Re: List extracts with unused columns?
                      Jason Scarlett

                      Thanks Matt/Glen,

                      I can now pull the TWB/TWBXs out of pg_largeobject.

                       

                      Can you give me a hint on what to look for in the twb file that would indicate an unused column? Is this it:

                        <column datatype='real' hidden='true' name='[F3]' role='measure' type='quantitative' />

                       

                      In the case when this is a "live" connection, it doesn't mean much, but in the case of a TWBX would this indicate an unused column?

                      Is there something in the "datasource" XML description I need to look out for to determine if it is a "live" connection vs an "extract"?

                      • 8. Re: List extracts with unused columns?
                        Matt Coles

                        After you pull those TWBs, you'll need to identify the ones that are Zips (TWBX files) and unzip them. Once you've got your directory full of clean TWBs, here's the Powershell script I currently use to pull their info (not supported, use at your own risk, may not work at all, etc etc):

                         

                        $SourceDir = 'C:\twbs' # dir holding all extracted .twbs
                        $OutputTempFile = 'C:\temp\HiddenFieldAnalysis.txt' #temporary local location for analysis results
                        $DestFile = '\\someshare\HiddenFieldAnalysis.txt'  # destination for analysis results
                        
                        If ((Test-Path $OutputTempFile)){
                            Remove-Item $OutputTempFile
                        }
                        
                        dir $SourceDir\*.twb |
                            ForEach-Object {
                                $workbook = [Xml](Get-Content $_)
                                foreach ($datasource in $workbook.workbook.datasources.datasource) {
                                    If ($datasource.extract.enabled -eq 'true'){
                                        foreach ($column in $datasource.column) {
                                            # get datasource names
                                            If ($datasource.caption){
                                                $DatasourceName = $datasource.caption
                                            } Else {
                                                $DatasourceName = $datasource.name
                                            }
                                            
                                            # get column names
                                            If ($column.caption){
                                                $ColumnName = $column.caption
                                            } Else {
                                                $ColumnName = $column.name
                                            }
                                        
                                            # check hidden status of the column
                                            If ($column.hidden -eq 'true'){
                                                $IsHidden = $true
                                            } Else {
                                                $IsHidden = $false
                                            }
                                            $_.name + '|' + $DatasourceName + '|' + $ColumnName + '|' + $IsHidden + '|' + $_.LastWriteTime.ToString('yyyy-MM-dd HH:mm:ss') >> $OutputTempFile
                        
                                        }
                                    }
                                }
                            }
                        
                        Copy-Item $OutputTempFile $DestFile
                        
                        
                        
                        
                        
                        

                         

                        I then point Tableau at the file, blend/join it to the Workbooks info from the PostgreSQL repository database, and then I can start identifying which workbooks are likely wasting lots of space.

                        1 of 1 people found this helpful
                        • 9. Re: List extracts with unused columns?
                          Toby Erkson

                          What about using the Document API to gather column info?  No idea, just throwing that out there.

                          • 10. Re: List extracts with unused columns?
                            Jason Scarlett

                            Wow! You just saved me hours (or days) or work!

                            I'm extracting the TWBs now ... estimate is 4.5 hours .. will try once done.

                            • 11. Re: List extracts with unused columns?
                              Matt Coles

                              Good idea, Toby! Except that the Document API doesn't look like it will tell you whether the fields are hidden or not, from what I can tell. If anyone thinks that would be helpful, I'd recommend adding it as an Issue.