7 Replies Latest reply on Oct 3, 2018 2:30 PM by Gerardo Varela

    Exporting datasource metadata via Postgres query

    Gary Wood

      A few years ago I cobbled together a process to export datasources from the Tableau postgres DB to a file share.  The resulting XML could then be parsed to help us build a data dictionary of our extensive set of Tableau resources.  Most of this was based on a Postgres query (I think by Phillip Chao) using lo_export() to dump the Workgroup.Repository_data.Content. 


      It worked just fine for a long time, but recently I was told that this process is missing almost 50% of our datasources in the export.  I have traced that problem to a join in the Postgres query:


      ... from datasources d join repository_data r  ON on COALESCE(d.repository_data_id, d.repository_extract_data_id) = r.id ...


      It seems that many of our datasources now do not have a Repository_data_id value or a repository_data_extract_id value.  Even older datasources that used to be included in this processing are now being overlooked.


      Is this a schema change under the hood?  I see this same symptom on our 10.2 server and our 10.5 server.  Any suggestions on how to work around this?  Ideally a corrected query ;-)




        • 1. Re: Exporting datasource metadata via Postgres query
          Carisa Chang

          Hi Gary,


          The underlying repository table structure can change with every release in order to meet the needs of new Tableau Server features or changes, I'd hesitate to update this query just to have it break again in the next version.


          I think the best bet will be to move to using the REST API - which returns XML of the data sources on a Tableau Server site by running the Query Data Source method:




          You could write a snippet of code to pull down the list of Sites for your Tableau Server, and then iterate over the Sites and pull down all your data sources - and it wouldn't break with upgrades since the REST API is maintained for external use.

          • 2. Re: Exporting datasource metadata via Postgres query
            Gary Wood

            Thanks Carisa

            I expected as much, I was resistant to give up the old process since I put so much time into it....


            Perhaps you can also assist with a Q on my replacement process.  I am testing the Tableau API in Powershell in order to perform the datasource/workbook extraction to the file system, then I can hand it over to my previous process for parsing the XML.


            This is a partial snippet of my solution.  I have removed the details of initial connection for brevity, but it works.  The output file is generated, but can't be opened by Tableau Desktop, or a zip file utility. So obviously I have something wrong.  Can you verify I am shopping in the right place to get the content XML by going to ...datasources.ChildNodes?  Using the interaction Powershell IDE, it seemed to be the only option.


            $dslist = Invoke-RestMethod -Uri $server/api/$api/sites/$siteID/datasources -Headers $headers -Method Get

            Foreach ($datasource in $dslist.tsResponse.datasources.ChildNodes)


                $datasourceID = $datasource.id

                $datasourcefnm = $datasource.contentUrl.ToString()

                $result = Invoke-RestMethod -Uri $server/api/$api/sites/$siteID/datasources/$datasourceID/content -Headers $headers -Method Get | Out-File  "C:\mytest\$datasourcefnm.tdsx"


            • 3. Re: Exporting datasource metadata via Postgres query
              Carisa Chang

              Ah, I think I see the issues. The Query Data Sources method returns an XML list of the datasources on a Site, rather than returning a collection of the datasources themselves, so it's not possible to iterate through the returned package and output to .tdsx files.


              The return package from Query Data Sources is the XML itself. Does that help?

              • 4. Re: Exporting datasource metadata via Postgres query
                Jeff Strauss

                Hi Gary.  I vaguely recall running into an issue a while ago, maybe after we upgraded to Tableau Server 10.x, and I think I had to do adjust the join to this:

                ON repository_data.tracking_id = COALESCE(data_id,reduced_data_id)


                Attached is the full sql from our prod function, let me know if it helps.  Granted, this isn't officially supported by Tableau, so every upgrade we do, we need to verify that it still works.

                • 5. Re: Exporting datasource metadata via Postgres query
                  Gerardo Varela

                  Jeff's right on the money.  The change happened from version 10.2 to version 10.3.




                  • 6. Re: Exporting datasource metadata via Postgres query
                    Gary Wood

                    Thanks Jeff (and all).  Being a general data miner, I haven't delved deeply into the postgres tables behind Tableau.  Your tip seems to be working in my testing, and is much appreciated.

                    I plan to flesh out the Restful API approach when time allows, and replace the not-supported query approach with that.  Of course, "when time allows" frequently turns into "when the current process breaks".


                    • 7. Re: Exporting datasource metadata via Postgres query
                      Gerardo Varela

                      Since you're still on a older version of Tableau Server there is an undocumented tabadmin command that you might find useful:


                      tabadmin dbdocs


                      It creates a nice html file called data_dictionary.html in the bin folder. I don't remember what version I found it in, 10.4?, but it made transition versions a bit easier.