Unfortunately I do not believe that there is any other way to go about it.
I'm speaking from a hypothetical standpoint here, but I suspect the following may be some of the reasons we don't expose more information currently. It would be very cumbersome for Tableau to call out all the data connection information associated with workbooks in the PG DB. Think about the case where a workbook has 30 data sources: that's a lot of database I/O upon publishing, which would slow down the process (especially if we were to list out the custom SQL in plain text within the DB). There are also security issues here. Just because a workbook with custom SQL is published to the Server and I (the admin) have access to the PG DB, it does not mean that I should be able to read the custom SQL. Maybe I don't have permissions to the sensitive data source.
I hope the above rambling helps.
So there are two tasks needed to do this: (1) get the .twb and .tds files out of Tableau Server, and (2) parse them for the relevant information and structure the output so that you can query it.
Before we talk custom solutions, some of your needs may be served by using the Data Connections page in the Admin tab. This will allow you to search for specific servers being connected to (note that it's case-sensitive), and for types of servers. This feature works across workbooks AND datasources, which is nice. You can even bulk edit them--a very helpful feature if server names change. However, this will not give you granularity down to databases or tables within those databases, which it sounds like you need.
For (1), here are the solutions I am aware of:
- Run "tabcmd get" commands against a list of workbooks / datasources to export them to .twb files
- Use HTTP GET calls that append ".twb" to workbook urls (e.g. http://[server]/workbooks/[workbookURL].twb)
- Run a script to export the workbooks and datasources directly from the PostgreSQL repository using lo_export (example attached).
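The second bullet above can be scripted so you skip tabcmd's overhead entirely. Here's a minimal Python sketch of that approach; the server name and workbook URL are placeholders, and a real server will require an authenticated session (cookie or trusted ticket), which is omitted here:

```python
from urllib.parse import quote
from urllib.request import urlopen  # real use needs an authenticated session


def build_twb_url(server: str, workbook_url: str) -> str:
    """Build the direct-download URL by appending .twb to the workbook URL."""
    return f"http://{server}/workbooks/{quote(workbook_url)}.twb"


def download_twb(server: str, workbook_url: str, dest_path: str) -> None:
    # Hypothetical: assumes you already have a valid session with the server.
    with urlopen(build_twb_url(server, workbook_url)) as resp, open(dest_path, "wb") as f:
        f.write(resp.read())
```

For example, `build_twb_url("tableau.example.com", "Sales")` returns `http://tableau.example.com/workbooks/Sales.twb`, which you could then fetch in a loop over a list of workbook URLs.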
For (2), here are the solutions I am aware of:
- Run a "grep" command, or its PowerShell equivalent "Select-String", to look for specific regex matches of interest. This works well for quick-and-dirty searches for specific, uncommon names of columns, databases, and servers, but it's not a very elegant or sustainable solution for repeat needs.
- Parse the XML and dump the structured data into a format that more easily supports querying / analytics (such as an RDBMS, or even a .TDE file using the TDE API)
- Use a third-party framework such as the Interworks Workbook SDK
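To illustrate the parse-the-XML option, here's a sketch using Python's built-in `xml.etree`. The element and attribute names (`datasource`, `connection`, `relation`, `server`, `dbname`) match what .twb files typically contain, but treat them as assumptions: the format is undocumented and can vary between versions, so inspect a few of your own workbooks first.

```python
import xml.etree.ElementTree as ET

# A stripped-down example of what a .twb's datasource section can look like.
SAMPLE_TWB = """<workbook>
  <datasources>
    <datasource name="Sales DS">
      <connection class="postgres" server="db.example.com" dbname="sales">
        <relation type="text" name="Custom SQL Query">SELECT * FROM orders</relation>
      </connection>
    </datasource>
  </datasources>
</workbook>"""


def extract_connections(twb_xml: str):
    """Return one row per connection: datasource, class, server, dbname, custom SQL."""
    rows = []
    root = ET.fromstring(twb_xml)
    for ds in root.iter("datasource"):
        for conn in ds.iter("connection"):
            # relation type="text" is where custom SQL tends to live
            custom_sql = [r.text for r in conn.iter("relation") if r.get("type") == "text"]
            rows.append({
                "datasource": ds.get("name"),
                "class": conn.get("class"),
                "server": conn.get("server"),
                "dbname": conn.get("dbname"),
                "custom_sql": custom_sql,
            })
    return rows
```

From here you can dump the rows into CSV, an RDBMS table, or a .TDE, and query to your heart's content.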
Note that the pgSQL in the script is obviously unsupported; use at your own risk, etc., etc.
export_workbooks.sql.zip 799 bytes
Thanks Matthew, this makes sense. Because of how slow tabcmd get commands tend to be, I'm leaning towards option #3. Your query is a lifesaver; it would have taken me a while to figure out the pg_lo functions.
One thing I wanted clarification on, though: I was under the impression that only the XML (.twb) was stored in the Postgres database, with the actual extracts stored on disk somewhere. Is this not the case? I was hoping to avoid pulling down .twbx files because of some huge extracts we've got on the server.
Happy to help!
That is the case--.twbx files don't necessarily contain data extracts; they may be packaged simply because they reference other non-workbook objects such as images, flat files, and the like (for more info, see here). So any workbook exported from Postgres should just have those other types of files. The space needed for them will still be minimal, though obviously you'll still have to unzip them to get at the actual .twb files themselves.
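Since a .twbx is just a zip archive, that unzip step is easy to script. A sketch in Python (file paths are illustrative) that pulls out only the .twb members and leaves everything else behind:

```python
import zipfile
from pathlib import Path


def extract_twbs(twbx_path: str, dest_dir: str) -> list:
    """Extract only the .twb member(s) from a packaged workbook archive."""
    extracted = []
    with zipfile.ZipFile(twbx_path) as z:
        for member in z.namelist():
            if member.lower().endswith(".twb"):
                z.extract(member, dest_dir)
                extracted.append(str(Path(dest_dir) / member))
    return extracted
```

Point it at a directory of exported .twbx files and you end up with just the XML you need to parse.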
Also, I totally agree about tabcmd being too slow--but one alternative that's much faster and will also get you the data extracts (should you need them) is the second bullet under (1) above: script the HTTP "get" calls directly, using URLs just as you would in a browser. That way you're not incurring tabcmd's overhead. For just getting workbooks, though, it'll still be way faster to hit Postgres directly.