Leveraging APIs to reach your data

Version 1

    At Tableau we have an obvious need for reporting on data quickly and easily. To provide such a service we typically load our data into tailored internal databases to query the data faster. When the data isn’t pre-loaded into our databases, users can use the wonderfully built data connectors to connect to and query the data they need.

                    However, there are times when this system does not work perfectly. When building the databases it is difficult to know which data to load and which to leave out. If the database wasn’t built right the first time, it is non-trivial to make changes to this production system. It is inherently difficult to know which data you need and what questions you will ask, before you have access to the data. This is typically a perfect time for use of our data connectors, however in particular circumstances it can be very time consuming for the connectors to join all of your data together, when all you need are a few dozen fields.

                    What we have done to supplement the existing system is to utilize third-party APIs and our Tableau APIs to rapidly prototype the data model and provide the up-to-date data to the business. For a particular business case, we had users trying to report on data in Salesforce that was taking upwards of hours to connect to using the default data connectors. As our solution, we built a Java program that connected to the data using the Salesforce SOAP API and exported the result in a tde using our Data Extract API. The Salesforce API allowed us to query the particular data from an object and many of its relationships. From the result set we can model the data how we wish and export it with our Data Extract API. For the rare case when certain connections were prohibitive, we could either issue separate queries and join the data programmatically, or join the extract with supplemental extracts or preexisting data sources in Tableau Server or our databases. This method enabled us to work closely and iteratively with the business to meet their needs and answer their questions. Updating the code from business feedback took around an hour of turnaround time. Once the extract was approved we took the Java program as a JAR and created a schedule to run it and publish the resulting extract to Tableau Server where all of our users could access the data.

                    With our program, we were able to reduce the refresh time from hours, down to minutes for the long queries and seconds for the average case. The resulting product also provides the model for how we need to build our internal database for this data. This solution enables rapid prototyping of data models and works as a temporary or more permanent solution for how users access their data.



    Step by step guide to writing a Salesforce to Tableau Server extractor in Java


    1. You will need a user with API permissions to your Salesforce environment, with permissions to all of the objects and fields in Salesforce that you will need.
    2. You will need to get the Salesforce WSDLs from your Salesforce environment, including the Enterprise WSDL.
    3. Transform these WSDLs into JARs
    4. Get the Data Extract API JAR from Tableau
    5. Create a Java program and include the libraries from Tableau and Salesforce
    6. In your program create a Salesforce SOQL API query to retrieve the object you need along with any other objects that has a relationship to it. For particularly lengthy queries, you can set a WHERE clause into the query to speed it up. Some restrictions exist in SOQL language, you can work around them programmatically or with data blending in Tableau. For further reference please use, https://resources.docs.salesforce.com/sfdc/pdf/salesforce_soql_sosl.pdf.
    7. Set a file location on your server for where you want your TDE dumped.
    8. Create a SOAP connection to your Salesforce environment
    9. Create a Tableau table through the extract API with a defined table definition that matches the data you want to be extracting
    10. Execute query through your Salesforce connection
    11. Loop through the records in the query result
    12. Create a new row from the table definition
    13. For every called field, add that field’s value to the row. Load all of the one-to-one object relationship fields first.
    14. Loop through all of the fields in any one-to-many object relationships you called
    15. For every field, check if the query returned anything in the results for that field, if yes then set that result’s value into the row, if no then set that row entry to null to avoid duplicating prior row data in case of null
    16. Insert row into table at end of one-to-many loop
    17. If row has not been inserted prior, then insert row into table at end of the singular result’s record
    18. Check if query result is done, if not done get queryMore from the connection using the query results’ query locator, then loop back through to step 11
    19. Save table with a temp file name
    20. If table was built correctly, delete prior file from previous run
    21. Rename temp save file to the correct file name in the preset file location on the server
    22. Package program as a JAR and copy to server
    23. Create a cmd file to run the JAR
    24. Create a Windows Scheduler task to run the cmd file on a set schedule
    25. Create a PowerShell script that watches the target folder, if anything appears there it uses the first part of the name to determine the project and the rest to determine the data source name then does a tabcmd publish.