1 2 Previous Next 20 Replies Latest reply on Jun 5, 2020 8:44 AM by Andrzej Gabryel

    Join data in Tableau Hyper API

    Andrzej Gabryel

      Dear Community,

       

      I am wondering if it is possible to join data to existing hyper extract. I am using python in Data Bricks.

      I have one main extract and additional data which I want to use to join (in excel format).

       

      As I understand in code I define link to one extract

       

       

           with HyperProcess(telemetry=Telemetry.SEND_USAGE_DATA_TO_TABLEAU) as hyper:

       

              # Connect to existing Hyper file "superstore_sample_update.hyper".

              with Connection(endpoint=hyper.endpoint,

                              database=path_to_database) as connection:     <-- path_to_database is where I have my extract

       

       

                  rows_pre_update = connection.execute_list_query(       <-- here I am telling that I want to use this extract

                      query=f"SELECT {escape_name('Loyalty Reward Points')}, {escape_name('Segment')}"

                      f"FROM {escape_name('Customer')}")

                  print(f"Pre-Update: Individual rows showing 'Loyalty Reward Points' and 'Segment' "

                        f"columns: {rows_pre_update}\n")

       

       

      How can I use more than one extract in one query( or excel.csv file)?

       

      I would like my query to look like this

       

      select

      s.*,

      c.Category,

      c.Sub_Category

      from Sales s

      left join categories c on s.ID=c.ID

       

      I cannot just do join in Tableau I have to use Hyper Api (real case is much more complicated than this example).

      I used Alteryx to create this Extract, Afterwards, I will be appending this data with category and subcategory included, but I have to fill historical data.

       

      Greetings

      Andrzej Gabryel

        • 1. Re: Join data in Tableau Hyper API
          Adrian Vogelsgesang

          Hi Andrzej,

           

          Sorry, I totally missed your question. I am only actively monitoring the "HyperAPI"-subtopic here on this forum. In general, your Hyper-related questions will probably get more traction over there...

           

          To your question:

           

          Joining multiple Hyper files

           

          You can load multiple Hyper files from the same HyperAPI-connection by using connection.catalog.attach_database. After attaching all your Hyper files, you can access them from your SQL statements using the "alias" you specify as part of the call to attach_database.

           

          A script would look for example like this:

          with HyperProcess(telemetry=Telemetry.SEND_USAGE_DATA_TO_TABLEAU) as hyper:

               with Connection(endpoint=hyper.endpoint) as connection:

                   connection.catalog.attach_database(path_to_database1, "db1")

                   connection.catalog.attach_database(path_to_database2, "db2")

                   join_result = connection.execute_list_query(

                          query=f"SELECT {escape_name('Loyalty Reward Points')}, {escape_name('Segment')}"

                          f"FROM {TableName('db1', 'Extract', 'Customer')} JOIN {TableName('db2', 'Extract', 'Sales')}"

                          f"ON {ColumnName('Customer', 'customer_id')} = {ColumnName('Sales', 'customer_id')}")

           

          The key takeaways here are:

          • If you access more than one file, I would recommend to start with an "empty" connection. Note how the `Connection` constructor no longer is called with a database name. Initially, no database is loaded into your session.
          • Instead, all involved databases are attached with `attach_database`. The given aliases ("db1" and "db2" in this case) can be chosen however you like
          • Using the aliases, one can later refer to tables from both databases within SQL queries. Use `TableName` for that. TableNames are automatically formatted correctly for SQL queries and can hence be directly used inside f-strings.
          • You can use ColumnName and provide it both the table-name and the column-name in order to distinguish between columns from different tables but with the same column name


          Joining adhoc-data with an existing Hyper file

           

          In your case, however, the additional data comes from an Excel file and not from a second Hyper file. In this case, I would recommend a slightly different approach:

          • Create a temporary table. Note that a TableDefinition accepts a `persistence` parameter which you can set to Persistence.Temporary. Given such a temporary table definition, `catalog.create_table` will create a temporary table instead of a persisted one.
          • Insert your Excel data using HyperAPI's Inserter in combination with your favorite Excel reading library. (HyperAPI cannot directly read Excel files)
          • Execute your join query, joining your data from the Hyper file with your temporary table


          I hope this answers your question - please let me know if it doesn't or if you have followup questions

           

           

          Cheers,
          Adrian

          1 of 1 people found this helpful
          • 2. Re: Join data in Tableau Hyper API
            Andrzej Gabryel

            Hi Adrian

             

            I have also missed your answer. Yes, it answers my question. I will try to use it and If I have any questions I will come back to you

             

            Cheers,

            Andrzej

            • 3. Re: Join data in Tableau Hyper API
              Andrzej Gabryel

              Hi Adrian

               

              I have an issue with this solution. When I create extract sing Alteryx its name looks like this. Table "Extract" has qualified name: "Extract"."Extract"

              TO delete or update I managed it like this

              But now when I have to tables with this same name I do not know what I should set here

               

              Would you be able to help me with this?

               

              Cheers

              Andrzej

              • 4. Re: Join data in Tableau Hyper API
                Adrian Vogelsgesang

                Hi Andrzej,

                 

                I don't understand your problem/question. The snippets you posted above look good to me, although incomplete.


                Which error message are you getting? What exact issue is you are facing?

                1 of 1 people found this helpful
                • 5. Re: Join data in Tableau Hyper API
                  Andrzej Gabryel

                  Hi Adrian

                   

                  Table Name of my both my tables look like this

                  "Extract"."Extract" <- this is one name

                  I don't know how I should define Table names in case both are the same.

                   

                  Please find attached example hyper files which I am trying to use

                   

                  # update
                  import shutil
                  
                  
                  from pathlib import Path
                  
                  
                  from tableauhyperapi import HyperProcess, Telemetry, \
                      Connection, CreateMode, \
                      NOT_NULLABLE, NULLABLE, SqlType, TableName, TableDefinition, \
                      Inserter, \
                      escape_name, escape_string_literal, \
                      HyperException
                  
                  
                  
                  
                  def run_join_data_in_existing_hyper_file():
                  
                  
                      table_name = TableName("Extract", "Extract")
                      # Path to a Hyper file containing all data inserted into Customer, Product, Orders and LineItems table.
                      # See "insert_data_into_multiple_tables.py" for an example that works with the complete schema.
                      path_to_source_database = 'C:/Python/TEST2.hyper'  
                      path_to_source_database1 = 'C:Python/Test3.hyper'  
                      # Make a copy of the superstore example Hyper file.
                      path_to_database = Path(shutil.copy(path_to_source_database, "join.hyper")).resolve()
                      path_to_database1 = Path(shutil.copy(path_to_source_database1, "join2.hyper")).resolve()
                      # Starts the Hyper Process with telemetry enabled to send data to Tableau.
                      # To opt out, simply set telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU.
                      with HyperProcess(telemetry=Telemetry.SEND_USAGE_DATA_TO_TABLEAU) as hyper:
                           with Connection(endpoint=hyper.endpoint) as connection:
                                   connection.catalog.attach_database(path_to_database, "db1")
                                   connection.catalog.attach_database(path_to_database1, "db2")
                                   join_result = connection.execute_list_query(
                                      query=f"SELECT {escape_name('Building name')}, {escape_name('Secrete name')}"
                                      f"FROM {TableName('db1', 'Extract', table_name)} JOIN {TableName('db2', 'Extract',table_name)}"
                                      f"ON {ColumnName('Building Code')} = {ColumnName('Building Code')}")
                  
                  
                            
                  
                  
                  if __name__ == '__main__':
                      try:
                          run_delete_data_in_existing_hyper_file()
                      except HyperException as ex:
                          print(ex)
                          exit(1)
                  
                  • 6. Re: Join data in Tableau Hyper API
                    Adrian Vogelsgesang

                    Now I got it...


                    You will need to introduce aliases for your table:

                    f"SELECT {escape_name('Building name')}, {escape_name('Secrete name')}" 

                    f"FROM {TableName('db1', 'Extract', table_name)} t1 JOIN {TableName('db2', 'Extract',table_name)}" t2

                    f"ON {ColumnName('t1', 'Building Code')} = {ColumnName('t2','Building Code')}") 

                    (Note the additional "t1" and "t2" names in there...)

                    1 of 1 people found this helpful
                    • 7. Re: Join data in Tableau Hyper API
                      Andrzej Gabryel

                      Hi Adrian

                       

                      I have made these changes and I still get error. Sorry I am not familiar with Python

                      ValueError: Schema name is specified twice

                       

                      # update
                      import shutil
                      
                      
                      from pathlib import Path
                      
                      
                      from tableauhyperapi import HyperProcess, Telemetry, \
                          Connection, CreateMode, \
                          NOT_NULLABLE, NULLABLE, SqlType, TableName, TableDefinition, \
                          Inserter, \
                          escape_name, escape_string_literal, \
                          HyperException
                      
                      
                      
                      
                      def run_join_data_in_existing_hyper_file():
                          """
                          An example of how to Update data in an existing Hyper file.
                          """
                          print("EXAMPLE - Update data from an existing Hyper file")
                          table_name = TableName("Extract", "Extract")
                          # Path to a Hyper file containing all data inserted into Customer, Product, Orders and LineItems table.
                          # See "insert_data_into_multiple_tables.py" for an example that works with the complete schema.
                          path_to_source_database = 'C:/Python/TEST2.hyper'  
                          path_to_source_database1 = 'C:/Python/Test3.hyper'  
                          # Make a copy of the superstore example Hyper file.
                          path_to_database = Path(shutil.copy(path_to_source_database, "join.hyper")).resolve()
                          path_to_database1 = Path(shutil.copy(path_to_source_database1, "join2.hyper")).resolve()
                          # Starts the Hyper Process with telemetry enabled to send data to Tableau.
                          # To opt out, simply set telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU.
                          with HyperProcess(telemetry=Telemetry.SEND_USAGE_DATA_TO_TABLEAU) as hyper:
                               with Connection(endpoint=hyper.endpoint) as connection:
                                       connection.catalog.attach_database(path_to_database, "db1")
                                       connection.catalog.attach_database(path_to_database1, "db2")
                                       join_result = connection.execute_list_query(
                                          query=f"SELECT {escape_name('Building name')}, {escape_name('Secrete name')}"
                                          f"FROM {TableName('db1', 'Extract', table_name)} t1 JOIN {TableName('db2', 'Extract',table_name)} t2"
                                          f"ON {ColumnName('Building Code')} = {ColumnName('Building Code')}")
                      
                      
                                
                      
                      
                      if __name__ == '__main__':
                          try:
                              run_join_data_in_existing_hyper_file()
                          except HyperException as ex:
                              print(ex)
                              exit(1)
                      
                      
                      • 8. Re: Join data in Tableau Hyper API
                        Adrian Vogelsgesang

                        Hi Andrzej,

                         

                        The issue here is that the "schema name" is indeed specified twice.

                         

                        table_name = TableName("Extract", "Extract")

                        already specified the "schema name" to be "Extract" (the first parameter specifies the schema name).


                        Then, further down

                        TableName('db1', 'Extract', table_name)

                        specifies the "schema name" again (in this case the 2nd parameter) and tries to combine it with the already created table_name.

                        The original table_name variable already contains a schema name, and hence it rejects to specify the schema name again.

                         

                        The solution should be simple:

                        TableName('db1', 'Extract', table_name)

                         

                        Also, note that the other "ON"-clause with the join condition doesn't contain the ColumnNames as suggested in my previous post, yet.


                        Let me know if that works!

                         

                        Cheers,

                        Adrian

                        1 of 1 people found this helpful
                        • 9. Re: Join data in Tableau Hyper API
                          Andrzej Gabryel

                          THe issue here is that table name Is

                          'Extract'.'Extract'  <- there is dot

                          And I cannot change this name (maybe it is somehow possible, but I don't know how)

                          So when I try to define this name, I still get error.

                          SyntaxError: invalid syntax 

                           

                           f"FROM {TableName('db1','Extract', 'Extract'.'Extract')} t1 JOIN {TableName('db2','Extract', 'Extract'.'Extract')} t2"
                          
                          • 10. Re: Join data in Tableau Hyper API
                            Adrian Vogelsgesang

                            THe issue here is that table name Is

                            'Extract'.'Extract'  <- there is dot

                            Where are you getting this from?
                            Looking at the files "Test2.hyper" and "Test3.hyper", I can only see a table named "Extract" inside a schema named "Extract", but no table named "Extract.Extract".

                             

                            Where are you getting the name 'Extract'.'Extract' from? Most of the tools which read Hyper files (e.g. Tableau Desktop), mean table named "Extract" inside a schema named "Extract", when they display you 'Extract'.'Extract'

                            1 of 1 people found this helpful
                            • 11. Re: Join data in Tableau Hyper API
                              Andrzej Gabryel

                              Hi,

                               

                              When I read Table I get this

                               

                              When I use just 'extract'. I get this error

                                                  query=f"SELECT {escape_name('Building name')}, {escape_name('Secrete name')}"
                                                  f"FROM {TableName('db1','Extract','Extract')} t1 JOIN {TableName('db2','Extract', 'Extract')} t2"
                                                  f"ON {ColumnName('t1', 'Building Code')} = {ColumnName('t2','Building Code')}") 
                              

                               

                              Maybe it is because the second extract is red. But I have no idea why

                              • 12. Re: Join data in Tableau Hyper API
                                Adrian Vogelsgesang

                                When I use just 'extract'. I get this error

                                Which error? Can you share the error message instead of your source code?

                                 

                                I just tried running your code and

                                TableName('db1', 'Extract', 'Extract')

                                works for me. I was able to merge the two data sets.

                                 

                                Two other adjustments were necessary for me, though:

                                • There was a whitespace missing between the "t2" and the "ON"
                                • We don't actually have a ColumnName helper class (my fault - should have tested my example before posting it). Instead, I rewrote the ON clause to work without this helper class


                                The following snippet works for me:

                                import shutil

                                from pathlib import Path

                                 

                                from tableauhyperapi import HyperProcess, Telemetry, \

                                    Connection, CreateMode, \

                                    NOT_NULLABLE, NULLABLE, SqlType, TableName, TableDefinition, \

                                    Inserter, \

                                    escape_name, escape_string_literal, \

                                    HyperException

                                 

                                def run_join_data_in_existing_hyper_file():

                                    """

                                    An example of how to Update data in an existing Hyper file.

                                    """

                                    print("EXAMPLE - Update data from an existing Hyper file")

                                    table_name = TableName("Extract", "Extract")

                                    # Path to a Hyper file containing all data inserted into Customer, Product, Orders and LineItems table.

                                    # See "insert_data_into_multiple_tables.py" for an example that works with the complete schema.

                                    path_to_source_database = '/home/local/TSI/avogelsgesang/Test2.hyper'

                                    path_to_source_database1 = '/home/local/TSI/avogelsgesang/Test3.hyper'

                                    # Make a copy of the superstore example Hyper file.

                                    path_to_database = Path(shutil.copy(path_to_source_database, "join.hyper")).resolve()

                                    path_to_database1 = Path(shutil.copy(path_to_source_database1, "join2.hyper")).resolve()

                                    # Starts the Hyper Process with telemetry enabled to send data to Tableau.

                                    # To opt out, simply set telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU.

                                    with HyperProcess(telemetry=Telemetry.SEND_USAGE_DATA_TO_TABLEAU) as hyper:

                                         with Connection(endpoint=hyper.endpoint) as connection:

                                                 connection.catalog.attach_database(path_to_database, "db1")

                                                 connection.catalog.attach_database(path_to_database1, "db2")

                                                 join_result = connection.execute_list_query(

                                                    query=f"SELECT {escape_name('Building name')}, {escape_name('Secrete name')} "

                                                    f"FROM {TableName('db1', 'Extract', 'Extract')} t1 JOIN {TableName('db2', 'Extract', 'Extract')} t2 "

                                                    f'ON t1."Building Code" = t2."Building Code"')

                                                 print(join_result)

                                 

                                if __name__ == '__main__':

                                    try:

                                        run_join_data_in_existing_hyper_file()

                                    except HyperException as ex:

                                        print(ex)

                                        exit(1)

                                1 of 1 people found this helpful
                                • 13. Re: Join data in Tableau Hyper API
                                  Andrzej Gabryel

                                  Hi Adrian,

                                   

                                  It worked, I was using Column name function for join

                                   

                                  I am also wondering how I can create new table as a result of a join , I tried this

                                   

                                                      query=f"Create Table {TableName('new table')}  AS SELECT {escape_name('Building name')}, {escape_name('Secrete name')} "
                                                      f"FROM {TableName('db1', 'Extract', 'Extract')} t1 JOIN {TableName('db2', 'Extract', 'Extract')} t2 "
                                                      f'ON t1."Building Code" = t2."Building Code"')
                                  
                                  

                                   

                                   

                                  But I get this error

                                   

                                  • 14. Re: Join data in Tableau Hyper API
                                    Adrian Vogelsgesang

                                    The general approach looks good, however the "CREATE TABLE" won't know in which database to create the table.

                                    You attached two Hyper files, so Hyper doesn't know inside which one to create the new table.

                                    (Admittedly, the error message does a real bad job at telling you this...)

                                     

                                    To get this to work, the TableName has to explicitly specify both the database name and the schema name:

                                    f"CREATE TABLE {TableName('db1', 'Extract', 'new table')}  AS SELECT {escape_name('Building name')}, {escape_name('Secrete name')} "

                                    1 of 1 people found this helpful
                                    1 2 Previous Next