    ExtractAPI: error converting utf32 to utf16: 10

    Marko Bauhardt


      We are using the Java Tableau SDK, version 10200.17.0328.0755, and its API to append rows to a TDE file.

      Our JVM terminates with the error `converting utf32 to utf16: 10` if the String value we want to write is encoded as UTF-32.

      UTF-16 strings work without any issue.
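
For context: a Java `String` always holds UTF-16 code units in memory, so "encoded as UTF-32" here most likely refers to text containing characters outside the Basic Multilingual Plane. That reading is an assumption, not something confirmed in this thread. A minimal sketch using only the Java standard library (the class and method names are hypothetical) shows how such a character occupies two `char` values but counts as one code point:

```java
public class SupplementaryDemo {
    // Number of UTF-16 code units (Java chars) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "Tableau";
        // U+1F600 lies above U+FFFF, so UTF-16 stores it as a surrogate pair.
        String supplementary = new String(Character.toChars(0x1F600));

        System.out.println(codeUnits(bmp) + " units, " + codePoints(bmp) + " code points");
        System.out.println(codeUnits(supplementary) + " units, " + codePoints(supplementary) + " code points");
    }
}
```

If the failing values all show more code units than code points, the problem is probably tied to supplementary characters rather than to an encoding setting on the table.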


      I looked at the `DataExtract.log` file to figure out what happens behind the scenes. I found the following lines, which create the Tableau table:



      2018-10-09 15:11:29.225 (0x70000e337000): (create table [Extract].[Extract])

      2018-10-09 15:11:29.225 (0x70000e337000): Compiling query with Memory Budget=17179869184 MemoryAvailable=17179869184

      2018-10-09 15:11:29.261 (0x70000e337000): Session1: QueryExecute: OK, Elapsed time:0.036s, Compilation time:0.000s, Execution time:0.036s

      2018-10-09 15:11:29.261 (0x70000e73a000): Session1: QueryExecute:

      2018-10-09 15:11:29.261 (0x70000e73a000): (create column [Extract].[Extract].[utf32_col] ( ( "collation" "en_US" ) ( "compression" "heap" ) ( "factory" "varchar" ) ( "ordinal" "0" ) ( "scale" "2" ) ( "storagewidth" "8" ) ))

      2018-10-09 15:11:29.268 (0x70000e73a000): Compiling query with Memory Budget=17179869184 MemoryAvailable=17179869184

      2018-10-09 15:11:29.307 (0x70000e73a000): Session1: QueryExecute: OK, Elapsed time:0.046s, Compilation time:0.006s, Execution time:0.040s

      2018-10-09 15:12:05.954 (0x70000fb49000): tdeserver: disconnected connection=::1:51126->::1:51128: IPC_SocketConnection::Read(len=16, connection=::1:51126->::1:51128): The connection was closed by the peer in IPC_Socket::Recv(len=16)

      2018-10-09 15:12:05.954 (0x70000fb49000): tdeserver: closing connection=::1:51126->::1:51128

      2018-10-09 15:12:05.954 (0x70000fb49000): tdeserver: closing orphaned session1

      2018-10-09 15:12:05.964 (0x7fffaa99f380): tdeserver: exit (0)


      So it looks like the table is created with the collation `en_US`, but there is no configuration value for the encoding. I didn't find a method in the API to define the encoding of a Tableau table.

      Tableau Extract API: Tableau::TableDefinition Class Reference


      So my question is:

      Does the Extract API in Tableau SDK version 10.2.x support UTF-32 encoded values, or is only UTF-8 supported?

          Noah Beasley

          Hi Marko,


          Are you using Mac OS X to run the Java application? There is a difference in how 4-byte UTF-8 Unicode characters are handled between Mac and Windows, which can cause exactly this error in Java with the Tableau SDK.

          If so, using the CHAR_STRING type and the setCharString method to insert the affected strings, instead of UNICODE_STRING and setString respectively, should resolve the error.
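
To find out which strings are "affected" in the sense above, one option is to check for code points above U+FFFF, since those are the ones that take 4 bytes in UTF-8. This is a minimal sketch using only the Java standard library. The class and method names are hypothetical, and it does not call the Tableau SDK:

```java
import java.nio.charset.StandardCharsets;

public class AffectedStrings {
    // True if the string contains at least one code point above U+FFFF,
    // i.e. a character that is encoded as 4 bytes in UTF-8.
    static boolean hasFourByteUtf8Chars(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    public static void main(String[] args) {
        String plain = "caf\u00e9";
        String emoji = "ok " + new String(Character.toChars(0x1F600));

        System.out.println(hasFourByteUtf8Chars(plain));
        System.out.println(hasFourByteUtf8Chars(emoji));
        // "ok " is 3 bytes in UTF-8 and U+1F600 adds 4 more.
        System.out.println(emoji.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

A check like this could be used to log or isolate the offending values before deciding how to insert them.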


          If the above does not apply, it would be good to submit a case to Tableau Technical Support with the details, as this could be something new.

            Marko Bauhardt

            Hi Noah,

            Yes, we are using Mac OS X, but we had the same error on Linux. We used the setCharString method as you suggested:



            row.setCharString(index, myString);



            When we use this method, we get:



            Caused by: com.tableausoftware.TableauException: type mismatch

                at com.tableausoftware.extract.Row.setCharString(Unknown Source)



            So, I will contact technical support as you suggested.