0 Replies Latest reply on Aug 6, 2019 1:49 AM by Karel Kolman

    Extract API Unicode support

    Karel Kolman

      Hi, we're running into problems when exporting certain Unicode characters via the Extract API, similar to those described in these threads:

      char string converted to unicode string in extract API java

      ExtractAPI: error converting utf32 to utf16: 10


      The environment is Linux and we're testing Extract API from Java and from C/C++.


      In the Java Extract API, Row#setString handles Unicode characters up to U+D7FF.

      Any code point above that crashes the Extract API.

      The crash happens in the ToTableauString method.


      This happens with

      1. Java API
      2. C API


      The C++ samples seem to handle all Unicode characters, as far as I can tell from the tests I ran, so I'm wondering what is wrong with the Java API.


      Tested extract api version is hyperextractapi-cpp-linux-gcc-x86_64-release_2019_2.2019.2.2.189.r7796868f


      When looking into WHY the C++ solution actually works, I see that TableauHyperExtract_cpp.h calls MakeTableauString(value.c_str()). Strangely, std::wstring::c_str() is used here, yet the Unicode characters reach the extract correctly (judging by the data we see in Tableau Server).


          void Row::SetString(
              int columnNumber,
              std::wstring value
          )
          {
              // On Linux, wchar_t is 32 bits, so value.c_str() points to UTF-32 code units.
              TAB_RESULT result = TabRowSetString(m_handle
                  , columnNumber
                  , MakeTableauString(value.c_str()).c_str()
              );
              if ( result != TAB_RESULT_Success )
                  throw TableauException( result, TabGetLastErrorMessage() );
          }



      So could a developer answer these questions:

      - Is the whole Unicode character set supported by the Extract API?

      - Could you fix the Java API so that it does not crash the JVM in com.tableausoftware.common.StringUtils#ToTableauString? The C++ samples seem to be able to do exactly what we're trying to do via the Java API.


      Thank you