I'm getting the same thing; I've actually been playing around with Tableau Prep a decent amount since it came out, but this is the first time I've encountered this. The data looks totally fine but then I got this collation error.
For me it seemed to show up when I'm added an aggregation near the end of my flow (I added a bunch of outputs, and every output before the aggregation worked fine and every output after the aggregation failed), though I've done aggregations in other flows without any issues.
Have you been able to isolate where the issue might be stemming from in your flow?
Hi Joe (and Brian),
Ach! The dreaded collation mismatch!
The reason you're seeing this at runtime, but not while you're building your flow, has to do with the fact that we execute things quite differently in each situation. In building (interactive) mode, we're pulling data into Hyper (and using and ICU collation) for each input. When you run the flow (batch mode), we're trying to push as much as possible down into Oracle, and we're clearly making an error with the collations.
Do you know what collation(s) your Oracle system(s) are using? And do you have any sense of what operation is causing the problem? It (a) has to be a string operation, and (b) it has to operate on more than one string at a time. Joining on strings is a common culprit, but it could be a union or a string concatenation all the same.
It might be best to open a support case and include as much info as possible so that we can try to debug it on our end. It's clearly a bug, and the more info we have, the easier it will be for us to sort things out.
Apologies and thanks!
OK, so I was able to isolate the issue, then work to resolve it. I'm sure I won't use the correct words, but I'll try to explain it.
My flow took four inputs - three Oracle tables and one Excel file. The Excel file was just for scaffolding, and I'll eventually put it into its own Oracle table. I'm brand new to Prep, so I cleaned each of the three Oracle tables and then joined them to each other, and the Excel file, one by one. EVERY SINGLE JOIN was on three columns: a date field and two text fields.
HERE WAS THE ISSUE:
One of the text fields contained a null value in all three Oracle tables, and this was one of the fields I used in the joins. When I discovered that the joins weren't working (ie, even if the date field and the other text field joined fine, introducing the text field with some null values wouldn't join correctly), I used the right-click, "EDIT VALUE" functionality and turned all the nulls into 'NA'. This fixed the joins issue, since the NAs matched, and my flow looked great in realtime. But ultimately, I traced this COLLATE error back to this.
Once I realized that this "EDIT VALUE" was the culprit, I wrote a calculation instead. This gave me the same output as the "EDIT VALUE" (essentially I made my ELSE statement equal to NA), fixed the joins, and allowed the flow to complete successfully.
Ultimately I love the EDIT VALUE functionality, and used it gratuitously throughout this flow. It only seemed to break things in a couple places though. Isaac, I do not know what collations our Oracle systems are using, and didn't know what collation was until yesterday. But you were correct in that it involved strings, and joining on said strings. For whatever reason, the Oracle flow in batch mode does not like it when you "Edit Values" and turn a null into a text string.
I did open a support ticket, but may point them here so there's one conversation ongoing.
Thanks for the narrative, and thanks for filing the support ticket. I tired this -- joining Oracle to Excel w/ a remapped null -- and I got an error, too. I just got a different error. :-P
The good news is that it appears to be fixed on our latest dev builds -- there were some issues around this that were found and fixed. So it's likely that your issue has been addressed internally already. If you have a more detailed repro (flow + data) that you can share through support, the team should be able to take a look. If not, you can take a look at the next release and see if that addresses the issue and we can take it from there.
Just to add data to this issue - I run into this same error when combining data from SQL server and from Redshift. I tried modifying the nulls from the join as mentioned above by still get the error. I am trying to join on 2 "keys" that were originally numeric, however when I do it that way, the flow "runs" but takes WAY longer than something that only contains 16k rows should. I'll see if I can make it work, otherwise, I'll look out for the next version for a fix.