I have had plenty of issues with the Jet database engine, most commonly dates not being imported correctly. I am aware of this issue and will often create a schema.ini. However, for less advanced users, Tableau's use of Jet is causing severe data quality issues. For example, I was working with a colleague who had pulled some data in from Excel. We were using one of the dimensions (location) when I noticed a much larger than usual NULL category (the data had some legitimate NULLs in it). It turned out the first 50,000 rows of the data had a numerical tag for the locations, and the last 20,000 were alphanumeric. Jet, having decided location was purely numeric, replaced these latter locations with NULLs. I guarantee you (Tableau) that this is happening more often than you think, and people are making decisions based on bad data. I cannot say this strongly enough: this is Tableau's worst failing by far. Suggestions for partially fixing this (in the absence of replacing Jet):
- If the user changes a data type (e.g. number to string), warn them that the Jet engine may have dropped some values. Automatically write out a schema.ini based on what Tableau now knows about the data, and offer to reimport.
- Upon import, list the columns and show what Jet thinks each one is, along with the number of NULLs returned in each column. Allow the user to make changes and automatically create a schema.ini.
- Dramatically increase the number of rows checked. I would also hope the driver settings import mixed types as Text rather than the majority type (see http://office.microsoft.com/en-us/access-help/initializing-the-microsoft-excel-driver-HP001032159.aspx).
- After import, Tableau would look at any columns with a suspiciously high number of NULLs and do a trial reimport of just those columns, this time checking many more rows (if not all); if there is a mismatch, write out a schema.ini and reimport.
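For anyone who hasn't used the schema.ini workaround: it is a plain text file placed in the same folder as the data file, and it overrides Jet's per-column type guessing. A minimal sketch for the location case above, forcing that column to Text so the alphanumeric tags survive (the file name, column names, and types here are hypothetical):

```ini
[locations.csv]
Format=CSVDelimited
ColNameHeader=True
Col1=id Long
Col2=location Text
```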
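The two driver settings mentioned above live in the registry. A sketch of what a safer configuration might look like; the exact key path varies by Office/driver version and bitness, so treat this as illustrative rather than definitive:

```
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel]
"TypeGuessRows"=dword:00000000
"ImportMixedTypes"="Text"
```

As I understand the linked article, TypeGuessRows=0 makes the driver scan far more rows than the default 8 when guessing a column's type, and ImportMixedTypes=Text maps mixed columns to Text instead of NULLing the minority values.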
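To show the suggestions above are not asking for anything exotic, here is a rough sketch (not Tableau's code; all names are my own) of "scan every row, treat any mixed column as Text, and emit a schema.ini":

```python
import csv
import io

def infer_column_types(rows, header):
    """Decide Text vs numeric per column by scanning ALL rows,
    not a small prefix as Jet does. Any non-numeric value => Text."""
    types = ["Double"] * len(header)
    for row in rows:
        for i, value in enumerate(row):
            if value == "":
                continue  # blanks don't change the inferred type
            try:
                float(value)
            except ValueError:
                types[i] = "Text"
    return types

def write_schema_ini(filename, header, types):
    """Emit a schema.ini section pinning the inferred types, so the
    driver cannot silently NULL out values that don't match its guess."""
    lines = [f"[{filename}]", "Format=CSVDelimited", "ColNameHeader=True"]
    for i, (name, typ) in enumerate(zip(header, types), start=1):
        lines.append(f"Col{i}={name} {typ}")
    return "\n".join(lines)

# 'location' is numeric in early rows, alphanumeric later -- exactly
# the case where a small type-guessing sample goes wrong.
sample = "id,location\n1,1001\n2,1002\n3,WAREHOUSE-A\n"
reader = csv.reader(io.StringIO(sample))
header = next(reader)
types = infer_column_types(list(reader), header)
print(write_schema_ini("data.csv", header, types))
```

A full-table scan is obviously slower than sampling, but it could be run only on columns flagged with a suspiciously high NULL count, as suggested above.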