Monday, July 19, 2010

What I learned today – importing mixed data types from Excel

Recently a coworker came up with an issue I never ran into before. He’s importing a large number of Excel spreadsheets into a SQL 2005 instance. One of the columns contains mixed data; 1, 2, 3, x, y, z for example. What he found is that if the first 20 or so rows are strictly numeric then the connection treats the whole column as numeric, and alphanumeric values are returned as NULLs. He’s querying using OPENROWSET, but it looks like this happens in SSIS as well. 

What his research found was that there’s a few registry keys that come into play. There’s also another property he needed to set in his connection string. When he adds “IMEX=1” it allows importing of mixed data types based on the the registry setting “ImportMixedTypes”. The default value is “text”. One drawback to text is that you’re limited to 255 characters. The only other available value is “Majority Type”. When you use Majority Type some of the values are scanned and the data type with the most values is used.

The number of rows scanned is based on the “TypeGuessRow” registry setting. By default it’s set to 8, but valid values are 0 to 16. If you set it to 0 then all rows are sampled. Setting it to a value between 1 and 16 samples that number.

Both keys can be found at HKEY_Local_Machine/Software/Microsoft/Jet/4.0/Engines/Excel/.  

Here are a few articles that explain the issue better.

http://jingyang.spaces.live.com/blog/cns!CC21A118B1B5250!318.entry?sa=781031802

http://www.instantpages.ltd.uk/ADODB_WP.htm

UPDATED 7/20/2010: You can read more about the actual issue here.

3 comments:

Piotr Rodak said...

That's funny John, I was asked for help with the exactly same issue today. Thanks for this post :)

jaype said...

You're welcome Piotr, hope it helped. I hadn't heard of it before I wrote about it either.

Todd McDermid said...

This behaviour is the bane of the Excel provider's existence - and therefore a PITA for anyone who deals with it. Here's another link for you that will blow your mind with the length and breadth of issues that can arise with the Excel provider, and how you can (mostly) deal with them in SSIS: Dougbert's blog posts on Excel and SSIS.