XlsBiffSST:ReadStrings does not read all strings of XLS file.

Jan 4, 2012 at 8:32 PM

I have a large XLS file with one worksheet containing 8 columns and 18216 rows.

When using the ExcelDataReader library I encountered the problem that unique strings are missing starting from row with index 11888.

I use the library as described on this website:

1. ExcelReaderFactory.CreateBinaryReader(stream);

2. DataSet result = excelReader.AsDataSet();

3. Take out table[0] and start iterating over all rows and print column values of each row.

I debugged it a bit and pinpointed the problem has to be in the algorithm of the method: XlsBiffSST:ReadStrings because the method: XlsBiffSST:GetString returned string.empty due to the fact that the index was out of range of the m_strings array.

Due to the fact that my knowledge of this lib is very low and the ReadStrings method is pretty complex I would like to ask if someone can take a look at this problem an come up with a fix.

 

 

 

public string GetString(uint SSTIndex)
{
if (SSTIndex < m_strings.Count)
return m_strings[(int)SSTIndex];
return string.Empty;
}
Jun 21, 2012 at 2:12 PM
Edited Jun 21, 2012 at 2:12 PM

I'm not sure if you ever got this fixed, but I had the same issue come up this week.  I have commit version 59893.

After a day of debugging, the problem for me appears to be the commented code in the HasExtString property of the XlsFormattedUnicodeString.cs file in the Code/BinaryFormat folder.  Even when the cell should have an extended string, it was always returning false, causing value issues with any field that had an extended string.  Once one of these was encountered, the rest of the file would be be read properly, and the offset would become greater than the last bit index, and return out of the ReadString method.  Long story short, modifying the following code (starting at line 48 in Excel/Code/BinaryFormat/XlsFormattedUnicodeString.cs):

 

		/// <summary>
		/// Checks if string has Extended record
		/// </summary>
		public bool HasExtString
		{
		    get { return false; }
                    // ((Flags & FormattedUnicodeStringFlags.HasExtendedString) == FormattedUnicodeStringFlags.HasExtendedString); }
		}

 

to be:

 

		/// <summary>
		/// Checks if string has Extended record
		/// </summary>
		public bool HasExtString
		{
		    //get { return false; }
                    get { return ((Flags & FormattedUnicodeStringFlags.HasExtendedString) == FormattedUnicodeStringFlags.HasExtendedString); }
		}

 

resolved the issue for me.  Hopefully this saves some headaches for somebody else down the road.

Oct 11, 2013 at 9:59 PM
Sadly I just spent a day of debugging on this as well. Found that commented out flag test and had a WTF moment.

Un-commented the test ( after learning way too much about BIFF record structures ) and my failing spreadsheets all work.

I don't have SVN install anymore, and can't make a SVN patch for this. If someone reads this, please make a patch to enable the flag test and submit it to the devs.