3
Vote

Error parsing XLSX: missing data from resulting dataset (attached sample file AND FIX)

description

When parsing some XLSX files you can find some empty cells when using AsDataset() method.
(sample file attached: resulting dataset doesn't contain data for column PezziA1).
 
This happens because you are trying to read unexistant data in ExcelOpenXmlReader.ReadSheetRow(XlsxWorksheet sheet)
Wrong (actual) code follows:
                    _xmlReader.ReadToDescendant(XlsxWorksheet.N_v); <-- ERROR!!! (the file contains "<c r=\"O2\" s=\"6\" xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\" />" so there's no descendant)
                    object o = _xmlReader.ReadElementContentAsObject();
 
                    if (null != a_t && a_t == XlsxWorksheet.A_s)
                    {
                        o = _workbook.SST[Convert.ToInt32(o)];
                    }
                    else if (null != a_s)
                    {
                        XlsxXf xf = _workbook.Styles.CellXfs[int.Parse(a_s)];
 
                        if (xf.ApplyNumberFormat && IsDateTimeStyle(xf.NumFmtId) && o != null && o.ToString() != string.Empty)
                        {
                            o = DateTime.FromOADate(Convert.ToDouble(o, CultureInfo.InvariantCulture));
                        }
                    }
 
                    if (col - 1 < _cellsValues.Length)
                        _cellsValues[col - 1] = o;
 
You have to change the code in this way for it to work properly:
                    if (_xmlReader.ReadToDescendant(XlsxWorksheet.N_v))
                    {
                        object o = _xmlReader.ReadElementContentAsObject();
 
                        if (null != a_t && a_t == XlsxWorksheet.A_s)
                        {
                            o = _workbook.SST[Convert.ToInt32(o)];
                        }
                        else if (null != a_s)
                        {
                            XlsxXf xf = _workbook.Styles.CellXfs[int.Parse(a_s)];
 
                            if (xf.ApplyNumberFormat && IsDateTimeStyle(xf.NumFmtId) && o != null && o.ToString() != string.Empty)
                            {
                                o = DateTime.FromOADate(Convert.ToDouble(o, CultureInfo.InvariantCulture));
                            }
                        }
 
                        if (col - 1 < _cellsValues.Length)
                            _cellsValues[col - 1] = o;
                    }
 
Hope this helps someone.
I saw lots of people complaining about missing data on XLSX, this will probably fix it.
If it does solve your problem please let me know.
Bye,
Lino Barreca

file attachments

comments

robalexclark wrote Jun 30, 2011 at 6:47 PM

Thanks for the above info. However this bug is already fixed using the patches provided. I know, I probably need to pull together another formal compiled release, I'll look into doing that now...

jmfronsee wrote Aug 18, 2011 at 12:35 AM

I do not see a Patch in reference to this item# 8969. I downloaded the latest version of the software but some of my columns are also always null (only with XLSX files). Can you please provide a link to the Patch that fixes this issue?