poi-dev mailing list archives

From bugzi...@apache.org
Subject [Bug 54283] New: Very slow opening XSSF spreadsheets
Date Wed, 12 Dec 2012 11:08:51 GMT
https://issues.apache.org/bugzilla/show_bug.cgi?id=54283

            Bug ID: 54283
           Summary: Very slow opening XSSF spreadsheets
           Product: POI
           Version: 3.9-dev
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XSSF
          Assignee: dev@poi.apache.org
          Reporter: jan.stette@gmail.com
    Classification: Unclassified

Created attachment 29746
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=29746&action=edit
Screenshot from profiler, showing large number of calls when opening
spreadsheet

I'm seeing very slow load times for XSSF spreadsheets. One example spreadsheet
takes several minutes to open, whereas the same spreadsheet saved as xls/HSSF
opens in a fraction of a second.

This concerns the same spreadsheets mentioned in bug 54282, but this one looks
harder to fix.

Attached is a screenshot from a profiler session that highlights the problem.
Basically, when opening a single spreadsheet that contains two sheets, there is
behaviour that looks like it's O(N^3) with respect to the number of columns in
the spreadsheet. The sequence is roughly as follows:

- 1 call to XSSFWorkbook.onDocumentRead()
- 2 calls to XSSFSheet.onDocumentRead()
<...>
- 2 calls to ColumnHelper.cleanColumns()
- 5,317 calls to ColumnHelper.addCleanColIntoCols()
- 5,317 calls to ColumnHelper.sortColumns()
- 7,812,463 calls to Xobj.find_element_user()
- 8,243,994,339 calls to Xobj.isElem() and QName.equals().

There's a similar bottleneck that goes like this:

- 1 call to XSSFWorkbook.onDocumentRead()
- 2 calls to XSSFSheet.onDocumentRead()
- 2 calls to ColumnHelper.cleanColumns()
- 5,317 calls to ColumnHelper.addCleanColIntoCols()
- 7,807,146 calls to ColumnHelper.getColArray()
- 8,243,994,339 calls to Xobj.isElem() and QName.equals().
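
Taking the figures in the first list at face value, that is about 1,470
find_element_user() calls per sortColumns() call and about 1,055 isElem()
calls per find_element_user() call. The sketch below is a toy model of how
that kind of growth can arise; it is not POI or XMLBeans code. ScanList is a
hypothetical stand-in that assumes reading index i costs i + 1 node visits (a
linear scan from the head), and the inner loop stands in for a per-insert pass
over the columns.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexedScanCost {

    /** Hypothetical element store: reading index i is charged i + 1 node visits. */
    static final class ScanList {
        private final List<Integer> items = new ArrayList<>();
        long visits = 0;

        void add(int value) {
            items.add(value);
        }

        int size() {
            return items.size();
        }

        int get(int i) {
            visits += i + 1; // assumed cost of scanning from the head to index i
            return items.get(i);
        }
    }

    /** Adds n columns one at a time and re-reads the whole list by index after each add. */
    static long visitsFor(int n) {
        ScanList cols = new ScanList();
        for (int c = 0; c < n; c++) {
            cols.add(c);
            // Stand-in for a per-insert sort pass: touch every element by index.
            for (int i = 0; i < cols.size(); i++) {
                cols.get(i);
            }
        }
        return cols.visits;
    }

    public static void main(String[] args) {
        for (int n : new int[] {100, 200, 400}) {
            System.out.println(n + " columns -> " + visitsFor(n) + " node visits");
        }
    }
}
```

Under these assumptions the total is n(n+1)(n+2)/6 node visits, so doubling
the number of columns multiplies the work by close to eight.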

I realise that this code bottoms out in XMLBeans, so maybe this is partly an
issue there. I did find this bug report on XMLBeans which sounds relevant, but
it's been open for a couple of years:
https://issues.apache.org/jira/browse/XMLBEANS-438

Still, I wonder if there's something that could be done in the POI code to
avoid hitting the XMLBeans data structures so hard. As it is, it unfortunately
renders the API unusable for certain spreadsheets.
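
One possible direction, sketched here in plain Java rather than against the
real POI classes: copy the columns out of the XML-backed store once, sort the
copy in memory, and write the result back once, instead of sorting after every
insert. ElementStore, toArray(), setAll() and loadSorted() are hypothetical
names for illustration, and the sketch assumes that a bulk copy costs a single
pass over the store. Whether the real column-cleaning logic can be restructured
this way is not something I have verified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BulkSortSketch {

    /** Hypothetical store: a bulk read or write is charged one visit per element. */
    static final class ElementStore {
        private final List<Integer> items = new ArrayList<>();
        long visits = 0;

        int[] toArray() {
            visits += items.size();
            int[] out = new int[items.size()];
            for (int i = 0; i < out.length; i++) {
                out[i] = items.get(i);
            }
            return out;
        }

        void setAll(int[] values) {
            visits += values.length;
            items.clear();
            for (int v : values) {
                items.add(v);
            }
        }
    }

    /** Stores the columns, then sorts them with one bulk read and one bulk write. */
    static long loadSorted(int[] columnMins, ElementStore store) {
        store.setAll(columnMins);         // one bulk write of the unsorted columns
        int[] snapshot = store.toArray(); // one bulk read into a plain array
        Arrays.sort(snapshot);            // sorting touches only the in-memory copy
        store.setAll(snapshot);           // one bulk write of the sorted result
        return store.visits;
    }

    public static void main(String[] args) {
        int n = 5317; // number of addCleanColIntoCols() calls in the profile
        int[] mins = new int[n];
        for (int i = 0; i < n; i++) {
            mins[i] = n - i; // worst case for a naive sort: descending order
        }
        ElementStore store = new ElementStore();
        System.out.println(n + " columns -> " + loadSorted(mins, store) + " store visits");
    }
}
```

In this model the store is touched 3n times in total, so the cost grows
linearly with the number of columns and the O(n log n) sort runs entirely on
the in-memory array.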

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org

