poi-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Martin Studer" <martin.stu...@mirai-solutions.com>
Subject Performance issue with saving/writing large xlsx files
Date Fri, 05 Aug 2011 10:45:21 GMT
Dear Developers,

 

I've recently encountered a performance issue when writing large xlsx files
(~ 10^5 rows and about 10 columns). Looking at the POI mailing list I found
a discussion that was going in May/June 2010:

http://apache-poi.1045710.n5.nabble.com/Performance-Question-with-CTSheetDat
aImpl-java-td2335065.html

 

The thread describes some improvements that were implemented to check
whether the CTRows have to be reordered or not
(XSSFSheet#ensureRowOrdering). However, from what I've found, already the
checks that have to be performed to see if rows have to be reordered are
quite expensive - especially the calls to  sheetData.getRowArray. I
understand that adapting the XMLBeans side of things is not straight forward
/ practical. However, the thread also describes an alternative approach
formulated by Yegor which is "to improve XSSFSheet#createRow so that the
array of CTRow beans is kept in a sorted state". To me this seems like a
very good idea. Looking at the current codebase it seems like this has not
been implemented so far, probably due to other side-effects such a change
would cause? Given that the sheet-internal _rows object already is an
ordered TreeMap, Yegor's alternative approach could probably be implemented
like this:

 

public XSSFRow createRow(int rownum) {

        CTRow ctRow;

        XSSFRow prev = _rows.get(rownum);

        if(prev != null){

            ctRow = prev.getCTRow();

            ctRow.set(CTRow.Factory.newInstance());

        } else {

                    if(rownum > _rows.lastKey()) {

                                // we can append the new row at the end

                                ctRow =
worksheet.getSheetData().addNewRow();

                    }

                    else {

                                // random access case ...

                                // get number of rows where row index <
rownum

                                // --> this tells us where our row should go

                                int idx = _rows.headMap(rownum).size();

                                ctRow =
worksheet.getSheetData().insertNewRow(idx);

                    }

        }

        XSSFRow r = new XSSFRow(ctRow, this);

        r.setRowNum(rownum);

        _rows.put(rownum, r);

        return r;

    }

 

 

In case the row does not exist yet, we check whether the row to be inserted
is the "highest" row (= has the largest row number). In this case we can
simply append the row to the CTSheetData via the addNewRow method. Otherwise
we check how many rows there are which have a smaller index. Given the
invariant that the CTRows are ordered, this number tells us where we have to
insert the new row (via the insertNewRow method).

 

What is your opinion on that? Do you think this would work without causing
other issues? Note, I have not been testing this as I'm still trying to
figure out how set up the checked out POI code with Eclipse (if somebody has
some pointers on how to do that I would be very greatful).

 

Have a good weekend,

Martin

 


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message