poi-dev mailing list archives

From Chris Nokleberg <ch...@sixlegs.com>
Subject Re: Request for feedback : More Shared Formula thoughts
Date Tue, 14 Oct 2003 17:56:49 GMT
On Tue, Sep 09, 2003 at 10:51:34PM +1000, Glen Stampoultzis wrote:
> At 09:54 AM 9/09/2003, you wrote:
> >In my spare time I have been working on a new approach to the whole
> >issue that builds upon the RandomAccessFile capabilities of POIFS2. The
> >INDEX and DBCELL records are used to locate arbitrary cells in the sheet
> >and cells are read lazily by seeking to the proper location and copying
> >only the necessary records. There is an adjustable cache of recently
> >read records which helps to keep the seeking to a minimum. The end
> >result is that you can process arbitrarily large sheets using a small
> >and essentially constant amount of memory. Naturally by controlling
> >memory use the speed is greatly improved as well.
> Do you have some performance numbers?

I found some time to finish my "technology demo". The test case was
reading in a 160K Excel spreadsheet and summing an entire column. The
POI code I used was:

    private static double sumcolumn2(String filename, int column)
    throws IOException
    {
        short c = (short)column;
        FileInputStream fis = new FileInputStream(filename);
        POIFSFileSystem fs = new POIFSFileSystem(new BufferedInputStream(fis));
        // Loads the whole workbook into memory before any cell is touched.
        HSSFWorkbook wb = new HSSFWorkbook(fs);
        HSSFSheet sheet = wb.getSheetAt(0);
        double total = 0;
        for (int r = sheet.getFirstRowNum(), max = sheet.getLastRowNum(); r <= max; r++) {
            // Rows and cells that were never written come back as null.
            HSSFRow row = sheet.getRow(r);
            if (row != null) {
                HSSFCell cell = row.getCell(c);
                if (cell != null && cell.getCellType() == HSSFCell.CELL_TYPE_NUMERIC)
                    total += cell.getNumericCellValue();
            }
        }
        return total;
    }

Running this method 100 times on my machine took 15 seconds; the
rewritten version took only 800 ms, about a 20x speedup. It used POIFS2
with a FileChannel (NIO) source and used the INDEX and DBCELL records to
lazily read in cell data, as I described above.
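
To give a rough idea of the seek-and-cache technique, here is a minimal
sketch. It is not the POIFS2 code and knows nothing about INDEX or DBCELL
records: LazyBlockReader, readBlock and diskReads are hypothetical names
made up for illustration, and the fixed block size stands in for real
record boundaries. It only shows positional reads on a FileChannel backed
by a small least-recently-used cache.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of POI or POIFS2. */
class LazyBlockReader implements AutoCloseable {
    private final FileChannel channel;
    private final int blockSize;
    private final Map<Long, byte[]> cache;
    private int diskReads = 0;

    LazyBlockReader(Path file, int blockSize, final int maxCachedBlocks)
            throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
        this.blockSize = blockSize;
        // An access-ordered LinkedHashMap evicts the least recently used entry.
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxCachedBlocks;
            }
        };
    }

    /** Returns the bytes at the given file offset, reading from disk only on a cache miss. */
    byte[] readBlock(long offset) throws IOException {
        byte[] block = cache.get(offset);
        if (block == null) {
            ByteBuffer buf = ByteBuffer.allocate(blockSize);
            long pos = offset;
            // Positional read: does not disturb the channel's own position.
            while (buf.hasRemaining()) {
                int n = channel.read(buf, pos);
                if (n < 0) {
                    break; // end of file, return a short block
                }
                pos += n;
            }
            block = Arrays.copyOf(buf.array(), buf.position());
            cache.put(offset, block);
            diskReads++;
        }
        return block;
    }

    /** Number of cache misses that went to disk so far. */
    int diskReads() {
        return diskReads;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Memory use is bounded by blockSize times maxCachedBlocks regardless of
file size, which is the property that matters for large sheets. The cache
size is the knob: a bigger cache means fewer seeks at the cost of more
memory.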

Maybe for POI 7.0...


