poi-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From bugzi...@apache.org
Subject [Bug 61841] Unnecessary long computation when evaluating VLOOKUP on all column reference
Date Wed, 06 Dec 2017 00:19:12 GMT
https://bz.apache.org/bugzilla/show_bug.cgi?id=61841

--- Comment #5 from Greg Woolsey <gwoolsey@apache.org> ---
Changes in r1817252

Interesting.  In a local test with the attached sample file, I found these
results:

45s (second run) 
with current codebase issuing FormulaEvaluator.evaluateAll() on the workbook.

19s 
By just changing XSSFEvaluationSheet.getCell(row, col) to immediately return
null if the row index > sheet.getLastRowNum() 

14.4s
when XSSFEvaluationSheet caches the value of getLastRowNum(), since it comes
from a TreeMap.lastKey() which has to navigate the tree each time to find the
last key.

12.1s
after optimizing the blank cell tracking a bit to know about the last row with
data.

That's all without changing anything int he VLOOKUP evaluation and still
iterating over the max # of rows per column.

Of this remaining time, about 2/3 is taken up in the formula evaluation caching
and tracking mechanism.  Bypassing it for null cells causes test failures,
which shows it is necessary, but relatively expensive.  It appears to try to
optimize and minimize the "empty cell" rectangular regions it holds. but
assumes processing by row then column.  That may be a memory/time optimization
we want to consider allowing additional strategies for.

Note that this shortcut logic doesn't change the result of any methods, only
avoids busywork that didn't apply to the "nonexistent cell" cases.

This doesn't optimize VLOOKUP directly, but is about 70% improvement
sufficient?

Changing the VLOOKUP code itself is actually significantly more complex,
because POI handles sheets by row internally, and columns are second-class
constructs.  There is no easy way to determine the last row with data in a
column other than iterating over all defined rows.  With these optimizations,
the extra iterations should fail fast.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


Mime
View raw message