poi-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "DarĂ­o Vasconcelos" <dario.vasconce...@gmail.com>
Subject Re: Conceptual question, using HSSF as a recalculation engine
Date Wed, 07 Jan 2009 15:41:53 GMT
Hi Josh,

thanks for your response and your insight. In my particular case, the XLS
file is too complex, so much that no one knows anymore how it works.
Additionally, the advantage of letting the end user change the business
rules directly in the Excel source file is very attractive.This is why HSSF
seemed like a good idea.

I understand that formula dependency tracking is a complex task. I'm going
to try to reproduce the behavior on a simple XLS file to make it a little
bit easier to do the testing. I'm thinking maybe the actual problem are
named fields, because at my first tests some named fields caused exceptions.

What I mean by "after" an "before" is precisely row/column placement in the
spreadsheet. My guess was that it made no difference, but when all is
failing, anything looks suspicious.

Thanks again for your tips and advice. I might indeed add you to my chat
roster if I need some assistance, thanks.


On Tue, Jan 6, 2009 at 6:33 PM, Josh Micich <josh@gildedtree.com> wrote:

> Hello Dario,
> > So this is my question: is this a good idea? Is POI supposed to do such
> > things?
> POI has supported formula evaluation since version 3.  Specifically
> relating
> to the functionality you are describing, there have been many improvements
> as of late including:
>  - New implementations for missing built-in functions
>  - Fixes for bugs in existing built-in functions
>  - Lazy evaluation of area and reference Evals (speed improvement)
>  - Compact representation of formula Ptg arrays and non-expansion of shared
> formulas (memory improvement)
>  - Dependency-aware formula result caching (speed improvement)
> Whether what you are doing is a good idea is a very subjective question.  A
> java-only based evaluation solution may be better than one that uses (java
> together with) formulas from an XLS file.  Ease of maintenance is a major
> consideration. Java is arguably a better language than Excel formulas.
>  Java
> source is easier to browse and refactor.  Having one less language in use
> also helps for better focus of skills, fewer moving parts and less plumbing
> code.  Another consideration is memory footprint and execution speed. POI
> has improved much lately, but it is still relatively easy to implement
> formulas directly in java that outperform formulas interpreted via POI.
> Having said that, there may be compelling reasons why a POI/XLS based
> evaluation solution is chosen:
>  - The people who maintain the business logic currently in the Excel
> spreadsheet(s) may not comfortable in maintaining that logic anywhere else.
>  - The logic in the spreadsheet may be so large/complex that it is too
> expensive to port (the spreadsheet becomes a 'black box').
> For what it's worth, the company I work for has used POI for exactly this
> kind of solution, so we've gained a lot of insight into some of the
> considerations (beyond core POI functionality).  I'm always happy to chat
> if
> you want to contact me directly.
> > my tests have shown that the recalculation isn't
> > fully working: if I change the value of A1, and A2 and A3 have formulas
> that
> > depend on its value, only A2 is updated but A3 not.
> This sounds like a bug to me.  This is new functionality, so a bug wouldn't
> be surprising.
> Formula dependency tracking was a complex addition to POI (necessary for
> performance).  It works by caching evaluation results for every cell
> evaluated (directly and indirectly).  For each evaluated cell, POI
> remembers
> which cells were used ('sensitive input cells') to calculate the result.
>  As
> each cell is 'used' during a cell evaluation, POI also remembers which
> formula cell(s) are 'consuming' the value.  A client of the formula
> evaluator can notify the evaluator that a cell has changed.  When this
> happens, the formula evaluator recurses up the tree of 'consuming cells'
> and
> clears the cached value and dependency information.  Another subtle
> use-case
> is when a formula cell changes definition.  In this case, the the formula
> evaluator needs to also clear the 'consuming cell' link from all 'sensitive
> input cells' of the changing formula cell.
> As you can imagine, testing this functionality for correctness is tedious.
> There are existing junit tests (TestEvaluationCache) that seem to show very
> similar use-cases working but it is always useful to receive test cases
> that
> show new bugs.  Can you make a simple example which demonstrates the
> behaviour you described?
> > Of course this is a
> > trivial example: in reality, some cells have more than 20 levels of
> > dependency, sometimes the formulas refer to cells that are "before" them,
> > other times refer to cells that are after.
> I am not sure what you mean by "before" and "after" here.  If you are just
> talking about row/column placement in the spreadsheet, this is not a
> problem
> - POI dependency tracking work by unique cell location, and does not
> consider direction.
> >                        evaluator.notifySetFormula(celda);
> >                        evaluator.clearAllCachedResultValues();
> BTW - you never need to call both of these methods together.  There are two
> choices:
>  (a) track every changing input cell and call notifySetFormula() before
> evaluating. If you do this, clearAllCachedResultValues() is not needed.
>  (b) don't bother traking cell changes individually, just call
> clearAllCachedResultValues() after making changes but before evaluating.
> Approach (a) works if you are changing formulas and values of cells one at
> a
> time.  It is also preferred because the performance hit with
> clearAllCachedResultValues() may be large.  On the other hand, if you make
> big changes like adding or removing sheets, or shifting rows, you need to
> use approach (b).
> (Perhaps the javadocs need improving.)
> Hope this helps,
> Josh

Here's a rule I recommend: Never practice two vices at once.
 - Tallulah Bankhead

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message