poi-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Greg Woolsey <greg.wool...@gmail.com>
Subject Re: Evaluating expressions outside a cell value context
Date Mon, 13 Feb 2017 01:07:58 GMT
Thanks for taking the time to inspect it, and point me to the utilities, I
hadn't found them yet.  I'll incorporate them and run all the tests
including my new ones.

Are folks OK with this as part of 3.16?  I'd like that.  There are some new
methods on some interfaces, so if there are custom implementations of
things like Sheet, they would need changes.  Returning null or false should
be fine though, as the only callers are the new evaluator classes, and
anyone using those would need the new API methods anyway.

On Sun, Feb 12, 2017, 5:22 PM Javen O'Neal <javenoneal@gmail.com> wrote:

> I haven't read through all your changes on GitHub yet, but all the
> changes so far look good. I have a few other suggestions to
> deduplicate some code using SheetUtil and FormulaShifter, but those
> changes can be made at a later date if needed.
>
> You should be able to use git-svn to push your changes. Read through
> and improve our git documentation [1] if necessary.
>
> [1] https://poi.apache.org/guidelines.html#Approach+3+-+the+git+way
>
> On Fri, Feb 10, 2017 at 12:10 AM, Greg Woolsey <greg.woolsey@gmail.com>
> wrote:
> > Well, I couldn't stand the incomplete support, so now this supports
> > evaluating rules for all the different types, including range aggregates
> > like "greater than 2 standard deviations" and "top 10".  Still doesn't
> > provide help assigning partitioning buckets for icon sets and colors, but
> > everything else is working.
> >
> > I filed a big bug with Vaadin, listing 5 core design problems I've found
> > with their Conditional Formatting implementation, and offering my
> > replacement for their code that uses the new POI evaluator instead.  They
> > bit, and are interested, but I won't make my first commit some behemoth
> > that hasn't received any feedback.  I know there are conventions and
> ideas
> > I've missed :)
> >
> >  I need both sets of changes for my day job, so I'm all-in on doing it
> > right in both directions and facilitating the conversations.
> >
> > On Tue, Feb 7, 2017 at 11:50 PM Greg Woolsey <greg.woolsey@gmail.com>
> wrote:
> >
> >> Now this fork also contains ConditionalSpreadsheetEvaluator, and related
> >> code.  The unit test is essentially a stub, but tests one basic style
> for
> >> proof of concept.
> >>
> >> I've actually implemented a version of Vaadin Spreadsheet that uses this
> >> new code to see how it performs, and I'm quite happy with both the
> improved
> >> performance (~50% faster than theirs) and feature coverage/accuracy.
> I've
> >> found 5 major bugs so far in what they did, most likely the result of
> the
> >> complexity of the document structure and the fact that several key
> pieces
> >> of information where still buried in the implementation classes, and
> hadn't
> >> been surfaced yet to the SS interfaces.  I've done that in this branch
> also.
> >>
> >> My code here is my own, I didn't like anything I saw elsewhere enough to
> >> copy it :)
> >>
> >> Evaluation currently doesn't support range-based conditions, such as
> >> TOP_10, DUPLICATE, etc.  Those don't seem like they'd be that bad to
> do, if
> >> someone wants to take a stab at them.  I don't need them (yet), so they
> >> just evaluate to "false" with a TODO comment for now.
> >>
> >> Likewise, there is no code to report which partition bucket a cell falls
> >> into when the condition type is one of the partitioned styles, 2,3 or 4
> >> value buckets, gradient fill, etc.  The fact that the rule matches
> (based
> >> on range) is available, the caller would need to evaluate the rule type
> and
> >> see what lies beneath.
> >>
> >> I assume interested parties will take a look as they have time and
> >> inclination.  I'm sure there are areas to discuss, beyond where to put
> the
> >> curly braces :)  I left some comments as to alternate strategies for
> some
> >> areas, where I opted for less change to existing classes as a starting
> >> point, even if it means a switch...case here or there when a new method
> >> could be added to an Enum class instead.
> >>
> >> Hopefully the new methods on the SS interfaces are deemed minor - the
> >> values were already there in most cases, at least on one side or the
> other
> >> (HSSF/XSSF), with a static default to use for the other one per MS
> >> documentation.
> >>
> >> Greg
> >>
> >> On Wed, Feb 1, 2017 at 5:38 PM Greg Woolsey <greg.woolsey@gmail.com>
> >> wrote:
> >>
> >> Oh, the primary class is o.a.p.ss.formula.DataValidationEvaluator
> >>
> >> On Wed, Feb 1, 2017 at 5:37 PM Greg Woolsey <greg.woolsey@gmail.com>
> >> wrote:
> >>
> >> My GitHub branch now contains Data Validation code and unit tests.  The
> >> test file DataValidationEvaluations.xlsx contains a large set of
> validation
> >> examples, including one formula example that applies to a range of cells
> >> and uses a relative formula.  The evaluation code has corresponding
> logic
> >> to offset the relative formula Ptgs from the top left of the region.
> >>
> >> Every test is labeled in the file with column A as a description,
> column B
> >> as the cell with validation, and column C the expected result, TRUE =
> >> valid, FALSE = invalid.
> >>
> >> The unit test compares the POI validation result with the expected
> column,
> >> failing on boolean mismatches.
> >>
> >> Have not had time to run all tests yet, but this should only be code
> >> additions, not modifications.  I'll run them soon.
> >>
> >> I'm sure there are code style discussions to be had - for example I
> >> implemented some things as inner classes for now, but we may want them
> >> top-level instead.
> >>
> >> Comments welcome, this is early code but is built on top of the SS
> >> interfaces, so should be stable for HSSF and XSSF.
> >>
> >> Greg
> >>
> >> On Mon, Jan 30, 2017 at 9:55 AM Greg Woolsey <greg.woolsey@gmail.com>
> >> wrote:
> >>
> >> Also, I just found this sample workbook
> >> <
> http://download.microsoft.com/download/1/6/F/16F701E9-63BA-48D3-8B48-096F9288F443/AF010235700_en-us_cfsamples_af010235700.xlsx>
> in
> >> the Excel online support docs.  If I have time to turn that into a unit
> >> test, it's about as complete as we could want.  Some parts are lost
> saving
> >> as HSSF, but we can then test that we evaluate what remains the same
> way as
> >> newer Excel when opening a legacy formatted file.
> >>
> >> On Mon, Jan 30, 2017 at 9:38 AM Greg Woolsey <greg.woolsey@gmail.com>
> >> wrote:
> >>
> >> Thanks, that makes sense wrt custom implementations of FormulaEvaluator
> -
> >> I hadn't thought about anyone rolling their own, but it's an interface,
> so
> >> quite possible.  Too bad we can't require Java 8 yet and use default
> >> methods.
> >>
> >> I can work with the new *Evaluator class idea.  And the HSSF limitations
> >> will just mean more unit tests :)  I have Excel 2016 available so I can
> >> create test workbooks, save them as both XLSX and XLS, and compare
> >> evaluations.  I can then write unit tests based on them that expect the
> >> results seen in Excel.  That should give us reference points for
> confidence
> >> in our replication of their logic, especially around rule priority/order
> >> and XLS HSSF files.
> >>
> >> On Fri, Jan 27, 2017 at 11:05 PM Nick Burch <apache@gagravarr.org>
> wrote:
> >>
> >> On Sat, 28 Jan 2017, Greg Woolsey wrote:
> >> > As noted in one of the method JavaDocs, we also need to expose and
> make
> >> use
> >> > of the ConditionalFormattingRule "priority" attribute.  That's key to
> >> > matching the right rule when more than one rule applies to a cell.
> Only
> >> > the first match in priority order is applied.
> >>
> >> Your slight challenge is that not all Conditional Formatting rules have
> a
> >> priority... XLSX ones do, and newer XLS ones based on CFRule12Record
> (sid
> >> = 0x087A) do, but the older XLS ones (CFRuleRecord / 0x01B1) don't. I'm
> >> not sure what Excel does for those, but my hunch (based on our API) is
> >> that it uses their order as a priority.
> >>
> >>
> >> > I've created a fork in GitHub for this, and committed a stab at
> >> > high-level API methods that could be added to the FormulaEvaluator
> >> > interface:
> >> >
> >>
> https://github.com/WoozyG/poi/commit/d44fee7bd03ed450af589467ec90e2581b9f2b16$
> >>
> >> FormulaEvaluator is an interface, which we have 4 implementations of in
> >> our codebabse, and I'd guess that other complex users of POI will have
> >> dozens more. I'm not sure, therefore, that we want to be putting all of
> >> the CF and DV logic methods on there, especially as it'll be common to
> all
> >> implementations
> >>
> >> The HSSF classes for CF all use org.apache.poi.ss.formula.Formula which
> is
> >> PTG based. The HSSF classes for DV seem to store the raw PTGs.
> >>
> >> If we added two new SS usermodel classes, eg
> >> ConditionalFormattingEvaluator and DataValidationEvaluator, these could
> be
> >> classes (not interfaces) with your proposed new methods on. They could
> >> hold the logic (once) for all formats (as it's basically the same on
> all)
> >> for priority, checking etc
> >>
> >> Doing that would also mean that "our" new classes could call out to our
> >> existing low-level ones to evaluate formulas. That would mean we
> wouldn't
> >> have to make a breaking change to the FormulaEvaluator interface too
> >>
> >> Might that work for you?
> >>
> >> > No implementations have been done yet, and the Vaadin comments
> indicate
> >> > HSSF doesn't parse conditional formatting properly or something, and
> >> can't
> >> > be evaluated correctly currently.  I don't know exactly what they
> found
> >> > wrong, and it's rather annoying they didn't file any bugs.
> >>
> >> I think that comment is out of date, from before the CF work in 3.13
> >>
> >> Nick
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
> >> For additional commands, e-mail: dev-help@poi.apache.org
> >>
> >>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
> For additional commands, e-mail: dev-help@poi.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message