lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Multiword Highlighting
Date Fri, 16 Feb 2007 16:46:00 GMT
It must be time to eat lunch, since the more I stare at this code, the less
sense it makes to me. Which is a sure sign that I need a break <G>.

But a couple of things.....

1> My test cases throw some exceptions with the code as-is. The spans.get(0)
call is a problem because the list of spans returned is not guaranteed to
have anything in it. Also, I don't think the test for reqSpans.get(0).next
in queryClauses[i].isRequired is correct (even when it doesn't throw
exceptions). Isn't the sense there that we want to include the spans if we
*do* have entries??

2> But more importantly, I think this throws things in the "span bucket"
across documents. Consider two documents: one contains the text "a b c d e f"
and the other contains "x y z". If we query on "a AND z", it seems like
extractSpansFromTermQuery would return one span from each document, which
would satisfy the tests in getSpansFromBooleanQuery inappropriately.
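
One way to picture the cross-document problem in point 2 is to group the extracted spans by document before testing the required clauses. The following is a minimal sketch, not the SpansExtractor code under discussion: SpanHit, PerDocSpans, groupByDoc and docsMatchingAll are hypothetical names invented for illustration, and a span is reduced to a plain (term, doc, start, end) record.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical stand-in for one extracted span: the term that matched,
// the document it matched in, and its start/end positions.
final class SpanHit {
    final String term;
    final int doc;
    final int start;
    final int end;

    SpanHit(String term, int doc, int start, int end) {
        this.term = term;
        this.doc = doc;
        this.start = start;
        this.end = end;
    }
}

final class PerDocSpans {
    // Bucket the hits by document id so each document is judged on its own hits.
    static Map<Integer, List<SpanHit>> groupByDoc(List<SpanHit> hits) {
        Map<Integer, List<SpanHit>> byDoc = new TreeMap<>();
        for (SpanHit hit : hits) {
            byDoc.computeIfAbsent(hit.doc, d -> new ArrayList<>()).add(hit);
        }
        return byDoc;
    }

    // Return the ids of documents whose own hits cover every required term.
    static List<Integer> docsMatchingAll(List<SpanHit> hits, Set<String> requiredTerms) {
        List<Integer> matching = new ArrayList<>();
        for (Map.Entry<Integer, List<SpanHit>> entry : groupByDoc(hits).entrySet()) {
            Set<String> seen = new HashSet<>();
            for (SpanHit hit : entry.getValue()) {
                seen.add(hit.term);
            }
            if (seen.containsAll(requiredTerms)) {
                matching.add(entry.getKey());
            }
        }
        return matching;
    }
}
```

With one hit for "a" in document 0 and one hit for "z" in document 1, docsMatchingAll returns an empty list for the required terms {a, z}, whereas a single shared bucket would have counted both terms as present.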

Is it just me or is working with Spans really intended to be "one pass
through and only forward"? There are several places in the SpansExtractor
code where we want to ask "are there any spans in here?". But to ask that,
you have to call next(), which changes the state of the Spans. So you have
to be really careful whenever you use a Spans that has already had this
test performed, and write do..while (spans.next()); rather than
while (spans.next()) {}..... Ditto with skipTo.
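
The "are there any spans in here?" question can be asked without losing the first span by buffering one step of look-ahead. The following is a minimal sketch under stated assumptions: SimpleSpans is a hypothetical stand-in that mirrors only the forward-only next()/doc()/start()/end() shape of Spans (skipTo is left out), and PeekableSpans and ArraySpans are invented names, not part of Lucene or of the SpansExtractor code.

```java
// Hypothetical stand-in for a forward-only span enumeration.
interface SimpleSpans {
    boolean next();   // advance; false when exhausted
    int doc();
    int start();
    int end();
}

// Test-only implementation backed by rows of {doc, start, end}.
final class ArraySpans implements SimpleSpans {
    private final int[][] rows;
    private int pos = -1;

    ArraySpans(int[][] rows) { this.rows = rows; }

    public boolean next() {
        if (pos < rows.length) pos++;
        return pos < rows.length;
    }
    public int doc()   { return rows[pos][0]; }
    public int start() { return rows[pos][1]; }
    public int end()   { return rows[pos][2]; }
}

// Wrapper that answers hasNext() by advancing the delegate once and
// remembering the result, so a later while (spans.next()) loop still
// sees the span that the check consumed.
final class PeekableSpans implements SimpleSpans {
    private final SimpleSpans delegate;
    private boolean peeked;      // true if the delegate is one step ahead of us
    private boolean peekResult;  // what that look-ahead next() returned
    private int doc = -1, start = -1, end = -1;

    PeekableSpans(SimpleSpans delegate) { this.delegate = delegate; }

    boolean hasNext() {
        if (!peeked) {
            peekResult = delegate.next();
            peeked = true;
        }
        return peekResult;
    }

    public boolean next() {
        boolean advanced = hasNext();  // reuses a pending look-ahead if there is one
        if (advanced) {
            // Cache the current span so accessors stay stable across later peeks.
            doc = delegate.doc();
            start = delegate.start();
            end = delegate.end();
            peeked = false;
        }
        // When exhausted, peeked stays true so the delegate is not called again.
        return advanced;
    }
    public int doc()   { return doc; }
    public int start() { return start; }
    public int end()   { return end; }
}
```

A caller can then test if (spans.hasNext()) and afterwards use an ordinary while (spans.next()) {} loop, with no need to switch to do..while for enumerations that have already been checked.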


I'm finally realizing that I need to write more custom stuff here than is
probably useful for the community at large, since I only want to count spans
for a single document. But this is a great start for me since it puts a
bunch of the code in place for me and the rest should probably be just
keeping some lists.....

I'll let y'all know if I come up with anything really interesting....

Erick
