lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Multiword Highlighting
Date Fri, 16 Feb 2007 00:43:20 GMT
Mark:

Thanks, that reassures me that I'm not hallucinating. If it gets on my
priority list I can certainly share the code, since I stole it in the first
place <G>. I have a semi-solution for now that gets me out from under the
immediate problem, but it really wants a more robust solution than the one
I'm using.

Thanks
Erick

On 2/15/07, Mark Miller <markrmiller@gmail.com> wrote:
>
> Good catch, Erick! I'll have to tackle this as well. Mark H is the
> originator of that code so maybe he will chime in, but what I am thinking
> is this:
>
> In getSpansFromBooleanquery, keep track of which clauses are
> required. Then, based on whether any Spans are actually returned from
> getSpansFromTerm for each required clause, add only the correct spans to
> the returned spans. If you get what I mean <g>. I am sure there are
> more cases than that to consider, but I think the direction might work.
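
The required-clause idea above could be sketched roughly as follows. This is a simplified, Lucene-free model and not the SpansExtractor code discussed in this thread: `Clause`, `positionsOf`, and `spansForBoolean` are hypothetical names, plain token positions stand in for real Spans, and only a flat list of single-term clauses is handled.

```java
import java.util.ArrayList;
import java.util.List;

final class RequiredClauseSketch {

    /** One clause of a flat boolean query: a single term plus a required flag (hypothetical type). */
    static final class Clause {
        final String term;
        final boolean required;

        Clause(String term, boolean required) {
            this.term = term;
            this.required = required;
        }
    }

    /** Stand-in for getSpansFromTerm: the token positions where the term occurs. */
    static List<Integer> positionsOf(String[] tokens, String term) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                positions.add(i);
            }
        }
        return positions;
    }

    /**
     * Collects positions for every clause, but returns nothing at all
     * as soon as one required clause has no match in the text.
     */
    static List<Integer> spansForBoolean(String[] tokens, List<Clause> clauses) {
        List<Integer> collected = new ArrayList<>();
        for (Clause clause : clauses) {
            List<Integer> hits = positionsOf(tokens, clause.term);
            if (clause.required && hits.isEmpty()) {
                // A required clause failed, so the query as a whole does not match.
                return new ArrayList<>();
            }
            collected.addAll(hits);
        }
        return collected;
    }
}
```

With the text "a b c d e f g h", marking both "a" and "z" as required yields no positions, while treating both as optional (the OR case) still yields the position of "a". Nested queries and prohibited clauses are among the further cases this sketch does not cover.
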
>
> If you don't tackle it or can't share I'll be doing it myself.
>
> - Mark
>
> Erick Erickson wrote:
> > I hope you're all following this old thread, because I've just run into
> > something I don't quite know what to do about with the SpansExtractor code
> > that I shamelessly stole.
> >
> > Let's say my text is "a b c d e f g h" and my query is "a AND z". The
> > implementation I stole for SpansExtractor (mentioned several times in this
> > thread) returns a span for "a", which doesn't preserve the sense of the
> > query. The root of the problem is that when it gets down to assembling the
> > getSpansFromTermQuery, the sense of "AND" is lost and I get a span for the
> > "a" in the query.
> >
> > The rest of the kinds of spans don't seem to have the same issue. OR should
> > return the "a" in the example above. Any phrase queries that come through
> > work fine. In fact, our application requires that we have an implied
> > proximity, mostly anyway, so I haven't had to deal with this until now...
> >
> > One way, it seems to me, to handle this would be to transform the query
> > above into a span query with a limit of 10,000, where 10,000 is a magic
> > number that I'm confident is OK in my application because of the
> > PositionIncrementGaps I set up during indexing.
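
The effect of treating "a AND z" as a proximity match with a very large slop can be illustrated with a small model. This is a minimal sketch that does not use Lucene: `Span` and `nearSpans` are hypothetical names, matching is unordered, and the slop check (tokens between the two terms must not exceed the slop) is a simplification, assumed here for two single-term clauses only.

```java
import java.util.ArrayList;
import java.util.List;

final class AndAsNearSketch {

    /** Half-open token range [start, end) covered by one match (hypothetical type). */
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    /**
     * Finds every pair of positions where both terms occur with at most
     * `slop` other tokens between them, in either order. If either term is
     * absent from the text, no span is produced, which is the AND behaviour.
     */
    static List<Span> nearSpans(String[] tokens, String first, String second, int slop) {
        List<Span> spans = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (!tokens[i].equals(first)) {
                continue;
            }
            for (int j = 0; j < tokens.length; j++) {
                if (i == j || !tokens[j].equals(second)) {
                    continue;
                }
                int start = Math.min(i, j);
                int end = Math.max(i, j) + 1;
                // Tokens strictly between the two terms.
                int gap = end - start - 2;
                if (gap <= slop) {
                    spans.add(new Span(start, end));
                }
            }
        }
        return spans;
    }
}
```

For the text "a b c d e f g h" and a slop of 10,000, "a" with "z" produces no span, while "a" with "h" produces one span covering the whole text. The resulting span runs from the first term to the last, so highlighting the full span would mark everything in between.
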
> >
> > Is there a more elegant way of doing this? Or am I missing the boat
> > entirely? Or did I mess up when I stole the code?
> >
> > Or, and this would be the easiest for me at least, has this work already
> > been done and all I really need to do is get a different implementation of
> > SpansExtractor <G>?
> >
> > Thanks
> > Erick
> >
> >
> > On 2/2/07, Mark Miller <markrmiller@gmail.com> wrote:
> >>
> >>
> >>
> >> mark harwood wrote:
> >> > Hi Mark,
> >> > Have you looked at the returned spans from any other potential problem
> >> > scenarios (other than the 3 word one you suggest) e.g. complex nested
> >> > "SpanOr" or "SpanNot" logic?
> >> >
> >> Nothing super intense, but I have looked at some semi-complex nesting and
> >> it all looks great if you use the full span highlighting. Highlighting
> >> the first and last word of the span only works great if you're limited to
> >> word-to-word proximity searching (like in my parser <G>). It works great for
> >> my sentence and paragraph proximity searching, though I had to add the
> >> option of hiding my index marker tokens from the output.
> >>
> >> Perhaps you know of something that I haven't run into that may not
> >> highlight correctly ?
> >> > Can you attach your code to a new Jira entry so I can have a play?
> >> >
> >> >
> >> I certainly will.
> >>
> >> - Mark
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>
> >
>
