lucene-java-user mailing list archives

From Jake Mannix <jake.man...@gmail.com>
Subject Re: Term Boost Threshold
Date Fri, 13 Nov 2009 23:48:25 GMT
On Fri, Nov 13, 2009 at 3:35 PM, Max Lynch <ihasmax@gmail.com> wrote:

> > query: "San Francisco" "California" +("John Smith" "John Smith Manufacturing")
> >
> > Here the San Fran and CA clauses are optional, and the ("John Smith" OR
> > "John Smith Manufacturing") is required.
> >
>
> Thanks Jake, that works nicely.
>
> Now, I would like to know exactly what term was found.  For example, if a
> result comes back from the query above, how do I know whether John Smith was
> found, or both John Smith and his company, or just John Smith Manufacturing
> was found?


In general, this is actually very hard.  Lucene itself does not even keep
track of which terms in a given query matched a given document.  But you
really only need to know which terms matched in the final "top hits" you're
showing to the user, right?  What is this information used for, and why do
you want to know which term hit?
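
If it really is only the top hits that matter, one rough option is to
re-check each phrase against the stored text of each hit after the search
returns.  The sketch below is plain Java, not a Lucene API: the class and
method names (PhraseMatchCheck, normalize, matchedPhrases) are made up for
illustration, and it assumes the field text of each hit is stored and can be
read back as a String.  Its normalization only approximates what an analyzer
does, so treat it as a starting point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene: re-checks each phrase of the query
// against the stored text of a single top hit.
final class PhraseMatchCheck {

    // Lower-cases the input and replaces every run of non-letter, non-digit
    // characters with one space. This only approximates a real analyzer.
    static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]+", " ").trim();
    }

    // Returns the phrases that occur as whole-word sequences in docText,
    // in the order they were given.
    static List<String> matchedPhrases(String docText, List<String> phrases) {
        String haystack = " " + normalize(docText) + " ";
        List<String> matched = new ArrayList<>();
        for (String phrase : phrases) {
            String needle = normalize(phrase);
            if (!needle.isEmpty() && haystack.contains(" " + needle + " ")) {
                matched.add(phrase);
            }
        }
        return matched;
    }
}
```

Calling matchedPhrases(hitText, phrases) once per displayed hit keeps the
extra work proportional to the page size, not the index size.  Note that any
text containing "John Smith Manufacturing" also contains "John Smith", so
telling "only the company" apart from "the person" needs an extra check on
what follows each occurrence of the name.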

  -jake


> The way I am doing that right now is using a highlighter (which
> unfortunately breaks up "John Smith" into <b>John</b><b>Smith</b>) and
> combining the terms that are to be highlighted and keeping track of them so
> I know they were found.  If there was a simple way to just check which part
> of that query was matched that would be awesome.  This is why I was thinking
> of using the term boosting and using a threshold to say "Well, if the score
> is above this value, then I can assume that 'John Smith' was found, but if
> the score is under a certain threshold, I can say that only his company was
> found", without having to use the highlighter and noting when a term I'm
> looking for is to be highlighted.  Is there a solution?
>
> Thanks,
> Max
>
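
The "combining the terms that are to be highlighted" step described above
could look roughly like the following.  This is a minimal sketch in plain
Java, not something the highlighter provides: the class and method names
(HighlightMerge, mergeAdjacent, highlightedRuns) are hypothetical, and it
assumes the highlighter output wraps each term in <b>...</b> and keeps the
original whitespace between neighbouring terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical post-processing of highlighter output, not a Lucene API.
final class HighlightMerge {

    // A closing tag followed only by whitespace and then an opening tag.
    private static final Pattern ADJACENT =
            Pattern.compile("</b>(\\s*)<b>", Pattern.CASE_INSENSITIVE);

    // Text between one opening tag and the next closing tag.
    private static final Pattern RUN =
            Pattern.compile("<b>(.*?)</b>", Pattern.CASE_INSENSITIVE);

    // Joins neighbouring highlighted terms into one highlighted run,
    // keeping whatever whitespace separated them.
    static String mergeAdjacent(String html) {
        return ADJACENT.matcher(html).replaceAll("$1");
    }

    // Lists the text of every highlighted run after merging.
    static List<String> highlightedRuns(String html) {
        List<String> runs = new ArrayList<>();
        Matcher m = RUN.matcher(mergeAdjacent(html));
        while (m.find()) {
            runs.add(m.group(1));
        }
        return runs;
    }
}
```

The list returned by highlightedRuns can then be compared against the phrases
in the query.  If the fragments arrive with no whitespace between the tags,
the merged run loses the space between the words, so that case would need
separate handling.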
