lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Phrase Highlighting
Date Thu, 30 Apr 2009 10:16:28 GMT
On Thu, Apr 30, 2009 at 12:15 AM, Max Lynch <ihasmax@gmail.com> wrote:
>> You should switch to the SpanScorer (in o.a.l.search.highlighter).
>> That fragment scorer should only match true phrase matches.
>>
>> Mike
>>
>
> Thanks Mike.  I gave it a try and it wasn't working how I expected.  I am
> using pylucene right now so I can ask them if the implementation is
> different.  I'm messing around with the lucene unit tests to see exactly how
> the scorer should work.

Can you give more details on what's not working right?

> In the meantime, if I am interested in finding out exactly how many times a
> term was found in a document, what is the best way to go about this?  The
> way I am doing it right now is using a highlighter and just incrementing
> counters when a word I'm interested in is found.  I just came across
> FieldSortedTermVectorMapper, which could do something similar.  Is
> FieldSortedTermVectorMapper something I could use for this?  Is there a
> better option?

This (getting more details per-doc on why a doc matches) is an
often-requested feature that Lucene does not make easy today.

Is it really just single terms you need to measure?  (e.g., not "how
many times did phrase XYZ occur in the doc").  If so, then getting the
term vectors and locating your term in there should work.  This is
probably OK if you just do it for each of the hits on the page (like
10 hits), but it will be way too slow if you try to do it for, say, all
docs that matched the query.
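
As a rough sketch of the term-vector lookup (plain Java, no Lucene on the
classpath): a term vector is essentially a sorted array of terms plus a
parallel array of frequencies for one field of one document, so the count
is a binary search away.  The class and method names below are
hypothetical stand-ins, not Lucene API.  In real code the two arrays
would come from the document's term vector, assuming the field was
indexed with term vectors enabled.

```java
import java.util.Arrays;

// Hypothetical stand-in for a per-document, per-field term vector:
// terms in ascending order, with freqs[i] the count of sortedTerms[i].
class TermVectorCount {

    // Returns how many times `term` occurs in the document, or 0 if absent.
    static int frequencyOf(String[] sortedTerms, int[] freqs, String term) {
        int idx = Arrays.binarySearch(sortedTerms, term);
        return idx >= 0 ? freqs[idx] : 0;
    }

    public static void main(String[] args) {
        // Made-up data for a single document, for illustration only.
        String[] terms = {"highlight", "lucene", "phrase"};
        int[] freqs = {2, 5, 1};

        System.out.println(frequencyOf(terms, freqs, "lucene")); // prints 5
        System.out.println(frequencyOf(terms, freqs, "span"));   // prints 0
    }
}
```

The lookup itself is cheap; the cost is in loading the term vector for
each document, which is why this only makes sense for the handful of
hits on a page.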

Otherwise, iterating through matches as found by the highlighter's
SpanScorer is probably best for now.

Mike
