lucene-java-user mailing list archives

From Jon Stewart <...@lightboxtechnologies.com>
Subject Re: PostingsHighlighter/PassageFormatter has zero matches for some results
Date Tue, 15 Oct 2013 14:52:50 GMT
I'm very grateful for the assistance. It'd be great if the documentation
stated the value of DEFAULT_MAX_LENGTH. I know the majority of
applications care more about precision than recall... but I know of a
lot of people using Lucene for high recall applications, too. Working
in high recall domains doesn't necessarily make us Lucene experts.

Many/most of the maximums/defaults used in Lucene can be changed and
have accessors available, which naturally highlights and documents
them to the user. PostingsHighlighter doesn't have such accessors, and
the treatment of DEFAULT_MAX_LENGTH in the javadocs is brief. I don't
know whether I just flat out missed it or assumed that
DEFAULT_MAX_LENGTH would be big enough, but, FWIW, the docs where
getNumMatches() was 0 on all Passages didn't strike me as being
particularly large.
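
To illustrate the failure mode discussed in this thread, here is a minimal, self-contained sketch of how a silent content limit can produce zero matches. It is a toy model, not Lucene code: the class MaxLengthDemo, the method countMatches, and the constant ASSUMED_MAX_LENGTH are hypothetical names, and the limit of 100 is chosen only for the demonstration. It does not reflect the actual value of DEFAULT_MAX_LENGTH.

```java
public class MaxLengthDemo {
    // Hypothetical limit for this demo only; NOT Lucene's DEFAULT_MAX_LENGTH.
    static final int ASSUMED_MAX_LENGTH = 100;

    /**
     * Counts non-overlapping occurrences of term within the first maxLength
     * characters of text. Anything past the limit is silently ignored,
     * which mimics a highlighter that only examines a prefix of the content.
     */
    static int countMatches(String text, String term, int maxLength) {
        if (term.isEmpty()) {
            return 0; // avoid looping forever on an empty term
        }
        String examined = text.length() > maxLength ? text.substring(0, maxLength) : text;
        int count = 0;
        int from = 0;
        while ((from = examined.indexOf(term, from)) != -1) {
            count++;
            from += term.length();
        }
        return count;
    }

    public static void main(String[] args) {
        // Build a document whose only hit sits past the assumed limit.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 50; i++) {
            sb.append("filler "); // 50 * 7 = 350 characters
        }
        sb.append("needle");
        String doc = sb.toString();

        // The document matches the query, but the hit lies beyond the limit.
        System.out.println(countMatches(doc, "needle", ASSUMED_MAX_LENGTH)); // prints 0
        // Raising the limit makes the hit visible again.
        System.out.println(countMatches(doc, "needle", Integer.MAX_VALUE));  // prints 1
    }
}
```

In the real highlighter, the analogous remedy would be the optional constructor mentioned in the quoted reply below, which lets the caller specify the length explicitly.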


Jon

On Tue, Oct 15, 2013 at 10:11 AM, Robert Muir <rcmuir@gmail.com> wrote:
> On Tue, Oct 15, 2013 at 9:59 AM, Michael McCandless
> <lucene@mikemccandless.com> wrote:
>> Well, unfortunately, this is a trap that users do hit.
>>
>> By requiring the user to think about the limit on creating
>> PostingsHighlighter, he/she would think about it and realize they are
>> in fact setting a limit.
>>
>> Silent limits are dangerous because you don't offhand know what's
>> wrong / why you see nothing getting highlighted.
>>
>>
>
> I already made my argument: for 99% of use cases the defaults are
> fine. In most cases highlighting is trying to summarize the document
> and something that deep just doesn't contribute much (see the default
> scoring model!). There is an optional ctor for the others doing expert
> things to specify the length.
>
> I don't think we should make APIs unusable because you think XYZ is a trap.
>
> Why not make DEFAULT_MAX_THREAD_STATES a required parameter to IndexWriter?
>
> Hell, let's make it so users have to supply all parameters to
> everything, so everything is like
> IndexWriter(int,int,int,int,int,int,int,int,int,int,int,int) and so
> on. Then you will be satisfied there are no traps, but it will be
> totally unusable.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>



-- 
Jon Stewart, Principal
(646) 719-0317 | jon@lightboxtechnologies.com | Arlington, VA
