lucene-java-user mailing list archives

From Jon Stewart
Subject Re: PostingsHighlighter/PassageFormatter has zero matches for some results
Date Tue, 15 Oct 2013 14:52:50 GMT
I'm very grateful for the assistance. It'd be great to know the value
of DEFAULT_MAX_LENGTH in the documentation. I know the majority of
applications care more about precision than recall... but I know of a
lot of people using Lucene for high recall applications, too. Working
in high recall domains doesn't necessarily make us Lucene experts.

Many/most of the maximums/defaults used in Lucene can be changed and
have accessors available, which naturally highlights and documents
them to the user. PostingsHighlighter doesn't have such accessors, and
the treatment of DEFAULT_MAX_LENGTH in the javadocs is brief. I don't
know whether I just flat out missed it or assumed that
DEFAULT_MAX_LENGTH would be big enough, but, FWIW, the docs where
getNumMatches() was 0 on all Passages didn't strike me as being
particularly large.
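
For anyone hitting the same symptom, here is a minimal sketch of how a silent length cap can leave every Passage with zero matches even though the query term is in the document. It does not use Lucene: TruncatingHighlighterSketch, countMatches() and getMaxLength() are hypothetical names invented for illustration, and the 10,000-character default is an assumption that should be checked against the javadocs of the Lucene version in use.

```java
import java.util.Collections;

/** Hypothetical stand-in for a highlighter with a length cap; NOT Lucene's PostingsHighlighter. */
final class TruncatingHighlighterSketch {
    /** Assumed default for illustration; verify the real value in your Lucene version. */
    static final int DEFAULT_MAX_LENGTH = 10_000;

    private final int maxLength;

    /** Convenience constructor: silently applies the default cap. */
    TruncatingHighlighterSketch() {
        this(DEFAULT_MAX_LENGTH);
    }

    /** "Expert" constructor: the caller chooses the cap explicitly. */
    TruncatingHighlighterSketch(int maxLength) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("maxLength must be >= 0");
        }
        this.maxLength = maxLength;
    }

    /** Accessor that makes the effective limit visible to callers. */
    int getMaxLength() {
        return maxLength;
    }

    /** Counts occurrences of term that lie entirely within the first maxLength characters. */
    int countMatches(String content, String term) {
        if (term.isEmpty()) {
            return 0;
        }
        String window = content.substring(0, Math.min(content.length(), maxLength));
        int count = 0;
        for (int i = window.indexOf(term); i >= 0; i = window.indexOf(term, i + term.length())) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 12,000 filler characters push the only occurrence of "needle" past the default cap.
        String doc = String.join("", Collections.nCopies(12_000, "x")) + " needle";

        TruncatingHighlighterSketch byDefault = new TruncatingHighlighterSketch();
        TruncatingHighlighterSketch uncapped = new TruncatingHighlighterSketch(Integer.MAX_VALUE);

        System.out.println(byDefault.countMatches(doc, "needle")); // prints 0
        System.out.println(uncapped.countMatches(doc, "needle"));  // prints 1
    }
}
```

The document matches the query in both cases; only the window the highlighter examines differs. With the real PostingsHighlighter, the analogous remedy discussed in this thread is to pass a larger maximum length to the optional constructor.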


On Tue, Oct 15, 2013 at 10:11 AM, Robert Muir wrote:
> On Tue, Oct 15, 2013 at 9:59 AM, Michael McCandless wrote:
>> Well, unfortunately, this is a trap that users do hit.
>> By requiring the user to think about the limit on creating
>> PostingsHighlighter, he/she would think about it and realize they are
>> in fact setting a limit.
>> Silent limits are dangerous because you don't offhand know what's
>> wrong / why you see nothing getting highlighted.
> I already made my argument: for 99% of use cases the defaults are
> fine. In most cases highlighting is trying to summarize the document
> and something that deep just doesn't contribute much (see the default
> scoring model!). There is an optional ctor for the others doing expert
> things to specify the length.
> I don't think we should make APIs unusable because you think XYZ is a trap.
> Why not make DEFAULT_MAX_THREAD_STATES a required parameter to IndexWriter?
> Hell, let's make it so users have to supply all parameters to
> everything, so everything is like
> IndexWriter(int,int,int,int,int,int,int,int,int,int,int,int) and so
> on. Then you will be satisfied there are no traps, but it will be
> totally unusable.

Jon Stewart, Principal
(646) 719-0317 | Arlington, VA
