lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: PostingsHighlighter/PassageFormatter has zero matches for some results
Date Tue, 15 Oct 2013 13:42:42 GMT
I strongly disagree: there is no trap. It's a reasonable default for
good summarization, and the behavior is no different from the other
highlighters here.

Typically people *do* care about performance, and it's important to have
a clean, simple API too.

In my opinion increasing this limit is very esoteric: usually
sentences that deep do not summarize the document well.
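
For readers of the archive, the effect discussed in this thread can be
illustrated with a toy model. This is a minimal sketch and does not use
Lucene: the class MaxLengthSketch and its methods countMatches and
documentWithTermAt are hypothetical names made up for illustration, and
the logic is only an assumption about what "analyze at most maxLength
characters" implies. The only details taken from the thread are the
10,000 default and the 16MB value.

```java
import java.util.Locale;

/**
 * Toy model of a highlighter that only inspects a prefix of the text.
 * NOT Lucene code: every name and all logic here are hypothetical.
 */
final class MaxLengthSketch {

    /** Mirrors the 10,000 default mentioned in the thread. */
    static final int DEFAULT_MAX_LENGTH = 10_000;

    /** Counts occurrences of term that lie fully within the first maxLength chars. */
    static int countMatches(String text, String term, int maxLength) {
        if (term.isEmpty()) {
            throw new IllegalArgumentException("term must not be empty");
        }
        int limit = Math.min(text.length(), Math.max(0, maxLength));
        // Everything past the limit is invisible to the matcher.
        String window = text.substring(0, limit).toLowerCase(Locale.ROOT);
        String needle = term.toLowerCase(Locale.ROOT);
        int count = 0;
        int from = 0;
        while ((from = window.indexOf(needle, from)) >= 0) {
            count++;
            from += needle.length();
        }
        return count;
    }

    /** Builds filler text whose only occurrence of term starts at offset. */
    static String documentWithTermAt(int offset, String term) {
        StringBuilder sb = new StringBuilder(offset + term.length());
        for (int i = 0; i < offset; i++) {
            sb.append(i % 6 == 5 ? ' ' : 'x');
        }
        return sb.append(term).toString();
    }

    public static void main(String[] args) {
        String doc = documentWithTermAt(50_000, "lucene");
        // The term sits past the default limit, so no match is seen.
        System.out.println(countMatches(doc, "lucene", DEFAULT_MAX_LENGTH)); // prints 0
        // With a 16MB limit the whole document is inspected.
        System.out.println(countMatches(doc, "lucene", 16 * 1024 * 1024));   // prints 1
    }
}
```

In the sketch, a document that really does contain the query term yields
zero matches under the default limit, which resembles the symptom
reported below. With the real class, the remedy named in the thread is
the PostingsHighlighter(int) ctor with a value larger than
DEFAULT_MAX_LENGTH.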



On Tue, Oct 15, 2013 at 9:38 AM, Michael McCandless
<lucene@mikemccandless.com> wrote:
> Maybe we should make the max length a required argument to
> PostingsHighlighter ctor?
>
> Because it's trappy now, since you don't realize offhand that it's
> silently enforcing a limit ...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Oct 15, 2013 at 9:31 AM, Robert Muir <rcmuir@gmail.com> wrote:
>> Thanks Jon. I'll add some stuff to the javadocs here to try to make it
>> more obvious.
>>
>> On Tue, Oct 15, 2013 at 5:54 AM, Jon Stewart
>> <jon@lightboxtechnologies.com> wrote:
>>> Awesome, that did it! I didn't realize that DEFAULT_MAX_LENGTH was
>>> only 10,000. I've now upped it to 16MB (I'm not doing the usual thing
>>> and performance is not a particular concern).
>>>
>>> Thanks,
>>>
>>> Jon
>>>
>>>
>>> On Mon, Oct 14, 2013 at 9:58 PM, Robert Muir <rcmuir@gmail.com> wrote:
>>>> are your documents large?
>>>>
>>>> try PostingsHighlighter(int) ctor with a larger value than DEFAULT_MAX_LENGTH.
>>>>
>>>> sounds like the passages you see with matches are very deep into the
>>>> document and it's just hitting the default limit and returning the
>>>> default summarization (getEmptyHighlight())
>>>>
>>>> otherwise, please open a JIRA issue :)
>>>>
>>>> On Mon, Oct 14, 2013 at 9:32 PM, Jon Stewart
>>>> <jon@lightboxtechnologies.com> wrote:
>>>>> I upgraded to 4.5. Same results, unfortunately. Most docs in the
>>>>> result set will have a Passage where numMatches() > 0, but some do
>>>>> not. In these cases, the Passage array's length is greater than zero.
>>>>>
>>>>>
>>>>> Jon
>>>>>
>>>>>
>>>>> On Mon, Oct 14, 2013 at 5:24 PM, Robert Muir <rcmuir@gmail.com> wrote:
>>>>>> did you try the latest release? There are some bugs fixed...
>>>>>>
>>>>>> On Mon, Oct 14, 2013 at 2:11 PM, Jon Stewart
>>>>>> <jon@lightboxtechnologies.com> wrote:
>>>>>>> Hello,
>>>>>>>
>>>>>>> I've observed that when using PostingsHighlighter in Lucene 4.4,
>>>>>>> some of the responsive documents in TopDocs will have zero matches in
>>>>>>> the associated array of Passage objects. I.e., in the call of
>>>>>>> PassageFormatter.format(), there will be some calls where none of the
>>>>>>> Passage objects in the array will have matches. I've seen this on a
>>>>>>> simple one-word query, where the word clearly exists in the Document's
>>>>>>> text for the field (and the Document is included in the TopDocs result
>>>>>>> set).
>>>>>>>
>>>>>>> Any ideas?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Jon
>>>>>>> --
>>>>>>> Jon Stewart, Principal
>>>>>>> (646) 719-0317 | jon@lightboxtechnologies.com | Arlington, VA
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>>>>> For additional commands, e-mail: java-user-help@lucene.apache.org

