lucene-solr-user mailing list archives

From Adair Kovac <adairko...@gmail.com>
Subject Re: Paging bug in ReRankingQParserPlugin?
Date Tue, 05 Aug 2014 16:04:01 GMT
Thanks, great explanation! Yeah, if it keeps the current behavior, added
documentation would be great.

Are there any other features that expect parameters to change as one pages?
If not, I'm concerned that this might be hard to support for clients that
assume only the index params will change. It also makes things harder if we
want to re-rank only a strictly small set of results on the first page,
because then we'd have to stitch together two result sets. We don't
currently want to do that, though.
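
To make the concern concrete, here is a rough sketch of what a client would have to do to follow the suggested usage. `PagingParams` and `buildParams` are hypothetical names, not part of Solr or SolrJ, and the `rq` syntax is written as I understand it from the re-rank feature, so treat it as an assumption:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PagingParams {
    // Hypothetical client helper, not part of Solr or SolrJ.
    // Builds request parameters for one page and drops the re-rank
    // parameters once the page starts at or past reRankDocs.
    static Map<String, String> buildParams(String q, String reRankQuery,
                                           int reRankDocs, int start, int rows) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", q);
        params.put("start", Integer.toString(start));
        params.put("rows", Integer.toString(rows));
        // Only send rq while the page begins inside the re-ranked window.
        if (start < reRankDocs) {
            params.put("rq", "{!rerank reRankQuery=$rqq reRankDocs=" + reRankDocs + "}");
            params.put("rqq", reRankQuery);
        }
        return params;
    }
}
```

With reRankDocs=200 and rows=20, the page at start=180 still carries `rq`, and the page at start=200 does not. This assumes page boundaries line up with reRankDocs; a page that straddles the boundary would still send `rq`.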

For what it's worth, both I and the colleague who pointed me to the feature
assumed that it would fetch all the results, re-rank only the top
reRankDocs, and return the ones past the re-ranking point as-is. Is that
possible?
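
Roughly, the behavior we assumed looks like the sketch below. This is plain Java over an in-memory list, not Solr code; `FixedWindowReRank`, `page`, and the score function are made-up names for illustration only:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

public class FixedWindowReRank {
    // Re-ranks only the first reRankDocs entries of mainOrder by reRankScore
    // (highest first); everything after that keeps its main-query order.
    // The window never grows with start or rows.
    static <T> List<T> page(List<T> mainOrder, ToDoubleFunction<T> reRankScore,
                            int reRankDocs, int start, int rows) {
        int window = Math.min(reRankDocs, mainOrder.size());
        List<T> head = new ArrayList<>(mainOrder.subList(0, window));
        Comparator<T> byScore = Comparator.comparingDouble(reRankScore);
        // List.sort is stable, so ties keep their main-query order.
        head.sort(byScore.reversed());

        List<T> all = new ArrayList<>(head);
        all.addAll(mainOrder.subList(window, mainOrder.size()));

        // Clamp the requested page to the available documents.
        int from = Math.min(start, all.size());
        int to = Math.min(start + rows, all.size());
        return new ArrayList<>(all.subList(from, to));
    }
}
```

Because the window is fixed, each page is a slice of the same overall ordering, so paging with unchanged parameters would neither skip nor repeat documents.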

Thanks,

Adair

On Tue, Aug 5, 2014 at 5:53 AM, Joel Bernstein <joelsolr@gmail.com> wrote:

> The comment in the code reads slightly differently:
>
> // This enusres that reRankDocs >= docs needed to satisfy the result set.
> reRankDocs = Math.max(start+rows, reRankDocs);
>
> I think you're right, though, that this is confusing. The way the
> ReRankingQParserPlugin works is that it grabs the top X documents
> (reRankDocs) and re-ranks them. If the top X (reRankDocs) isn't large enough
> to satisfy the page, then the result won't have enough documents.
>
> The intended use of this was actually to stop using query re-ranking when
> you paged past the reRanked results. So if you re-rank the top 200
> documents, you would drop the re-ranking parameter when you page to
> documents 201-220.
>
> So the line:
> reRankDocs = Math.max(start+rows, reRankDocs);
>
> saves you from an unexpected shortfall in documents if you do page beyond
> the reRankDocs. At the very least, the expected use should be documented, and
> if we can figure out better behavior here, that would be great.
>
> Joel Bernstein
> Search Engineer at Heliosearch
>
>
> On Mon, Aug 4, 2014 at 7:56 PM, Adair Kovac <adairkovac@gmail.com> wrote:
>
>> Looking at this line in the code:
>>
>> // This enusres that reRankDocs <= docs needed to satisfy the result set.
>> reRankDocs = Math.max(start+rows, reRankDocs);
>>
>> This looks like it would cause skips and duplicates while paging through
>> the results, since if you exceed the reRankDocs parameter and keep finding
>> things that match the re-ranking query, they'll get boosted earlier
>> (skipped), thus pushing down items you already saw (causing duplicates).
>>
>> It's obviously intentional behavior, but I can't find any documentation
>> explaining why, if you request fewer documents to be re-ranked than you're
>> asking to view, it goes ahead and ignores the number you asked for. What if
>> I only want the top 10 out of 50 rows to be reranked? Wouldn't it be better
>> to make the client choose whether to increase the reRankDocs or leave it
>> the same?
>>
>> If no one replies and I have time, I might check out 4.9 and see if I can
>> confirm or disprove the bug, but figured I'd bring it up now in case I
>> don't end up having time. It would be good to document the reason for this
>> behavior if it turns out it's necessary.
>>
>> Thanks. I'm excited about this feature btw.
>>
>> --Adair
>>
>
>
