lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-6066) New "remove" method in PriorityQueue
Date Thu, 20 Nov 2014 20:38:36 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-6066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219950#comment-14219950
] 

Michael McCandless commented on LUCENE-6066:
--------------------------------------------

Thanks Mark, that makes sense, and that's a great example to help understand it.  The fact
that diversity doesn't need to keep the "filler" is what allows it to be a single pass.

If we have this linear cost remove, what's the worst case complexity?  When all N hits have
the same "key" but are visited from worst to best score?  Is it then O(N * M), where M is
number of top hits I want?

bq. If the PQ set the current array position as a property of each element every time it moved
them around I could pass the array index to remove() rather than an object that had to be
scanned for

This seems promising, maybe as a separate dedicated (forked) PQ impl?  But how will you track
the min element for each key in the PQ (to know which element to remove, when a more competitive
hit with that key arrives)?

> New "remove" method in PriorityQueue
> ------------------------------------
>
>                 Key: LUCENE-6066
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6066
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/query/scoring
>            Reporter: Mark Harwood
>            Priority: Minor
>             Fix For: 5.0
>
>         Attachments: LUCENE-PQRemoveV1.patch
>
>
> It would be useful to be able to remove existing elements from a PriorityQueue. 
> The proposal is that a linear scan is performed to find the element being removed and
then the end element in heap[size] is swapped into this position to perform the delete. The
method downHeap() is then called to shuffle the replacement element back down the array but
the existing downHeap method must be modified to allow picking up an entry from any point
in the array rather than always assuming the first element (which is its only current mode
of operation).
> A working javascript model of the proposal with animation is available here: http://jsfiddle.net/grcmquf2/22/

> In tests the modified version of "downHeap" produces the same results as the existing
impl but adds the ability to push down from any point.
> An example use case that requires remove is where a client doesn't want more than N matches
for any given key (e.g. no more than 5 products from any one retailer in a marketplace). In
these circumstances a document that was previously thought of as competitive has to be removed
from the final PQ and replaced with another doc (eg a retailer who already has 5 matches in
the PQ receives a 6th match which is better than his previous ones). This particular process
is managed by a special "DiversifyingPriorityQueue" which wraps the main PriorityQueue and
could be contributed as part of another issue if there is interest in that. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message