lucene-dev mailing list archives

From Mikhail Khludnev <mkhlud...@griddynamics.com>
Subject Re: Post-sort filtering
Date Mon, 04 Feb 2013 18:10:40 GMT
Steve,
This question comes up from time to time, but the answer is usually no.
This approach is inefficient, and it is usually proposed as a hack or
workaround made in the UI (the front-end app).
The current patch ruins facets, and it filters the same top docs again and
again (i.e. documents already checked in step one are not excluded from the
following steps). Every step costs O(n log p), where n is the number of
matches and p the number of rows collected, whereas Lucene supports deep
scrolling, which makes this much more efficient.
AFAIK, the common way is to use ManifoldCF to index security filters inside
Solr.
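Mikhail's objection, that restarting re-filters the same top docs while a scrolling cursor does not, can be illustrated without Lucene at all. This is a minimal sketch, not the SOLR-4397 patch, and it does not use any Lucene or Solr API: `RefillCost`, `restartChecks` and `cursorChecks` are hypothetical names, a sorted `List<Integer>` stands in for the ranked hits, and an `IntPredicate` stands in for the external security check. It counts only filter invocations, not the O(n log p) collection cost of each query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical simulation: "hits" is an already-sorted list of doc ids standing in for the
// ranked results, and "allowed" stands in for the expensive external security check.
// Both methods return how many times that check was invoked to fill one page.
final class RefillCost {

    // Restart strategy: filter the top pageSize hits, and if the page is not full,
    // start over with the top 2 * pageSize, then 3 * pageSize, and so on.
    // Every round re-checks the documents that earlier rounds already checked.
    static int restartChecks(List<Integer> hits, IntPredicate allowed, int pageSize) {
        if (pageSize <= 0) return 0;
        int checks = 0;
        for (int want = pageSize; ; want += pageSize) {
            int end = Math.min(want, hits.size());
            List<Integer> page = new ArrayList<>();
            for (int i = 0; i < end && page.size() < pageSize; i++) {
                checks++;
                if (allowed.test(hits.get(i))) page.add(hits.get(i));
            }
            if (page.size() == pageSize || end == hits.size()) return checks;
        }
    }

    // Cursor strategy: fetch batches of pageSize hits but resume after the last hit
    // already examined, so each document goes through the check at most once.
    static int cursorChecks(List<Integer> hits, IntPredicate allowed, int pageSize) {
        if (pageSize <= 0) return 0;
        int checks = 0, accepted = 0, cursor = 0;
        while (accepted < pageSize && cursor < hits.size()) {
            int end = Math.min(cursor + pageSize, hits.size());
            for (; cursor < end && accepted < pageSize; cursor++) {
                checks++;
                if (allowed.test(hits.get(cursor))) accepted++;
            }
        }
        return checks;
    }
}
```

For ids 0 to 99 with only even ids allowed and a page size of 10, `restartChecks` makes 29 checks and `cursorChecks` makes 19; the difference grows with every extra round the restart strategy needs.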


On Mon, Feb 4, 2013 at 6:20 PM, Steve Molloy <smolloy@opentext.com> wrote:

> BTW, I've logged SOLR-4397 for this and submitted a first patch (based on
> the 4.1 tag, which is what we use). It still needs at least logic to respect
> timeAllowed, and I'd like a better way of handling missing results than
> going back and restarting by asking for more, but it works for now, so I
> guess it's a start.
>
> Steve Molloy                              steve.molloy@opentext.com
> Software Architect  |  Information Discovery & Analytics R&D
> OpenText
>
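Steve notes that the patch still needs to respect timeAllowed. One way to honor such a time budget while filling a page is to check a clock before each expensive filter call and return a partial page once the budget is spent. This is a hedged sketch only: `TimeBoundedFilter`, `BudgetedPage` and `fill` are hypothetical names, not part of Solr, and the clock is injected as a `LongSupplier` (for example `System::nanoTime`) so the logic can be tested deterministically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.function.LongSupplier;

// Hypothetical result holder: the accepted ids plus whether the time budget ran out.
final class BudgetedPage {
    final List<Integer> ids = new ArrayList<>();
    boolean partial; // true when the budget expired before the page was filled
}

final class TimeBoundedFilter {
    // Walks the already-sorted hits, applying the expensive check until pageSize ids are
    // accepted or budget clock units have elapsed according to the supplied clock.
    static BudgetedPage fill(List<Integer> sortedHits, IntPredicate allowed, int pageSize,
                             long budget, LongSupplier clock) {
        BudgetedPage page = new BudgetedPage();
        long start = clock.getAsLong();
        for (int id : sortedHits) {
            if (page.ids.size() >= pageSize) break;   // page is full
            if (clock.getAsLong() - start >= budget) { // budget spent: stop filtering
                page.partial = true;
                break;
            }
            if (allowed.test(id)) page.ids.add(id);
        }
        return page;
    }
}
```

Returning an explicit `partial` flag, rather than silently shrinking the page, lets the caller distinguish "time ran out" from "there were simply not enough visible hits".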
> -----Original Message-----
> From: Steve Molloy [mailto:smolloy@opentext.com]
> Sent: January-24-13 1:16 PM
> To: dev@lucene.apache.org
> Subject: RE: Post-sort filtering
>
> I was actually looking for an extension point to plug into, which I wasn't
> able to find in the code. And yes, I'm willing to accept counts being off;
> the important thing is that results don't contain the wrong documents. I'd
> like to avoid oversampling and re-requesting because of the bandwidth and
> overall resource usage this implies. I'm currently trying out a
> "PostSortFilter" approach that I'll share if it seems interesting enough.
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: January-24-13 1:11 PM
> To: dev@lucene.apache.org
> Subject: Re: Post-sort filtering
>
> This has some problems. First, your facet, group, num hits, etc.
> counts will be off for that user. Second, you can't sort without having
> all of the documents, so unless you're willing to have your counts be off,
> you really have to pay the price of post-filtering everything.
>
> If you can live with the counts being off, consider just having the
> application do a couple of round-trips. Get the docs (oversample, say just
> get the IDs for the top 100 docs) _without_ any kind of ACL filtering. Then
> send those docs back to the server with the ACL filtering. If you don't get
> enough to fill up a response, get the next page of 100, etc.
>
> Finally, the users' list is a better place for this kind of question; this
> list is for discussing development of the code.
>
> Best
> Erick
>
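Erick's client-side round trips (fetch the IDs of the top 100 unfiltered, ACL-filter them in a second request, fetch the next 100 if the page is still short) can be sketched as a loop. Everything here is a labeled assumption: `OversampleClient` and `firstPage` are hypothetical names, `fetchIds` stands for an unfiltered query returning the sorted IDs in `[start, start + rows)`, and `aclFilter` stands for the second request that returns the visible subset of a batch in the same order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

final class OversampleClient {
    // fetchIds.apply(start, rows): hypothetical unfiltered query for the sorted ids in
    //   [start, start + rows). aclFilter.apply(batch): hypothetical second round trip that
    //   returns the ids of the batch the user may see, keeping their order.
    static List<String> firstPage(BiFunction<Integer, Integer, List<String>> fetchIds,
                                  Function<List<String>, List<String>> aclFilter,
                                  int pageSize, int batchSize) {
        List<String> page = new ArrayList<>();
        for (int start = 0; page.size() < pageSize; start += batchSize) {
            List<String> batch = fetchIds.apply(start, batchSize); // oversampled, unfiltered
            if (batch.isEmpty()) break; // no more hits: return a short page
            for (String id : aclFilter.apply(batch)) {
                if (page.size() == pageSize) break;
                page.add(id);
            }
        }
        return page;
    }
}
```

With 250 ids `doc0` to `doc249`, an ACL that admits every third id, a page size of 10 and a batch size of 100, one batch is enough and the page ends at `doc27`. Each extra batch costs two more round trips, which is the bandwidth overhead Steve wants to avoid.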
> On Wed, Jan 23, 2013 at 9:05 AM, Steve Molloy <smolloy@opentext.com>
> wrote:
> > Hi,
> >
> >     I'm looking for a way to apply filtering that unfortunately
> > implies a high cost because it needs to access external resources (for
> > security). I looked at (and tried) the PostFilter technique, which
> > offers some advantages, but still implies checking a lot of matches in
> > many cases. What I'd like to be able to do is to filter out ids until I
> > have enough to fill the response, then stop filtering (and accept
> > everything). The idea is that the total count is not as important; the
> > main thing is that results should not contain documents the requester
> > should not see. So, the post filter almost does the trick, except it
> > comes before sorting, so the first X documents are not the same ones the
> > post filter is getting.
> >
> > Is there a way to filter out documents after they have been scored and
> > sorted?
> >
> > Thanks,
> > Steve
> >
> >
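Steve's "filter until the response is full, then accept everything" idea can be expressed as a stateful predicate. This is a minimal sketch under stated assumptions: `FillThenAcceptFilter` is a hypothetical name, not a Solr class, and it only yields a correct first page if it is called in final sort order. That requirement is exactly why a PostFilter, which runs before sorting as Steve notes, is not enough on its own.

```java
import java.util.function.IntPredicate;

// Hypothetical predicate: runs the expensive check until "needed" documents have been
// accepted, then lets every later document through unchecked.
// Must be invoked in final sort order for the first page to be correct.
final class FillThenAcceptFilter implements IntPredicate {
    private final IntPredicate expensiveCheck;
    private final int needed;
    private int accepted; // documents that passed the real check
    private int checks;   // how many times the expensive check ran

    FillThenAcceptFilter(IntPredicate expensiveCheck, int needed) {
        this.expensiveCheck = expensiveCheck;
        this.needed = needed;
    }

    @Override
    public boolean test(int docId) {
        if (accepted >= needed) return true; // page already full: stop paying for checks
        checks++;
        boolean ok = expensiveCheck.test(docId);
        if (ok) accepted++;
        return ok;
    }

    int checks() {
        return checks;
    }
}
```

Fed ids 0 to 99 in sorted order with an even-only check and `needed = 10`, it runs the expensive check 19 times and lets 91 ids through. The 81 unchecked ids only inflate the total count, which matches Steve's willingness to have counts off, but because they were never checked they must not be shown on later pages.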
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhludnev@griddynamics.com>
