From: Colin Hebert
Date: Thu, 28 Feb 2013 17:38:27 +0000
Subject: Re: Custom filter for document permissions
To: solr-user@lucene.apache.org

I know that the query selects everything; that is why I used it to
test my solution.
If a user makes a query that matches a very large number of results
and pages through them, I expected the post filter to be executed only
when necessary (as it can be expensive).

Colin

On 28 February 2013 17:25, Timothy Potter wrote:
> Hi Colin,
>
> Your query is *:* so that is every document. Try a query that only
> matches a small subset and see if you get different results.
>
> Cheers,
> Tim
>
> On Thu, Feb 28, 2013 at 8:17 AM, Colin Hebert wrote:
>> Thank you Timothy,
>>
>> With the pointers you gave me (and the help of this article
>> http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/ ) I
>> managed to draft my own filter, but it seems that it doesn't work
>> quite as I expected.
>>
>> Here is what I've done so far:
>> https://github.com/ColinHebert/Sakai-Solr/tree/permission/permission/solr/src/main/java/org/sakaiproject/search/solr/permission/filter
>>
>> But it seems that the filter is applied to every document matched by
>> the query (rather than only to the range of documents I asked for).
>>
>> I've done some tests with 10k+ documents and the query
>> /select?q=*%3A*&fq={!sakai%20userId=admin}&tv=false&start=0&rows=1
>> takes ages to execute (and in my application I can see that Solr is
>> trying to apply the filter to absolutely every document).
>>
>> Cheers,
>> Colin
>> Colin Hebert
>>
>>
>> On 26 February 2013 15:30, Timothy Potter wrote:
>>> Hi Colin,
>>>
>>> I think a filter is definitely the way to go. Moreover, you should
>>> look into Solr's PostFilter concept which is intended to work with
>>> "expensive" filters. Have a look at Yonik's blog post on this topic:
>>> http://yonik.com/posts/advanced-filter-caching-in-solr/
>>>
>>> Cheers,
>>> Tim
>>>
>>> On Tue, Feb 26, 2013 at 7:24 AM, Colin Hebert wrote:
>>>> Hi,
>>>>
>>>> I am having trouble figuring out the right approach when it comes
>>>> to filtering results for security reasons.
>>>>
>>>> I work on an application that contains documents that are not
>>>> accessible to everyone, so I want to filter the search results
>>>> based on whether the user making the search query has the right to
>>>> read each document.
>>>> To do that, right now, I have a filter on the application side that
>>>> checks, for each document returned by a search query, whether it is
>>>> accessible by the current user, and removes it from the result list
>>>> if it isn't.
>>>>
>>>> That isn't really optimal, as you might get a result page with 7
>>>> results instead of 10 because some results were removed (and if
>>>> you're smart enough you can figure out the content of those hidden
>>>> documents by doing many search queries).
>>>>
>>>> So I can think of two solutions. The first is to code a paging
>>>> system in my application that takes care of those holes in the
>>>> result list, but it adds quite a lot of work that could be useless
>>>> if Solr can take care of that.
>>>> The second solution is to have Solr filter those results before
>>>> sending them back.
>>>>
>>>> The second solution seems a bit cleaner to me, but I'm not sure
>>>> whether it is good practice or not.
>>>>
>>>> The permission system in the application is a bit 'wild': some
>>>> permissions are based on the day of the week, others on whether
>>>> another document exists, so I can't really get out of this
>>>> situation by storing more information in the index and using
>>>> standard filters.
>>>> If creating a custom filter in Solr isn't too bad, what I was
>>>> thinking of would require the Solr server to make a request to the
>>>> application to check whether the user (given as a parameter in the
>>>> query) can access the document (and that would be done for each
>>>> document).
>>>> Note that I will have to do that security check anyway, so the time
>>>> a security check takes isn't (or at least shouldn't be) relevant
>>>> when comparing the performance of one solution against the other.
>>>> What will have an impact, though, is the fact that the Solr server
>>>> has to make a request to the application (a network connection) for
>>>> each document.
>>>>
>>>> Colin Hebert
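
The first option from the original message (application-side paging that fills the holes left by removed results) could be sketched roughly as below. This is only an illustration under stated assumptions, not code from the thread or from Solr: `PermissionAwarePager`, `SearchBackend` and `permittedPage` are hypothetical names, `SearchBackend` stands in for a paged search request (start/rows), and `canRead` stands in for the application's permission check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper: builds one full page of readable results by over-fetching. */
final class PermissionAwarePager {

    /** Hypothetical stand-in for a paged search call: raw hits in rank order. */
    interface SearchBackend {
        List<String> fetch(int start, int rows);
    }

    /**
     * Returns the pageIndex-th page (0-based) of document ids the user may read.
     * Raw hits are pulled in batches until the page is full or the hits run out,
     * so a page is only short when there really are no more readable documents.
     */
    static List<String> permittedPage(SearchBackend backend, Predicate<String> canRead,
                                      int pageIndex, int pageSize, int batchSize) {
        if (pageIndex < 0 || pageSize <= 0 || batchSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0, sizes must be > 0");
        }
        int toSkip = pageIndex * pageSize; // readable hits that belong to earlier pages
        List<String> page = new ArrayList<>();
        int start = 0;
        while (page.size() < pageSize) {
            List<String> batch = backend.fetch(start, batchSize);
            if (batch.isEmpty()) {
                break; // no more raw hits
            }
            for (String id : batch) {
                if (!canRead.test(id)) {
                    continue; // hidden document: leaves no hole in the page
                }
                if (toSkip > 0) {
                    toSkip--; // readable, but shown on an earlier page
                    continue;
                }
                page.add(id);
                if (page.size() == pageSize) {
                    break;
                }
            }
            start += batch.size();
        }
        return page;
    }
}
```

Note that in this sketch every request for page N re-checks all the hits that precede it, so the permission check still runs on many more documents than the page size; that cost is the same concern raised above about a filter running on every matched document.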