Date: Sat, 12 Oct 2013 11:27:09 +0530
Subject: Re: Solr's Filtering approaches
From: David Philip
To: solr-user@lucene.apache.org

The groups are pharmaceutical research experiments. The user is presented with a graph view; he can select a region, and all the groups in that region get included. The user can also modify the groups there. That is why we didn't maintain group information in the same Solr index and externalized it instead.

I looked at the post filter article. My understanding is that I simply have to extend the collector as you did and include an implementation of "isAllowed(acls[doc], groups)". This will filter the documents in the collector, and finally this collector will be returned. Am I right?

    @Override
    public void collect(int doc) throws IOException {
      if (isAllowed(acls[doc], user, groups))
        super.collect(doc);
    }

Erick, I am interested to know whether I can extend any class that returns only the bitset of the documents that match the search query. I could then AND it with the bitset of the selected groups (bitset1.and(bitset2OfGroups)) and finally collect only those documents to return to the user. How do I try this approach? Any pointers on bitsets?

Thanks - David

On Thu, Oct 10, 2013 at 5:25 PM, Erick Erickson wrote:

> Well, my first question is why 50K groups is necessary, and
> whether you can simplify that. How a user can manually
> choose from among that many groups is "interesting". But
> assuming they're all necessary, I can think of two things.
>
> If the user can only select ranges, just put in filter queries
> using ranges. Or possibly both ranges and individual entries,
> as fq=group:[1A TO 10000A] OR group:(2B 45C 98Z) etc.
> You need to be a little careful how you index these so
> range queries work properly; in the above you'd miss
> 2A because it's sorting lexicographically. You'd need to
> store in some form that sorts like 0000001A 010000A
> and so on. You wouldn't need to show that form to the
> user, just form your fq's in the app to work with
> that form.
>
> If that won't work (you wouldn't want this to get huge), think
> about a "post filter" that would only operate on documents that
> had made it through the select, although how to convey which
> groups the user selected to the post filter is an open
> question.
>
> Best,
> Erick
>
> On Wed, Oct 9, 2013 at 12:23 PM, David Philip wrote:
> > Hi All,
> >
> > I have an issue in handling filters for one of our requirements and
> > would like suggestions for the best approaches.
> >
> > *Use Case:*
> >
> > 1. We have a list of groups, and the number of groups can grow to more
> > than 1 million. Currently we have almost 90 thousand groups in the Solr
> > search system.
> >
> > 2. Just before the user runs a search, he has the option to select a
> > number of groups he wants to retrieve. [The distinct list of group names
> > for display is retrieved from another Solr index that has more
> > information about groups.]
> >
> > 3. *User operation:*
> > Say the user selected group 1A - group 10000A and searches for
> > key:cancer.
> >
> > The current approach I was thinking of is: get the search results and
> > filter the query by the list of group IDs selected by the user. But my
> > concern is that when this list grows to more than 50k unique IDs, it can
> > cause a lot of delay in getting search results. So I wanted to know
> > whether there are different filtering approaches that I can try.
> >
> > I was thinking of one more approach, suggested by my colleague: do an
> > intersection.
> > - Get the group IDs selected by the user.
> > - Get the list of group IDs from the search results.
> > - Perform an intersection of both, and then get the entire result set
> > for only those group IDs that intersected.
> > Is this a better way? Can I use any cache technique in this case?
> >
> > - David.
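
A note on the range-query suggestion quoted above: the reason a range like [1A TO 10000A] misses 2A is that the terms are compared as strings, so zero-padding the numeric part to a fixed width makes string order agree with numeric order. The following is only a rough sketch of that idea, not anything from the thread. The class name, the 7-digit width, the "digits followed by letters" ID format, and the field name group_padded are all assumptions made for illustration.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupRangeFilter {
    // Assumption: group IDs are "<digits><letters>", such as "1A" or "10000A".
    private static final Pattern GROUP_ID = Pattern.compile("(\\d+)([A-Za-z]+)");

    // Assumption: 7 digits covers the largest group number in the index.
    private static final int WIDTH = 7;

    /** Zero-pads the numeric part so that string order matches numeric order. */
    public static String padGroupId(String id) {
        Matcher m = GROUP_ID.matcher(id);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected group ID: " + id);
        }
        String digits = String.format(Locale.ROOT, "%0" + WIDTH + "d",
                Long.parseLong(m.group(1)));
        if (digits.length() > WIDTH) {
            throw new IllegalArgumentException("Group number too large: " + id);
        }
        return digits + m.group(2);
    }

    /** Builds an fq value over a (hypothetical) field that stores padded IDs. */
    public static String rangeFilter(String field, String from, String to) {
        return field + ":[" + padGroupId(from) + " TO " + padGroupId(to) + "]";
    }

    public static void main(String[] args) {
        // Unpadded: "2A" sorts after "10000A", so [1A TO 10000A] misses it.
        System.out.println("2A".compareTo("10000A") > 0);
        // Padded: "0000002A" sorts before "0010000A", so the range includes it.
        System.out.println(padGroupId("2A").compareTo(padGroupId("10000A")) < 0);
        // The application sends the padded form; the user never sees it.
        System.out.println(rangeFilter("group_padded", "1A", "10000A"));
    }
}
```

One caveat with this sketch: a padded range also matches IDs with other letter suffixes that fall between the bounds (0000005C, for instance). If that matters, indexing the number and the suffix in separate fields may be the safer design.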