Subject: Re: Solr's Filtering approaches
From: Erick Erickson
To: solr-user@lucene.apache.org
Date: Thu, 10 Oct 2013 07:55:49 -0400

Well, my first question is why 50K groups are necessary, and whether you can simplify that. How a user can manually choose from among that many groups is "interesting".

But assuming they're all necessary, I can think of two things.

If the user can only select ranges, just put in filter queries using ranges. Or possibly both ranges and individual entries, as

    fq=group:[1A TO 10000A] OR group:(2B 45C 98Z)

etc. You need to be a little careful how you index these so range queries work properly. In the above you'd miss 2A because it's sorting lexicographically; you'd need to store the values in some form that sorts correctly, like 0000001A, 0010000A and so on. You wouldn't need to show that form to the user, just form your fq's in the app to work with that form.

If that won't work (you wouldn't want this to get huge), think about a "post filter" that would only operate on documents that had made it through the select, although how to convey which groups the user selected to the post filter is an open question.

Best,
Erick

On Wed, Oct 9, 2013 at 12:23 PM, David Philip wrote:

> Hi All,
>
> I have an issue in handling filters for one of our requirements and
> would like to get suggestions for the best approaches.
>
> *Use Case:*
>
> 1. We have a list of groups, and the number of groups can increase to more
> than 1 million. Currently we have almost 90 thousand groups in the Solr
> search system.
>
> 2. Just before the user hits search, he has the option to select the
> groups he wants to retrieve. [The distinct list of these group names for
> display is retrieved from another Solr index that has more information
> about groups.]
>
> *3. User Operation:*
> Say the user selects group 1A through group 10000A and searches for
> key:cancer.
>
> The current approach I was thinking of is: get the search results and
> apply a filter query with the list of group IDs selected by the user. But
> my concern is that when this list grows to >50k unique IDs, it can cause a
> lot of delay in getting search results. So I wanted to know whether there
> are different filtering approaches that I can try.
>
> I was thinking of one more approach, suggested by my colleague, which is
> to do an intersection:
> - Get the group IDs selected by the user.
> - Get the list of group IDs from the search results.
> - Perform an intersection of both, and then get the entire result set for
> only those group IDs that intersected.
> Is this a better way? Can I use any cache technique in this case?
>
> - David.
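
To make the zero-padding advice in the reply above concrete, here is a minimal sketch of how an app might build the fq string. Everything in it is an assumption for illustration: the helper names, the pad width of 7 digits, and the ID shape (digits followed by letters) are inferred only from the examples in this thread, and the IDs would have to be indexed in the same padded form for the range to match. Note also that a lexicographic range matches any padded ID sorting between the endpoints, whatever its letter suffix.

```python
import re

# Assumption: 7 digits is wide enough for the largest numeric part of an ID.
PAD_WIDTH = 7


def sortable_group_id(group_id: str, width: int = PAD_WIDTH) -> str:
    """Zero-pad the numeric prefix so string order matches numeric order."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", group_id)
    if match is None:
        raise ValueError(f"unexpected group id format: {group_id!r}")
    number, suffix = match.groups()
    if len(number) > width:
        raise ValueError(f"numeric part of {group_id!r} exceeds {width} digits")
    return number.zfill(width) + suffix


def build_group_fq(range_start: str, range_end: str, extra_ids=(), field: str = "group") -> str:
    """Build a filter query value: one range clause plus optional single IDs."""
    lo = sortable_group_id(range_start)
    hi = sortable_group_id(range_end)
    clauses = [f"{field}:[{lo} TO {hi}]"]
    if extra_ids:
        ids = " ".join(sortable_group_id(g) for g in extra_ids)
        clauses.append(f"{field}:({ids})")
    return " OR ".join(clauses)


if __name__ == "__main__":
    # Unpadded, "2A" sorts after "10000A", so the range would miss it.
    print("1A" <= "2A" <= "10000A")
    # Padded, it falls inside the range as intended.
    print(sortable_group_id("1A") <= sortable_group_id("2A") <= sortable_group_id("10000A"))
    print("fq=" + build_group_fq("1A", "10000A", ["2B", "45C", "98Z"]))
```

The padded form stays internal to the app and the index; the user still sees 1A, 2B and so on.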