From: "Burton-West, Tom"
To: "solr-user@lucene.apache.org"
Date: Fri, 15 Oct 2010 14:59:26 -0400
Subject: RE: filter query from external list of Solr unique IDs

Hi Jonathan,

The advantages of the obvious approach you outline are that it is simple, it fits into the existing Solr model, and it doesn't require any customization or modification to Solr/Lucene Java code. Unfortunately, it does not scale well. We originally tried just what you suggest for our implementation of Collection Builder. For a user's personal collection we had a table that mapped the collection id to the unique Solr ids.

Then when they wanted to search their collection, we just took their search and added a filter query with the fq=(id:1 OR id:2 OR....). I seem to remember running into a limit on the number of OR clauses allowed. Even if you can set that limit larger, there are a number of efficiency issues.

We ended up constructing a separate Solr index where we have a multi-valued collection number field. Unfortunately, until incremental field updating gets implemented, this means that every time someone adds a document to a collection, the entire document (including 700KB of OCR) needs to be re-indexed just to update the collection number field. This approach has allowed us to scale up to a total of something under 100,000 documents, but we don't think we can scale it much beyond that for various reasons.

I was actually thinking of some kind of custom Lucene/Solr component that would, for example, take a query parameter such as &lookitUp=123; the component might then do a JDBC query against a database or kv store and return results in some form that would be efficient for Solr/Lucene to process. (Of course this assumes that a JDBC query would be more efficient than just sending a long list of ids to Solr.) The other part of the equation is mapping the unique Solr ids to internal Lucene ids in order to implement a filter query.
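To make that mapping step concrete, here is a rough sketch in plain Java. It is only an illustration: `IdFilterSketch` and its methods are made-up names, not Lucene or Solr APIs, and it assumes the internal document numbers stay fixed for as long as the map is in use.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names are Lucene or Solr APIs.
class IdFilterSketch {
    // Unique Solr id -> internal document number (assumed stable).
    private final Map<String, Integer> docNumByUniqueId = new HashMap<>();
    private final int maxDoc;

    // Assumption: the caller can list the unique ids in internal document order.
    IdFilterSketch(List<String> uniqueIdsInDocOrder) {
        this.maxDoc = uniqueIdsInDocOrder.size();
        for (int docNum = 0; docNum < maxDoc; docNum++) {
            docNumByUniqueId.put(uniqueIdsInDocOrder.get(docNum), docNum);
        }
    }

    // Sets one bit per collection member that exists in the index;
    // ids that are not in the map are silently skipped.
    BitSet filterFor(Iterable<String> collectionIds) {
        BitSet bits = new BitSet(maxDoc);
        for (String id : collectionIds) {
            Integer docNum = docNumByUniqueId.get(id);
            if (docNum != null) {
                bits.set(docNum);
            }
        }
        return bits;
    }
}
```

The list of collection ids could come from the database or kv store lookup mentioned above. A real component would still have to hand the bit set to Solr/Lucene in whatever filter form it expects, which this sketch does not attempt.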
I was wondering if something like the unique id to Lucene id mapper in zoie might be useful or if that is too specific to zoie. This may be totally off-base, since I haven't looked at the zoie code at all yet.

In our particular use case, we might be able to build some kind of in-memory map after we optimize an index and before we mount it in production. In our workflow, we update the index and optimize it before we release it, and once it is released to production there is no indexing/merging taking place on the production index (so the internal Lucene ids don't change).

Tom

-----Original Message-----
From: Jonathan Rochkind [mailto:rochkind@jhu.edu]
Sent: Friday, October 15, 2010 1:07 PM
To: solr-user@lucene.apache.org
Subject: RE: filter query from external list of Solr unique IDs

Definitely interested in this.

The naive obvious approach would be just putting all the IDs in the query. Like fq=(id:1 OR id:2 OR....). Or making it another clause in the 'q'.

Can you outline what's wrong with this approach, to make it more clear what's needed in a solution?
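For reference, the naive approach discussed in this thread amounts to something like the following sketch. `NaiveIdFilterQuery` and `build` are illustrative names only. The `maxClauses` argument stands in for whatever clause limit the server enforces (in Solr this is the `maxBooleanClauses` setting in solrconfig.xml); the value passed in is the caller's assumption. Ids are assumed to be simple tokens that need no query escaping.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: builds the fq value (id:1 OR id:2 OR ...) from a list of ids.
class NaiveIdFilterQuery {
    // maxClauses is supplied by the caller; this code does not read it from Solr.
    static String build(List<String> ids, int maxClauses) {
        if (ids.isEmpty()) {
            throw new IllegalArgumentException("no ids given");
        }
        if (ids.size() > maxClauses) {
            // One OR clause per id, so a large collection runs into the clause limit.
            throw new IllegalArgumentException(
                ids.size() + " ids exceed the clause limit of " + maxClauses);
        }
        StringJoiner clauses = new StringJoiner(" OR ", "(", ")");
        for (String id : ids) {
            // Assumption: ids contain no characters that need escaping.
            clauses.add("id:" + id);
        }
        return clauses.toString();
    }
}
```

Splitting a long list across several fq parameters would not help here, because Solr intersects multiple filter queries rather than taking their union.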