lucene-solr-user mailing list archives

From "Burton-West, Tom" <tburt...@umich.edu>
Subject RE: filter query from external list of Solr unique IDs
Date Fri, 15 Oct 2010 18:59:26 GMT
Hi Jonathan,

The advantages of the obvious approach you outline are that it is simple, it fits into the
existing Solr model, and it doesn't require any customization or modification of the Solr/Lucene Java
code.  Unfortunately, it does not scale well.  We originally tried just what you suggest for
our implementation of Collection Builder.  For a user's personal collection we had a table
that maps the collection id to the unique Solr ids.
Then when they wanted to search their collection, we just took their search and added a filter
query with fq=(id:1 OR id:2 OR....).   I seem to remember running into a limit on the
number of OR clauses allowed. Even if you can raise that limit, there are a number of
efficiency issues.
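As a rough illustration of where the clause limit bites, here is a minimal Python sketch (not Collection Builder's actual code) that builds the filter described above from a list of ids. The limit it checks stands in for Solr's `maxBooleanClauses` setting in solrconfig.xml; 1024 is the value in stock configs, but treat the number as an assumption and check your own config. `build_id_filter` and `_quote` are hypothetical helpers.

```python
# Assumed limit: mirrors Solr's maxBooleanClauses (1024 in stock
# solrconfig.xml files); check the value in your own configuration.
MAX_BOOLEAN_CLAUSES = 1024


def _quote(value):
    """Quote one id so characters like ':' or spaces can't break the query syntax."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_id_filter(ids, field="id", max_clauses=MAX_BOOLEAN_CLAUSES):
    """Return an fq value of the form (id:"1" OR id:"2" OR ...).

    Hypothetical helper: raises ValueError when the id list is empty or
    would produce more OR clauses than max_clauses allows.
    """
    ids = list(ids)
    if not ids:
        raise ValueError("empty id list")
    if len(ids) > max_clauses:
        raise ValueError(
            f"{len(ids)} OR clauses exceeds the limit of {max_clauses}"
        )
    return "(" + " OR ".join(f"{field}:{_quote(i)}" for i in ids) + ")"


if __name__ == "__main__":
    print(build_id_filter([1, 2, 3]))
    try:
        build_id_filter(range(5000))  # a large personal collection
    except ValueError as err:
        print("rejected:", err)
```

Even below the limit, the query string grows linearly with the collection, and the whole OR query has to be sent and parsed again for every search.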

We ended up constructing a separate Solr index where we have a multi-valued collection number
field. Unfortunately, until incremental field updating gets implemented, this means that every
time someone adds a document to a collection, the entire document (including 700KB of OCR)
needs to be re-indexed just to update the collection number field. This approach has allowed
us to scale up to a total of something under 100,000 documents, but we don't think we can
scale it much beyond that for various reasons.
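The separate-index approach can be modelled the same way. In this hypothetical Python sketch, each document carries a multi-valued collection field (`coll_id` is an assumed name, not necessarily the field in the actual index), and adding a document to a collection means sending the whole document back for re-indexing, because only one field changed but there is no way to update it alone. Searching a collection then becomes a single-term filter such as `fq=coll_id:123` instead of a long OR list.

```python
import copy


def add_to_collection(doc, coll_id, field="coll_id"):
    """Return the complete document to re-index after adding coll_id.

    Without incremental field updates, the update must resend every field,
    including large stored text such as OCR, not just `field`.
    The input document is left unchanged.
    """
    updated = copy.deepcopy(doc)
    colls = updated.setdefault(field, [])
    if coll_id not in colls:
        colls.append(coll_id)
    return updated


def collection_filter(coll_id, field="coll_id"):
    """Single-term filter query that replaces the long OR list of ids."""
    return f"{field}:{coll_id}"


if __name__ == "__main__":
    # Roughly 700KB of OCR text, as in the message above.
    page = {"id": "doc42", "ocr": "x" * 700_000, "coll_id": [7]}
    to_reindex = add_to_collection(page, 123)
    print(sorted(to_reindex), to_reindex["coll_id"], len(to_reindex["ocr"]))
    print(collection_filter(123))
```

The filter side is cheap; the cost sits entirely in the update path, where a one-value change drags the full OCR payload through indexing again.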

I was actually thinking of some kind of custom Lucene/Solr component that would, for example,
take a query parameter such as &lookitUp=123, do a JDBC query against a database or key-value
store, and return results in some form that Solr/Lucene could process efficiently. (Of course,
this assumes that a JDBC query would be more efficient than just sending a long list of ids
to Solr.)  The other part of the equation is mapping the unique Solr ids to internal Lucene
ids in order to implement a filter query.   I was wondering whether something like the unique
id to Lucene id mapper in zoie might be useful, or whether that is too specific to zoie. This
may be totally off-base, since I haven't looked at the zoie code at all yet.
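A rough Python model of the two halves of such a component might look like this: a lookup that turns a parameter like lookitUp=123 into a list of Solr unique ids, and a step that maps those ids to internal Lucene doc ids and packs them into a bitset that can be tested per document, which is roughly the job a Lucene filter does. The dict standing in for the JDBC or key-value store, the `uid_to_docid` map, and every function name here are hypothetical; this is not zoie's mapper or any Solr API.

```python
def lookup_collection_ids(store, coll_id):
    """Stand-in for the JDBC / key-value lookup the component would make.

    `store` is a hypothetical mapping: collection id -> list of Solr unique ids.
    """
    return store.get(coll_id, [])


def ids_to_bitset(unique_ids, uid_to_docid, max_doc):
    """Map unique ids to internal doc ids and set one bit per matching doc."""
    bits = bytearray((max_doc + 7) // 8)
    for uid in unique_ids:
        docid = uid_to_docid.get(uid)
        if docid is None:
            continue  # id not present in this index snapshot
        if not 0 <= docid < max_doc:
            raise ValueError(f"doc id {docid} outside 0..{max_doc - 1}")
        bits[docid >> 3] |= 1 << (docid & 7)
    return bits


def bit_is_set(bits, docid):
    return bool(bits[docid >> 3] & (1 << (docid & 7)))


def apply_filter(candidate_docids, bits):
    """Keep only candidate docs whose bit is set, as a filter query would."""
    return [d for d in candidate_docids if bit_is_set(bits, d)]


if __name__ == "__main__":
    store = {"123": ["b", "d"]}                    # from &lookitUp=123
    uid_to_docid = {"a": 0, "b": 1, "c": 2, "d": 9}
    bits = ids_to_bitset(lookup_collection_ids(store, "123"), uid_to_docid, 10)
    print(apply_filter(range(10), bits))
```

Ids that are missing from the index are skipped rather than treated as errors, since a collection table can easily outlive documents that were removed in a later release.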

In our particular use case, we might be able to build some kind of in-memory map after we
optimize an index and before we mount it in production. In our workflow, we update the index
and optimize it before we release it, and once it is released to production no indexing or merging
takes place on the production index (so the internal Lucene ids don't change).
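Because the production index is frozen after the optimize, the unique id to doc id map only has to be built once per release. A minimal sketch, assuming you can read the stored unique id for every internal doc id (modelled here as a plain list indexed by doc id) and that each release carries some version number to compare against; `FrozenIdMap` and its arguments are hypothetical.

```python
class FrozenIdMap:
    """Unique-id -> internal doc id map for one frozen index release.

    stored_ids[docid] is the unique id stored for that internal doc id,
    or None for a deleted slot. The map is only trusted for the index
    version it was built from, since merges would renumber doc ids.
    """

    def __init__(self, index_version, stored_ids):
        self.index_version = index_version
        self._map = {}
        for docid, uid in enumerate(stored_ids):
            if uid is None:
                continue  # deleted document; an optimize should leave none
            if uid in self._map:
                raise ValueError(f"duplicate unique id {uid!r}")
            self._map[uid] = docid

    def docid(self, index_version, uid):
        """Return the doc id for uid, or None if it is not in this release."""
        if index_version != self.index_version:
            raise RuntimeError("index changed since the map was built; rebuild it")
        return self._map.get(uid)


if __name__ == "__main__":
    id_map = FrozenIdMap(7, ["a", "b", None, "c"])
    print(id_map.docid(7, "c"), id_map.docid(7, "z"))
```

Tying the map to a version is the cheap safeguard for the assumption stated above: if anything ever does index or merge on the production copy, lookups fail loudly instead of silently returning the wrong documents.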

Tom



-----Original Message-----
From: Jonathan Rochkind [mailto:rochkind@jhu.edu] 
Sent: Friday, October 15, 2010 1:07 PM
To: solr-user@lucene.apache.org
Subject: RE: filter query from external list of Solr unique IDs

Definitely interested in this. 

The naive, obvious approach would be to just put all the IDs in the query, like fq=(id:1
OR id:2 OR....), or to make it another clause in the 'q'.

Can you outline what's wrong with this approach, to make it more clear what's needed in a
solution?
________________________________________
