lucene-solr-user mailing list archives

From "Noble Paul നോബിള്‍ नोब्ळ्" <>
Subject Re: tagging application, best way to architect?
Date Thu, 10 Jul 2008 04:06:01 GMT
On Thu, Jul 10, 2008 at 7:53 AM, aris buinevicius <> wrote:
> We're trying to implement a large scale domain specific web email
> application, and so far solr performance on the search side is really doing
> well for us.
> There are two limitations that I can't seem to get around however, and was
> hoping for some advice.
> 1. We would like to do bulk tagging on large query result sets (ie, if you
> have 1M emails, do a search, and then you wish to apply a tag to the result
> set of, say, 250k results).   I've tried many approaches, but the closest
> support I could see was the update field functionality in SOLR-139.   Is
> there any other way to separate the very dynamic metadata (tags and other
> fields) abstracted away from the static documents themselves?   I've
> researched joining against a metadata database, but unfortunately the join
> logic for large results is just too bulky to perform well at scale.
> Also have even looked at postgres tsearch2, but that also breaks down with a
> large number of emails.
Updating a large number of docs in one go is expensive. SOLR-139 is
trying to address that, but it is still expensive. If users do not
tag docs too often, it may be OK.
> 2. We're assuming we'll have thousands of users with independent data; any
> good way to partition multiple indexes with solr?   With Lucene we could
> just save those in independent directories, and cache the index while the
> user session is active.   I saw some configurations on tomcat that would
> allow multiple instances, but that's probably not practical for lots of
> concurrent users.
Maintaining multiple indices is not a good idea. Add an extra
attribute 'userid' to each document and pass the user id as a filter
query ('fq') on every search. The caches in Solr will automatically
take care of the rest.
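
A rough sketch of what such a request could look like, in Python. The
host, port, and handler path are placeholders for whatever your
deployment uses, and the helper names are made up for illustration;
only the 'userid' field and the 'fq' parameter come from the suggestion
above.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; replace with your own host and handler path.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def quote_term(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_user_query(user_id, query, rows=10):
    """Build a select URL that restricts the search to one user's docs.

    The main query goes in 'q'; the per-user restriction goes in 'fq'
    so that it is kept separate from the user's search terms.
    """
    params = [
        ("q", query),
        ("fq", "userid:" + quote_term(user_id)),
        ("rows", str(rows)),
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_user_query("42", "subject:invoice"))
```

Keeping the user restriction in 'fq' rather than folding it into 'q'
means the same filter string is sent on every search for that user,
which is what lets the caches help.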
> Thanks for any tips; would love to use Solr (or Lucene), but haven't been
> able to get around issue 1 yet for large numbers of emails in a timely
> response.   We've really looked at the gamut here, including solr, lucene,
> postgres (tsearch2), sphinx, xapian, couchdb(!), and more.
> ab

--Noble Paul
