From: Noble Paul നോബിള്‍ नोब्ळ्
To: solr-user@lucene.apache.org
Subject: Re: tagging application, best way to architect?
Date: Thu, 10 Jul 2008 09:36:01 +0530
Message-ID: <5e76b0ad0807092106q66016c36je4e6cd2ce55e61b1@mail.gmail.com>

On Thu, Jul 10, 2008 at 7:53 AM, aris buinevicius wrote:
> We're trying to implement a large scale domain specific web email
> application, and so far solr performance on the search side is really doing
> well for us.
>
> There are two limitations that I can't seem to get around however, and was
> hoping for some advice.
>
> 1. We would like to do bulk tagging on large query result sets (ie, if you
> have 1M emails, do a search, and then you wish to apply a tag to the result
> set of, say, 250k results). I've tried many approaches, but the closest
> support I could see was the update field functionality in SOLR-139. Is
> there any other way to separate the very dynamic metadata (tags and other
> fields) abstracted away from the static documents themselves? I've
> researched joining against a metadata database, but unfortunately the join
> logic for large results is just too bulky to perform well at scale.
> Also have even looked at postgres tsearch2, but that also breaks down with a
> large number of emails.
Updating a large number of docs in one go is expensive. SOLR-139 is
trying to address that, but it is still expensive. If the users do not
tag the docs too often, then it may be OK.

>
> 2. We're assuming we'll have thousands of users with independent data; any
> good way to partition multiple indexes with solr? With Lucene we could
> just save those in independent directories, and cache the index while the
> user session is active. I saw some configurations on tomcat that would
> allow multiple instances, but that's probably not practical for lots of
> concurrent users.

Maintaining multiple indices is not a good idea. Add an extra field
'userid' to each document and pass the user id as a filter query ('fq')
when searching. The caches in Solr will automatically take care of the
rest.

>
> Thanks for any tips; would love to use Solr (or Lucene), but haven't been
> able to get around issue 1 yet for large numbers of emails in a timely
> response. We've really looked at the gamut here, including solr, lucene,
> postgres (tsearch2), sphinx, xapian, couchdb(!), and more.
>
> ab
>

--
--Noble Paul
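
To illustrate the 'userid' plus 'fq' suggestion, here is a minimal sketch of how a per-user search request could be built. The base URL, the user id "u42", the query "subject:invoice", and the helper name build_user_query are made up for illustration; only the 'userid' field and the 'fq' parameter come from the advice above. It assumes user ids are plain alphanumeric strings that need no query-syntax escaping, and it only builds the URL without contacting a server.

```python
from urllib.parse import urlencode


def build_user_query(base_url, user_id, text_query, rows=10):
    """Return a Solr select URL restricted to one user's documents.

    Hypothetical helper: assumes every document was indexed with a
    'userid' field and that user_id needs no query-syntax escaping.
    """
    params = [
        ("q", text_query),            # the user's actual search
        ("fq", f"userid:{user_id}"),  # filter query limiting results to this user
        ("rows", str(rows)),
    ]
    return f"{base_url}/select?{urlencode(params)}"


# Example values below are placeholders, not taken from the thread.
url = build_user_query("http://localhost:8983/solr", "u42", "subject:invoice")
print(url)
```

Keeping the user restriction in 'fq' rather than folding it into 'q' means the per-user filter can be cached and reused across that user's different searches, which is presumably what "the caches will take care of the rest" refers to.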