From: Antony Bowesman
Organization: Teamware Group
Date: Sat, 12 Apr 2008 08:03:13 +1000
To: java-user@lucene.apache.org
Subject: Re: Using Lucene partly as DB and 'joining' search results.

Paul Elschot wrote:
> On Friday 11 April 2008 13:49:59, Mathieu Lecarme wrote:
>> Use Filter and BitSet.
>> From the personal data, you build a Filter
>> (http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/Filter.html)
>> which is used in the main index.
>
> With 1 billion mails, and possibly a Filter per user, you may want to
> use more compact filters than BitSets, which is currently possible
> in the development trunk of lucene.

Thanks for the pointers. I've already used Solr's DocSet interface in my
implementation, which I think is where the ideas for the current Lucene
enhancements came from. They work well to reduce the filter's footprint.
I'm also caching filters.

The intention is that there is a user data index and the mail index(es).
A search against the user data index will return a set of mail Ids, which
are the common key between the two. Doc Ids are not comparable between the
indexes, so that means a potentially large boolean OR query to create the
filter of labelled mails in the mail indexes. I know it's a theoretical
question, but will this perform?
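
As a rough illustration of the mail-Id-to-doc-Id step described above, here is a minimal sketch in plain Java. It is not Lucene code: the per-index lookup on the mail Id field is replaced by an ordinary Map, which is a hypothetical stand-in, and the class name, method name, and sample Ids are all assumptions made for illustration. It sets one bit per labelled mail directly rather than building a boolean OR query.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MailIdFilterSketch {

    // mailIdToDocId is a hypothetical stand-in for looking up the mail Id
    // term in one mail index; doc Ids are only valid within that index.
    static BitSet buildFilter(Map<String, Integer> mailIdToDocId,
                              Collection<String> labelledMailIds,
                              int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String mailId : labelledMailIds) {
            Integer docId = mailIdToDocId.get(mailId);
            if (docId != null) {
                // Mail Ids missing here may live in another mail index.
                bits.set(docId);
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        // Assumed sample data: three mails in one mail index.
        Map<String, Integer> mailIndex = new HashMap<>();
        mailIndex.put("mail-a", 0);
        mailIndex.put("mail-b", 1);
        mailIndex.put("mail-c", 2);

        // Mail Ids returned by a search on the user data index.
        List<String> labelled = Arrays.asList("mail-c", "mail-a", "mail-x");

        BitSet filter = buildFilter(mailIndex, labelled, 3);
        System.out.println(filter); // prints {0, 2}
    }
}
```

Setting bits directly in this way sidesteps BooleanQuery's clause limit (1024 by default), at the cost of one lookup per mail Id in each mail index. Whether that cost is acceptable for large label sets is exactly the open question here.
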
The read-only data and the modifiable user data need to be kept separate
because the RO data can easily be re-created, which means I can't just
create the filter as part of the base search.

Regards
Antony