Message-ID: <48CCC608.8070808@correobancomer.com>
Date: Sun, 14 Sep 2008 03:06:32 -0500
From: Gerardo Segura
To: java-user@lucene.apache.org
Subject: Re: Frequently updated fields
In-Reply-To: <1999039569.20080912145111@gmail.com>

I had similar requirements: some fields didn't require text processing; they
were used only as filters to restrict the search to a subset of documents in
Solr.

As Karl suggested, implementing a filter was the most direct approach for me.
The problem was that, not being familiar with Solr, I couldn't integrate my
filter without modifying SolrIndexSearcher. The change was basically to
replace every invocation of

    searcher.search(query, new HitCollector() { ... });

with

    searcher.search(query, myCustomFilter, new HitCollector() { ... });

myCustomFilter is an instance of TermsFilter to which the documents' keys are
added, based on a query against an external database.

Minor changes were also made in SolrCore.java so that the filter could be
declared in solrconfig.xml.

It worked OK, but I always wondered whether that was the best way to
integrate the filter.

regards,

Wojciech Strzałka wrote:
> The most frequently changing fields will, I think, be:
>   Status (read/unread): in fact this is what I'm most afraid of - any
>       mail incoming to the system will need to be indexed at least twice
>   Flags: 0..n values from an enum
>   Tags: 0..n values from an enum
>
> Of course all the other fields can also change - even content in draft
> messages (it's live content, not archival) - but in such a case I'm ready
> to go with the re-indexing.
>
>> Hi Wojciech,
>>
>> can you please give us a bit more specific information about the
>> metadata fields that will change? I would recommend looking at
>> creating filters from your primary persistence store for query clauses
>> such as unread/read, mailbox folders, etc.
>>
>>       karl
>>
>> On 12 Sep 2008 at 13:57, Wojciech Strzałka wrote:
>>
>>> Hi.
>>>
>>> I'm new to Lucene and I would like to get a few answers (they can
>>> be lame).
>>>
>>> I want to index a large number of emails using Lucene (maybe Solr),
>>> not only the contents but also some metadata like state or flags. The
>>> problem is that the metadata will change during the mail lifecycle;
>>> although it is much smaller than the content, updating it will require
>>> reindexing the whole mail content, which I see as a performance
>>> bottleneck.
>>>
>>> I also have the data in a DB, so my first question is:
>>>
>>> - are there any best practices for implementing my needs (querying both
>>>   Lucene and the DB and then merging in memory? closing one eye and
>>>   re-indexing the whole content on every metadata change? others?)
>>>
>>> - is Lucene a good solution for my problem at all?
>>>
>>> - are there any plans to implement field updates in a more efficient
>>>   way than deleting/inserting the whole document? If so, what's the
>>>   time horizon?
>>>
>>> Best regards
>>> Wojtek
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
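
For readers of the archive, the filter idea in the reply above (restrict hits to documents whose keys come from an external database query) can be illustrated roughly as follows. This is a plain-Java sketch under stated assumptions, not the Lucene or Solr API and not the code used in the modified SolrIndexSearcher: the class KeyFilterSketch, its methods, and the sample keys are all hypothetical. It only shows the general principle of a bit-set filter: one bit per internal document number, set when the document is allowed through.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Set;

// Stand-alone illustration of a key-based filter. It does not use Lucene or
// Solr classes; every name in this file is hypothetical.
class KeyFilterSketch {

    // Builds a bit set with one bit per internal document number. A bit is
    // set when that document's key is in the allowed set (for example, keys
    // returned by a database query such as "all unread mails").
    static BitSet buildFilter(List<String> keyByDocId, Set<String> allowedKeys) {
        BitSet bits = new BitSet(keyByDocId.size());
        for (int docId = 0; docId < keyByDocId.size(); docId++) {
            if (allowedKeys.contains(keyByDocId.get(docId))) {
                bits.set(docId);
            }
        }
        return bits;
    }

    // Keeps only the hits whose bit is set, preserving the original order.
    static List<Integer> applyFilter(List<Integer> hits, BitSet filter) {
        List<Integer> kept = new ArrayList<>();
        for (int docId : hits) {
            if (filter.get(docId)) {
                kept.add(docId);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Index position = internal document number, value = document key.
        List<String> keyByDocId = List.of("mail-1", "mail-2", "mail-3", "mail-4");
        // Assumed result of the external database query (unread mails).
        Set<String> unread = Set.of("mail-2", "mail-4");

        BitSet filter = buildFilter(keyByDocId, unread);

        // Document numbers matched by the full-text query alone.
        List<Integer> hits = List.of(0, 1, 3);
        System.out.println(applyFilter(hits, filter)); // prints [1, 3]
    }
}
```

The point of the design is that frequently changing metadata (read/unread, flags, tags) stays in the database, so changing it only changes which keys the filter is built from and does not require re-indexing the mail content.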