Message-ID: <506AB1AE.7080708@devhost.se>
Date: Tue, 02 Oct 2012 11:19:42 +0200
From: Simon Svensson
To: user@lucenenet.apache.org
Reply-To: user@lucenenet.apache.org
Subject: Re: Recognising an index rebuild is required
MIME-Version: 1.0
Content-Type: text/plain;
 charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

Hi,

You can recover the terms indexed for a document by enumerating every term in a field and checking whether the document appears in that term's postings. This can be done with a combination of IndexReader.Terms and IndexReader.TermDocs. You could then compare the indexed terms with the terms your current analyzer produces.

The following example prints "brown", "dog", "fox", "jumps", "lazy", "over", "quick". This approach won't detect changes in other attributes, like position increments or payload data.

using System;
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static void Main(String[] args) {
    var directory = new RAMDirectory();
    var analyzer = new StandardAnalyzer(Version.LUCENE_30);

    // Create example document.
    var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
    var doc = new Document();
    doc.Add(new Field("text", "The quick brown fox jumps over the lazy dog", Field.Store.NO, Field.Index.ANALYZED));
    writer.AddDocument(doc);
    writer.Commit();

    var reader = IndexReader.Open(directory, readOnly: true);
    var documentId = 0; // Known in this example.
    var documentTerms = new List<string>();

    using (var termDocs = reader.TermDocs())
    using (var termEnum = reader.Terms(new Term("text"))) {
        do {
            var term = termEnum.Term;
            if (term == null)
                break;

            // Terms are ordered by field, so stop once we leave "text".
            if (term.Field != "text")
                break;

            // Iterate through all documents with the current term.
            termDocs.Seek(termEnum);
            while (termDocs.Next()) {
                if (termDocs.Doc == documentId)
                    documentTerms.Add(term.Text);
            }
        } while (termEnum.Next());
    }

    foreach (var term in documentTerms) {
        Console.WriteLine(term);
    }
}

// Simon

On 2012-10-02 10:58, Allan, Brad (Wokingham) wrote:
> My application has metadata that describes the type of analysis to be used for a document field.
>
> Is there any index information that would allow me to compare the analyzer used to index a field with the analyser being specified by my metadata in order to decide whether or not I need to discard an index and rebuild it (because the analysis type has changed).
>
> Just fishing for ideas, right now my thought is to maintain a 'index info' file in the same location as my indexes...
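
The comparison step mentioned in the reply (indexed terms versus the terms the current analyzer produces) could be sketched roughly as follows. This is only an illustration, assuming Lucene.Net 3.0.x and assuming the original field text can still be obtained from the source data, since the example indexes it with Field.Store.NO. AnalyzerCheck, AnalyzeToTerms and NeedsRebuild are hypothetical helper names, not part of Lucene.Net.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;

// Hypothetical helper class; the names are illustrative only.
public static class AnalyzerCheck
{
    // Runs the given text through the analyzer and collects the distinct terms it emits.
    public static HashSet<string> AnalyzeToTerms(Analyzer analyzer, string field, string text)
    {
        var terms = new HashSet<string>(StringComparer.Ordinal);
        using (var stream = analyzer.TokenStream(field, new StringReader(text)))
        {
            var termAttribute = stream.AddAttribute<ITermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
            {
                terms.Add(termAttribute.Term);
            }
            stream.End();
        }
        return terms;
    }

    // True when the terms found in the index differ from what the current analyzer produces.
    // Only the set of terms is compared; positions and payloads are ignored.
    public static bool NeedsRebuild(IEnumerable<string> indexedTerms, HashSet<string> currentTerms)
    {
        return !currentTerms.SetEquals(indexedTerms);
    }
}
```

With the example from the reply, passing documentTerms and AnalyzeToTerms(analyzer, "text", "The quick brown fox jumps over the lazy dog") to NeedsRebuild should report no rebuild for an unchanged StandardAnalyzer. Checking a small sample of documents is probably enough to notice a changed analysis chain.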