lucenenet-user mailing list archives

From Simon Svensson <si...@devhost.se>
Subject Re: Recognising an index rebuild is required
Date Tue, 02 Oct 2012 09:19:42 GMT
Hi,

You can recover the terms indexed for a document by enumerating every
term in a field and checking whether the document appears in each
term's postings. This can be done with a combination of
IndexReader.Terms and IndexReader.TermDocs. You could then compare the
indexed terms with the terms your current analyzer produces.

The following will output "brown", "dog", "fox", "jumps", "lazy",
"over", "quick"; StandardAnalyzer lowercases the text and drops the
stop word "the". This approach won't detect changes in other
attributes, like position increments or payload data.

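using System;
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static void Main(String[] args) {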
     var directory = new RAMDirectory();
     var analyzer = new StandardAnalyzer(Version.LUCENE_30);

     // Create example document.
     var writer = new IndexWriter(directory, analyzer, true,
         IndexWriter.MaxFieldLength.UNLIMITED);
     var doc = new Document();
     doc.Add(new Field("text",
         "The quick brown fox jumps over the lazy dog",
         Field.Store.NO, Field.Index.ANALYZED));
     writer.AddDocument(doc);
     writer.Commit();

     var reader = IndexReader.Open(directory, readOnly: true);
     var documentId = 0; // Known in this example.
     var documentTerms = new List<String>();

     using (var termDocs = reader.TermDocs())
     using (var termEnum = reader.Terms(new Term("text"))) {
         do {
             var term = termEnum.Term;
             if (term == null) break;
             if (term.Field != "text") break;

             // Iterate through all documents with the current term.
             termDocs.Seek(termEnum);
             while (termDocs.Next()) {
                 if (termDocs.Doc == documentId)
                     documentTerms.Add(term.Text);
             }
         } while (termEnum.Next());
     }

     foreach (var term in documentTerms) {
         Console.WriteLine(term);
     }
}
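
The comparison step could look roughly like the sketch below. It
assumes Lucene.Net 3.0.3 and that your application still has the
original field text available (the field above is indexed with
Field.Store.NO, so it cannot be read back from the index). The class
and method names (AnalyzerCheck, AnalyzeTerms, NeedsRebuild) are made
up for illustration and are not part of Lucene.Net.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Tokenattributes;

public static class AnalyzerCheck {
    // Runs the given analyzer over the source text and collects the
    // distinct terms it emits for the field.
    public static HashSet<string> AnalyzeTerms(Analyzer analyzer,
            string field, string text) {
        var terms = new HashSet<string>();
        using (var stream = analyzer.TokenStream(field,
                new StringReader(text))) {
            var termAttribute = stream.AddAttribute<ITermAttribute>();
            stream.Reset();
            while (stream.IncrementToken())
                terms.Add(termAttribute.Term);
            stream.End();
        }
        return terms;
    }

    // True when the indexed terms and the freshly analyzed terms are
    // not the same set, which suggests the analysis has changed.
    public static bool NeedsRebuild(IEnumerable<string> indexedTerms,
            IEnumerable<string> analyzedTerms) {
        return !new HashSet<string>(indexedTerms)
            .SetEquals(analyzedTerms);
    }
}
```

With the example above you would call
AnalyzerCheck.NeedsRebuild(documentTerms,
AnalyzerCheck.AnalyzeTerms(analyzer, "text", sourceText)), where
sourceText is the text you originally indexed. Like the term
enumeration itself, this only compares term text, so two analyzers
that differ only in positions or payloads will look identical.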

// Simon

On 2012-10-02 10:58, Allan, Brad (Wokingham) wrote:
> My application has metadata that describes the type of analysis to be
> used for a document field.
>
> Is there any index information that would allow me to compare the
> analyzer used to index a field with the analyser being specified by my
> metadata in order to decide whether or not I need to discard an index
> and rebuild it (because the analysis type has changed).
>
> Just fishing for ideas, right now my thought is to maintain a 'index
> info' file in the same location as my indexes...
>
>
> ________________________________
> CheckFree Solutions Limited (trading as Fiserv)
> Registered Office: Eversheds House, 70 Great Bridgewater Street, Manchester, M15 ES
> Registered in England: No. 2694333
>

