From Simon Svensson <>
Subject Re: Recognising an index rebuild is required
Date Tue, 02 Oct 2012 09:19:42 GMT

You can extract the terms of a document by iterating through all terms
in a field and checking which of them occur in your document. This can
be done with a combination of IndexReader.Terms and IndexReader.TermDocs.
You could then compare the indexed terms with the terms your current
analyzer produces for the same text.

The following example prints "brown", "dog", "fox", "jumps", "lazy",
"over" and "quick". Note that this approach won't detect changes in other
token attributes, such as position increments or payload data.

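using System;
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Version = Lucene.Net.Util.Version;

public static class Program {
    public static void Main(String[] args) {
        var directory = new RAMDirectory();
        var analyzer = new StandardAnalyzer(Version.LUCENE_30);

        // Create and index the example document; disposing the writer commits it.
        using (var writer = new IndexWriter(directory, analyzer, true,
                                            IndexWriter.MaxFieldLength.UNLIMITED)) {
            var doc = new Document();
            doc.Add(new Field("text", "The quick brown fox jumps over the lazy dog",
                              Field.Store.NO, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        }

        var documentId = 0; // Known in this example.
        var documentTerms = new List<String>();

        using (var reader = IndexReader.Open(directory, readOnly: true))
        using (var termDocs = reader.TermDocs())
        // Terms(t) positions the enum at the first term >= t, i.e. the first "text" term.
        using (var termEnum = reader.Terms(new Term("text"))) {
            do {
                var term = termEnum.Term;
                if (term == null) break;          // No more terms in the index.
                if (term.Field != "text") break;  // Walked past the "text" field.

                // Iterate through all documents containing the current term.
                termDocs.Seek(termEnum);
                while (termDocs.Next()) {
                    if (termDocs.Doc == documentId) {
                        documentTerms.Add(term.Text);
                        break;
                    }
                }
            } while (termEnum.Next());
        }

        foreach (var term in documentTerms) {
            Console.WriteLine(term);
        }
    }
}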
On 2012-10-02 10:58, Allan, Brad (Wokingham) wrote:
> My application has metadata that describes the type of analysis to be used for a document
> Is there any index information that would allow me to compare the analyzer used to index
> a field with the analyser being specified by my metadata in order to decide whether or not
> I need to discard an index and rebuild it (because the analysis type has changed).
> Just fishing for ideas, right now my thought is to maintain a 'index info' file in the
> same location as my indexes...
