lucene-java-user mailing list archives

From Chris Bamford <chris.bamf...@talktalk.net>
Subject Rewriting an index without losing 'hidden' data
Date Fri, 08 Apr 2011 14:44:09 GMT
Hi, 

I recently discovered that I need to add a single field to every document in an existing (very
large) index.  Reindexing from scratch is not an option I want to consider right now, so I
wrote a utility to add the field by rewriting the index. However, this seems to have lost some of
the fields (those indexed but not stored?).  In fact, it shrank a 12 GB index down to 4.2 GB, which
is clearly not what I wanted.  :-)
What am I doing wrong?

My technique was:

  Analyzer analyser = new StandardAnalyzer();
  IndexSearcher searcher = new IndexSearcher(indexPath);
  IndexWriter indexWriter = new IndexWriter(indexPath, analyser);
  Hits hits = matchAllDocumentsFromIndex(searcher);

  for (int i = 0; i < hits.length(); i++) {
      Document doc = hits.doc(i);
      String id = doc.get("unique-id");
      doc.add(new Field("newField", newValue, Field.Store.YES, Field.Index.UN_TOKENIZED));
      indexWriter.updateDocument(new Term("unique-id", id), doc);
  }

  searcher.close();
  indexWriter.optimize(); 
  indexWriter.close();

Note that my matchAllDocumentsFromIndex() does return the right number of hits, i.e. the same
number of documents as the index holds.
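
To make the suspicion about "indexed, but not stored" fields concrete, here is a toy model in plain Java. It uses no Lucene classes at all: ToyField, readBack(), rewrite() and the "body" field are hypothetical names invented purely for illustration. It assumes that a document read back from an index carries only its stored fields, so re-adding that document silently drops anything that was indexed without being stored. It is a sketch of the suspected mechanism, not a confirmed diagnosis of the utility above.

```java
import java.util.ArrayList;
import java.util.List;

public class StoredOnlyRoundTrip {

    /** Hypothetical stand-in for a field; this is NOT a Lucene class. */
    static final class ToyField {
        final String name;
        final boolean stored;   // value kept verbatim, so it can be read back
        final boolean indexed;  // value searchable through the index

        ToyField(String name, boolean stored, boolean indexed) {
            this.name = name;
            this.stored = stored;
            this.indexed = indexed;
        }
    }

    /** Models reading a document back: only stored fields are returned. */
    static List<ToyField> readBack(List<ToyField> original) {
        List<ToyField> retrieved = new ArrayList<ToyField>();
        for (ToyField f : original) {
            if (f.stored) {
                retrieved.add(f);
            }
        }
        return retrieved;
    }

    /** Models the rewrite loop: read back, add one field, replace the original. */
    static List<ToyField> rewrite(List<ToyField> original, ToyField extra) {
        List<ToyField> doc = readBack(original);
        doc.add(extra);
        return doc;
    }

    /** Joins field names so the before/after documents are easy to compare. */
    static String names(List<ToyField> doc) {
        StringBuilder sb = new StringBuilder();
        for (ToyField f : doc) {
            if (sb.length() > 0) {
                sb.append(",");
            }
            sb.append(f.name);
        }
        return sb.toString();
    }

    /** A two-field document; "body" is an assumed indexed-but-not-stored field. */
    static List<ToyField> sampleDocument() {
        List<ToyField> doc = new ArrayList<ToyField>();
        doc.add(new ToyField("unique-id", true, true));
        doc.add(new ToyField("body", false, true));
        return doc;
    }

    public static void main(String[] args) {
        List<ToyField> before = sampleDocument();
        List<ToyField> after = rewrite(before, new ToyField("newField", true, true));
        System.out.println("before: " + names(before)); // before: unique-id,body
        System.out.println("after:  " + names(after));  // after:  unique-id,newField
    }
}
```

In this model the document count is unchanged after the rewrite, which would be consistent with the hit count still matching, while the unstored "body" field is gone, which would be consistent with the index shrinking.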


Thanks for any ideas!
BTW, I am using Lucene 2.3.2.

- Chris

 


