lucene-java-user mailing list archives

From Michael McCandless <>
Subject Re: Rewriting an index without losing 'hidden' data
Date Fri, 08 Apr 2011 16:08:55 GMT
Unfortunately, updateDocument replaces the *entire* previous document
with the new one.
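
To illustrate why this loses indexed-but-unstored fields, here is a minimal
sketch using a toy in-memory model. It is not Lucene's API or implementation:
the class ToyIndex and all of its methods are hypothetical and exist only for
this example. It assumes an update is a delete by term followed by an add, and
that a retrieved document carries only its stored fields.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of an index with stored and unstored fields (hypothetical, not Lucene). */
public class ToyIndex {
    /** A field value plus a flag saying whether the original text is kept for retrieval. */
    public static final class Field {
        final String value;
        final boolean stored;
        public Field(String value, boolean stored) { this.value = value; this.stored = stored; }
    }

    // "field:term" -> ids of the documents containing that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // doc id -> values of the stored fields only
    private final Map<Integer, Map<String, String>> storedFields = new HashMap<>();
    private int nextId = 0;

    /** Indexes every field, but keeps the original value only for stored fields. */
    public int addDocument(Map<String, Field> doc) {
        int id = nextId++;
        Map<String, String> stored = new HashMap<>();
        for (Map.Entry<String, Field> e : doc.entrySet()) {
            for (String term : e.getValue().value.toLowerCase().split("\\s+")) {
                postings.computeIfAbsent(e.getKey() + ":" + term, k -> new HashSet<>()).add(id);
            }
            if (e.getValue().stored) {
                stored.put(e.getKey(), e.getValue().value);
            }
        }
        storedFields.put(id, stored);
        return id;
    }

    /** Returns a copy of the ids of documents containing the term in the field. */
    public Set<Integer> search(String field, String term) {
        return new HashSet<>(postings.getOrDefault(field + ":" + term.toLowerCase(),
                                                   Collections.<Integer>emptySet()));
    }

    /** Rebuilds a document from stored fields; unstored fields cannot be recovered. */
    public Map<String, Field> document(int id) {
        Map<String, Field> doc = new HashMap<>();
        for (Map.Entry<String, String> e : storedFields.get(id).entrySet()) {
            doc.put(e.getKey(), new Field(e.getValue(), true));
        }
        return doc;
    }

    /** Deletes every document matching the term, then adds the replacement document. */
    public int updateDocument(String field, String term, Map<String, Field> doc) {
        for (Integer id : search(field, term)) {
            for (Set<Integer> ids : postings.values()) {
                ids.remove(id);
            }
            storedFields.remove(id);
        }
        return addDocument(doc);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        Map<String, Field> original = new HashMap<>();
        original.put("unique-id", new Field("42", true));
        original.put("body", new Field("hidden text", false)); // indexed, not stored
        index.addDocument(original);

        // Round trip: load the document, add a field, write it back.
        int oldId = index.search("unique-id", "42").iterator().next();
        Map<String, Field> doc = index.document(oldId); // "body" is already absent here
        doc.put("newField", new Field("x", true));
        index.updateDocument("unique-id", "42", doc);

        System.out.println(index.search("body", "hidden").isEmpty()); // prints "true"
        System.out.println(index.search("newField", "x").size());     // prints "1"
    }
}
```

In this model the "body" terms disappear after the round trip because the
replacement document never contained them, which mirrors the shrinking index
described in the original question.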

The ability to update a single indexed field (either replacing that
field entirely, or changing only certain token occurrences within it),
while leaving all other indexed fields in the document unaffected, has
been a long-requested, big missing feature in Lucene.  We call it
"incremental field updates".

There have been some healthy discussions on the dev list that have
worked out a good rough design (eg see ).  Also, recent
improvements in how buffered deletes are handled should make it a lot
easier for updates to "piggyback" using that same packet stream
approach.  So... I think there is hope that some day we'll get this
into Lucene.


On Fri, Apr 8, 2011 at 11:00 AM, Ian Lea <> wrote:
> Unfortunately you just can't do this.  Might be possible if all fields
> were stored but evidently they are not in your index.  For unstored
> fields, the Document object will not contain the data that was passed
> in when the doc was originally added.
> I believe there might be a way of recreating some of the missing data
> via TermFreqVector but that has always sounded dodgy and lossy to me.
> The safest way is to reindex, however painful it might be.  Maybe you
> could take the opportunity to upgrade Lucene at the same time!
> --
> Ian.
> On Fri, Apr 8, 2011 at 3:44 PM, Chris Bamford
> <> wrote:
>> Hi,
>> I recently discovered that I need to add a single field to every document
>> in an existing (very large) index.  Reindexing from scratch is not an
>> option I want to consider right now, so I wrote a utility to add the field
>> by rewriting the index - but this seemed to lose some of the fields
>> (indexed, but not stored?).  In fact, it shrank a 12GB index down to
>> 4.2GB - clearly not what I wanted.  :-)
>> What am I doing wrong?
>> My technique was:
>>  Analyzer analyser = new StandardAnalyzer();
>>  IndexSearcher searcher = new IndexSearcher(indexPath);
>>  IndexWriter indexWriter = new IndexWriter(indexPath, analyser);
>>  Hits hits = matchAllDocumentsFromIndex(searcher);
>>  for (int i=0; i < hits.length(); i++) {
>>          Document doc = hits.doc(i);
>>          String id = doc.get("unique-id");
>>          doc.add(new Field("newField", newValue, Field.Store.YES, Field.Index.UN_TOKENIZED));
>>          indexWriter.updateDocument(new Term("unique-id", id), doc);
>>  }
>>  searcher.close();
>>  indexWriter.optimize();
>>  indexWriter.close();
>> Note that my matchAllDocumentsFromIndex() does get the right number of
>> hits from the index - i.e. the same number as held in the index.
>>  Thanks for any ideas!
>> BTW I am using Lucene 2.3.2.
>> - Chris
