lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Lea <ian....@gmail.com>
Subject Re: Rewriting an index without losing 'hidden' data
Date Fri, 08 Apr 2011 15:00:39 GMT
Unfortunately you just can't do this.  Might be possible if all fields
were stored but evidently they are not in your index.  For unstored
fields, the Document object will not contain the data that was passed
in when the doc was originally added.

I believe there might be a way of recreating some of the missing data
via TermFreqVector but that has always sounded dodgy and lossy to me.

The safest way is to reindex, however painful it might be.  Maybe you
could take the opportunity to upgrade lucene at the same time!


--
Ian.


On Fri, Apr 8, 2011 at 3:44 PM, Chris Bamford
<chris.bamford@talktalk.net> wrote:
> Hi,
>
> I recently discovered that I need to add a single field to every document in an existing
(very large) index.  Reindexing from scratch is not an option I want to consider right now,
so I wrote a utility to add the field by rewriting the index - but this seemed to lose some
of the fields (indexed, but not stored?).  In fact, it shrunk a 12Gb index down to 4.2Gb
- clearly not what I wanted.  :-)
> What am I doing wrong?
>
> My technique was:
>
>  Analyzer analyser = new StandardAnalyzer();
>  IndexSearcher searcher = new IndexSearcher(indexPath);
>  IndexWriter indexWriter = new IndexWriter(indexPath, analyser);
>  Hits hits = matchAllDocumentsFromIndex(searcher);
>
>  for (int i=0; i < hits.length(); i++) {
>          Document doc = hits.doc(i);
>          String id = doc.get("unique-id");
>          doc.add(new Field("newField", newValue, Field.Store.YES, Field.Index.UN_TOKENIZED));
>          indexWriter.updateDocument(new Term("unique-id", id), doc);
>  }
>
>  searcher.close();
>  indexWriter.optimize();
>  indexWriter.close();
>
> Note that my matchAllDocumentsFromIndex() does get the right number of hits from the
index - i.e. the same number as held in the index.
>
>
>  Thanks for any ideas!
> BTW I am using Lucene 2.3.2.
>
> - Chris
>
>
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message