lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Bamford <chris.bamf...@talktalk.net>
Subject Re: Rewriting an index without losing 'hidden' data
Date Fri, 08 Apr 2011 15:46:10 GMT

 Thanks Ian,  its as I feared!
I shall have to reindex (and upgrade!).

Cheers,

- Chris

 


 

 

-----Original Message-----
From: Ian Lea <ian.lea@gmail.com>
To: java-user@lucene.apache.org
Sent: Fri, 8 Apr 2011 16:00
Subject: Re: Rewriting an index without losing 'hidden' data


Unfortunately you just can't do this.  Might be possible if all fields
were stored but evidently they are not in your index.  For unstored
fields, the Document object will not contain the data that was passed
in when the doc was originally added.

I believe there might be a way of recreating some of the missing data
via TermFreqVector but that has always sounded dodgy and lossy to me.

The safest way is to reindex, however painful it might be.  Maybe you
could take the opportunity to upgrade lucene at the same time!


--
Ian.


On Fri, Apr 8, 2011 at 3:44 PM, Chris Bamford
<chris.bamford@talktalk.net> wrote:
> Hi,
>
> I recently discovered that I need to add a single field to every document in 
an existing (very large) index.  Reindexing from scratch is not an option I want 
to consider right now, so I wrote a utility to add the field by rewriting the 
index - but this seemed to lose some of the fields (indexed, but not stored?). 
 In fact, it shrunk a 12Gb index down to 4.2Gb - clearly not what I wanted.  :-)
> What am I doing wrong?
>
> My technique was:
>
>  Analyzer analyser = new StandardAnalyzer();
>  IndexSearcher searcher = new IndexSearcher(indexPath);
>  IndexWriter indexWriter = new IndexWriter(indexPath, analyser);
>  Hits hits = matchAllDocumentsFromIndex(searcher);
>
>  for (int i=0; i < hits.length(); i++) {
>          Document doc = hits.doc(i);
>          String id = doc.get("unique-id");
>          doc.add(new Field("newField", newValue, Field.Store.YES, 
Field.Index.UN_TOKENIZED));
>          indexWriter.updateDocument(new Term("unique-id", id), doc);
>  }
>
>  searcher.close();
>  indexWriter.optimize();
>  indexWriter.close();
>
> Note that my matchAllDocumentsFromIndex() does get the right number of hits 
from the index - i.e. the same number as held in the index.
>
>
>  Thanks for any ideas!
> BTW I am using Lucene 2.3.2.
>
> - Chris
>
>
>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


 

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message