lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrzej Bialecki ...@getopt.org>
Subject Re: updating documents in the index
Date Thu, 04 Nov 2004 18:55:03 GMT
Chris Fraschetti wrote:

> So I've read that the only way to change a field in an already indexed
> document is to simple remove it and readd it... but that can be costly
> if I need to go back to where the data origionally came from and
> reparse and reindex it all.

Yes.

> 
> Is there a way to keep the document around after the delete call to
> the indexreader so that I can modify a field and add it again with a
> writer?

Lucene does not provide this functionality (yet?). If you try to read 
from index a document that contains "unstored" fields, you will get 
nulls instead of their values. In other words, you cannot read the 
Document instance, modify it, and then add it - because you will lose 
all information from "unstored" fields. Also, when you re-add the 
document all fields need to be analyzed once again.

> 
> I would simple rip out all the fields and then create a new document,
> but the 'content' field isn't stored due to the fact that my index
> would be much larger if i kept the content around.
> 
> Anyone have any good solutions to do this short of keeping around the
> content in the index or going back to the origional document source?
> 
> Does 'luke' rebuild a document so that it can be updated? If so, how
> do they go about it.

"They" (me and Luke :) do it the hard way - we iterate over all terms in 
the index, and then iterate over all documents which contain that term. 
If the enumeration contains the selected doc number, terms and their 
positions are put in the target term array. After going through the 
whole index, we end up with an array containing all terms and every 
position of each term in the document. This array is then concatenated 
using spaces. That's it - not really a solution, rather a hack.

This could be sped up using term vectors (Lucene 1.4.x), but you first 
need to build your index with term vectors.

-- 
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message