lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jay ...@AI.SRI.COM>
Subject Re: Reuse single document and fields
Date Fri, 01 Feb 2008 18:51:40 GMT
Thanks, Michael, for your quick reply and explanation.
One related question: is it true that Lucene indexer will reject a field 
  that has the empty string value? (I saw an IllegalArgumentException).
Will be nice if lucene just skip such a field silently, esp, for the new 
2.3 api.

Michael McCandless wrote:
> yu wrote:
>> Hi,
>> I am trying to use the latest 2.3 API on  Field to improve the 
>> indexing performance by reusing Documents and  Fields. After reading 
>> lucene-java wiki and the java doc on Field, I have a couple of 
>> questions about the comment in Field.setValue(), namely,
>> "Note that you should only use this method after the Field has been 
>> consumed (ie, the Document  containing this Field has been added to 
>> the index)":
>> 1. Does "consumed" here  include IndexWriter.deleteDocument(...) and 
>> IndexWriter.updateDocument(...)?
> Yes.  Actually, just add/updateDocument.
>> 2.  Why does it require that the  Field has been consumed first before 
>> it can be modified? For example, I may pre-allocate in a constructor  
>> a Document and related fields with some default values but do not (or 
>> cannot) add the document. Before I add my first document, I need to 
>> set the fields with valid values.
>> It looks like the new Lucene core would null some of the fields if I 
>> do so.  I do not understand the logic behind the requirement.
> That use case is fine.  You can absolutely change its value before it's 
> consumed when the un-consumed value doesn't matter to you.  That 
> pre-allocation is a normal & fine use case, and fields should not be 
> null'd by Lucene.
> The warning should instead state that "if the field holds a real value, 
> then make sure it's consumed first before you change it".
> So, for example, if you add docs to a queue, and then use a thread pool 
> to pull docs from that queue and call addDocument on them, you can't 
> re-use the fields in the Document until a thread has added it to the index.
> Mike
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message