lucene-dev mailing list archives

From Yonik Seeley <>
Subject Re: Adding clear() to Document
Date Wed, 20 May 2009 20:25:02 GMT
Compared to caching a List and passing it in to the Document constructor,
I imagine a clear()-based solution would be slower... there's more
work to do.  clear() needs to null the pointers, and then one needs to
add the fields again, one by one.  But I doubt we'd be able to detect
the difference anyway, given that document construction time (as opposed
to Field construction) is insignificant compared to indexing.
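To make the comparison concrete, here is a minimal Java sketch of the two reuse strategies. SketchDocument, viaClear and viaConstructor are hypothetical names invented for illustration. SketchDocument only models the state described in this thread (a List of fields plus a boost), uses plain strings in place of Field objects, and is not Lucene's Document class. Neither the List-taking constructor nor clear() is existing Lucene API; they are the two options under discussion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a Document's state: a list of fields plus a boost.
// Field names stand in for Field objects. This is NOT Lucene's Document.
final class SketchDocument {
    private final List<String> fields;
    private float boost = 1.0f;

    SketchDocument() {
        this(new ArrayList<>());
    }

    // Option A (proposed): wrap a list the caller has cached; no copy is made.
    SketchDocument(List<String> fields) {
        this.fields = fields;
    }

    void add(String field) {
        fields.add(field);
    }

    // Option B (proposed): empty the document so it can be refilled.
    // For an ArrayList, clear() writes null into every occupied slot.
    void clear() {
        fields.clear();
        boost = 1.0f;
    }

    List<String> getFields() {
        return fields;
    }

    float getBoost() {
        return boost;
    }
}

final class ReuseComparison {
    // clear()-based reuse: null out the old slots, then add the fields back
    // one by one. The document must not wrap newFields itself, or clear()
    // would empty the very list being copied from.
    static SketchDocument viaClear(SketchDocument doc, List<String> newFields) {
        doc.clear();
        for (String f : newFields) {
            doc.add(f);
        }
        return doc;
    }

    // Constructor-based reuse: one small allocation pointing at the cached list.
    static SketchDocument viaConstructor(List<String> cachedFields) {
        return new SketchDocument(cachedFields);
    }

    public static void main(String[] args) {
        List<String> cached = new ArrayList<>(List.of("title", "body", "date"));
        SketchDocument reused = viaClear(new SketchDocument(), cached);
        SketchDocument wrapped = viaConstructor(cached);
        System.out.println(reused.getFields().equals(wrapped.getFields())); // true
    }
}
```

Both paths end with the same field list. The difference is that viaClear touches every slot twice (once to null it, once to re-add), while viaConstructor does a fixed amount of work per document regardless of the field count. The catch with wrapping is that the document shares the caller's list, so later changes to that list show through.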


On Wed, May 20, 2009 at 4:10 PM, Shai Erera <> wrote:
> I came across this while working on 1595 (changes to benchmark). I noticed
> LineDocMaker reuses Document and Fields, and I wanted to pull that up to a
> base DocMaker since I got the impression it yields better (even if not
> significant) performance.
> With the addition of the Field ctor which accepts a boolean for interning,
> and with the changes to String.intern() which are to come, I agree this
> will have less impact, but it is still convenient. Today, I can already call
> doc.getFields(), iterate on them and call doc.remove(Field).
> Document.clear() will just save me the trouble.
> Besides all the above changes, reusing Document and Field saves object
> allocations. For the documents in the benchmark package this may mean
> millions of Document objects + many more Field objects. Even if interning
> were always avoided, this means saving lots of allocations, which are
> really not necessary.
> For other applications, the number of fields may be much larger than in the
> current benchmark impls, in which case it becomes even more important.
> Passing a list of Fields will save the Field allocations (assuming the app
> caches them on the outside) but still requires a Document allocation. Why
> not save that too?
> On Wed, May 20, 2009 at 11:01 PM, Yonik Seeley <> wrote:
>> On Wed, May 20, 2009 at 3:27 PM, Shai Erera <> wrote:
>> > I noticed Document does not have a clear() method to remove all the
>> > Fields set on it.
>> Document's state is so simple (a List and a boost), reuse doesn't seem
>> worth it.
>> What if, instead, we allowed the List to be passed in via Document's
>> constructor?
>> To put it into perspective, the Document object then becomes lighter
>> weight than the String object (provided the user is caching the List
>> of fields).  And really, I think caching the list of fields is even
>> overboard for pretty much all of the applications out there - I doubt
>> it would ever be significant given how much relative work is needed to
>> index a document.
>> -Yonik
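The reuse pattern Shai describes above can be sketched as follows, assuming hypothetical stand-in classes (MutableField, ReusableDoc, ReusingDocMaker) rather than Lucene's own Document and Field. The maker allocates one document and its fields up front and refills them for each input line. Because an optional third column makes the set of fields vary from line to line, the document has to be emptied between lines. The useClear flag switches between the proposed clear() and today's workaround of walking getFields() and removing each field; the workaround iterates over a copy because, in this model, getFields() returns the live list and removing from it inside a for-each loop is unsafe. The tab-separated input format is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins, NOT Lucene's classes: a field whose value can be
// overwritten, and a document whose getFields() returns its live backing list.
final class MutableField {
    final String name;
    private String value;
    MutableField(String name) { this.name = name; }
    void setValue(String value) { this.value = value; }
    String value() { return value; }
}

final class ReusableDoc {
    private final List<MutableField> fields = new ArrayList<>();
    void add(MutableField f) { fields.add(f); }
    boolean remove(MutableField f) { return fields.remove(f); }
    List<MutableField> getFields() { return fields; }
    void clear() { fields.clear(); } // the proposed method
}

// Sketch of a doc maker that allocates one document and its fields once,
// then refills the same objects for every input line.
final class ReusingDocMaker {
    private final ReusableDoc doc = new ReusableDoc();
    private final MutableField title = new MutableField("title");
    private final MutableField body = new MutableField("body");
    private final MutableField date = new MutableField("date");

    // Today's workaround: remove fields while iterating over a snapshot.
    // Removing from the live list inside a for-each loop would either throw
    // ConcurrentModificationException or skip fields, depending on the size.
    static void removeAllFields(ReusableDoc d) {
        for (MutableField f : new ArrayList<>(d.getFields())) {
            d.remove(f);
        }
    }

    // Assumed input format for this sketch: "title<TAB>body[<TAB>date]".
    ReusableDoc makeDocument(String line, boolean useClear) {
        String[] parts = line.split("\t", -1);
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected title<TAB>body: " + line);
        }
        if (useClear) {
            doc.clear();
        } else {
            removeAllFields(doc);
        }
        title.setValue(parts[0]);
        body.setValue(parts[1]);
        doc.add(title);
        doc.add(body);
        if (parts.length > 2) { // optional field: the field set varies per line
            date.setValue(parts[2]);
            doc.add(date);
        }
        return doc; // same document and field instances on every call
    }

    public static void main(String[] args) {
        ReusingDocMaker maker = new ReusingDocMaker();
        ReusableDoc a = maker.makeDocument("Hello\tfirst body\t20090520", false);
        ReusableDoc b = maker.makeDocument("World\tsecond body", true);
        System.out.println(a == b);                       // true
        System.out.println(b.getFields().size());         // 2
        System.out.println(b.getFields().get(0).value()); // World
    }
}
```

The trade-off of this design is that each returned document is only valid until the next call to makeDocument(), so it has to be fully consumed (for example, indexed) before the next line is read. The String values themselves are still allocated per line; only the document and field objects are reused.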
