lucene-dev mailing list archives

From Shai Erera
Subject Re: Adding clear() to Document
Date Wed, 20 May 2009 20:10:09 GMT
I came across this while working on 1595 (changes to benchmark). I noticed
LineDocMaker reuses Document and Fields, and I wanted to pull that up to a
base DocMaker since I got the impression it yields better (even if not
significant) performance.

With the addition of the Field ctor which accepts a boolean for interning,
and with the changes to String.intern() which are to come, I agree this
will have less impact, but it is still convenient. Today, I can already call
doc.getFields(), iterate on them and call doc.remove(Field).
Document.clear() will just save me the trouble.
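
To make the convenience argument concrete, here is a rough sketch. SimpleDoc
and SimpleField are hypothetical stand-ins written for illustration only;
they are not the real Lucene Document/Field classes, and their method names
are assumptions. The sketch only contrasts a manual "remove every field"
loop with a single clear() call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; NOT Lucene's Document/Field API.
public class ClearSketch {

    static final class SimpleField {
        final String name;
        String value;

        SimpleField(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    static final class SimpleDoc {
        private final List<SimpleField> fields = new ArrayList<SimpleField>();

        void add(SimpleField f) { fields.add(f); }

        void remove(SimpleField f) { fields.remove(f); }

        List<SimpleField> getFields() { return fields; }

        // The proposed convenience: drop all fields in one call.
        void clear() { fields.clear(); }
    }

    // Manual approach: iterate over a copy of the field list, because
    // removing from the live list while iterating it may throw
    // ConcurrentModificationException.
    static void clearManually(SimpleDoc doc) {
        for (SimpleField f : new ArrayList<SimpleField>(doc.getFields())) {
            doc.remove(f);
        }
    }

    public static void main(String[] args) {
        SimpleDoc doc = new SimpleDoc();
        doc.add(new SimpleField("title", "a"));
        doc.add(new SimpleField("body", "b"));
        clearManually(doc);
        System.out.println(doc.getFields().size()); // prints 0

        doc.add(new SimpleField("title", "c"));
        doc.clear();
        System.out.println(doc.getFields().size()); // prints 0
    }
}
```

Both paths leave the document empty; clear() just avoids the copy and the
per-field removal.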

Besides all the above changes, reusing Document and Field saves object
allocations. For the documents in the benchmark package this may mean
millions of Document objects plus many more Field objects. Even if interning
were always avoided, reuse would still save a lot of allocations that are
simply not necessary.

For other applications, the number of fields may be much larger than in the
current benchmark impls, where it becomes even more important.

Passing a list of Fields will save the Field allocations (assuming the app
caches them on the outside) but still require a Document allocation. Why not
save that too?
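
The difference between the options can be sketched by counting constructor
calls. Again, SimpleDoc and SimpleField are hypothetical stand-ins, not
Lucene classes, and the document and field counts are made-up numbers
chosen only for illustration. The sketch compares three strategies: fresh
objects per document, a cached field list passed to the constructor, and
full reuse with clear().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins that only count constructor calls; NOT Lucene's classes.
public class AllocationSketch {
    static int docAllocs = 0;
    static int fieldAllocs = 0;

    static final class SimpleField {
        final String name;
        String value;

        SimpleField(String name, String value) {
            this.name = name;
            this.value = value;
            fieldAllocs++;
        }

        void setValue(String v) { value = v; }
    }

    static final class SimpleDoc {
        final List<SimpleField> fields;

        SimpleDoc() { this(new ArrayList<SimpleField>()); }

        // Variant where the caller supplies (and may cache) the field list.
        SimpleDoc(List<SimpleField> fields) {
            this.fields = fields;
            docAllocs++;
        }

        void add(SimpleField f) { fields.add(f); }

        void clear() { fields.clear(); }
    }

    static void reset() { docAllocs = 0; fieldAllocs = 0; }

    static List<SimpleField> makeFields(int numFields) {
        List<SimpleField> list = new ArrayList<SimpleField>();
        for (int f = 0; f < numFields; f++) list.add(new SimpleField("f" + f, ""));
        return list;
    }

    // Strategy 1: new document and new fields for every document.
    static int[] freshEveryTime(int numDocs, int numFields) {
        reset();
        for (int d = 0; d < numDocs; d++) {
            SimpleDoc doc = new SimpleDoc();
            for (int f = 0; f < numFields; f++) doc.add(new SimpleField("f" + f, "v" + d));
        }
        return new int[] { docAllocs, fieldAllocs };
    }

    // Strategy 2: fields cached by the caller, but a new document each time.
    static int[] cachedFieldList(int numDocs, int numFields) {
        reset();
        List<SimpleField> cached = makeFields(numFields);
        for (int d = 0; d < numDocs; d++) {
            for (SimpleField fld : cached) fld.setValue("v" + d);
            SimpleDoc doc = new SimpleDoc(cached);
        }
        return new int[] { docAllocs, fieldAllocs };
    }

    // Strategy 3: one document and one set of fields, cleared and refilled.
    static int[] fullReuse(int numDocs, int numFields) {
        reset();
        List<SimpleField> cached = makeFields(numFields);
        SimpleDoc doc = new SimpleDoc();
        for (int d = 0; d < numDocs; d++) {
            doc.clear();
            for (SimpleField fld : cached) {
                fld.setValue("v" + d);
                doc.add(fld);
            }
        }
        return new int[] { docAllocs, fieldAllocs };
    }

    public static void main(String[] args) {
        int[] a = freshEveryTime(1000, 5);
        int[] b = cachedFieldList(1000, 5);
        int[] c = fullReuse(1000, 5);
        System.out.println("fresh:       " + a[0] + " docs, " + a[1] + " fields"); // 1000 docs, 5000 fields
        System.out.println("cached list: " + b[0] + " docs, " + b[1] + " fields"); // 1000 docs, 5 fields
        System.out.println("full reuse:  " + c[0] + " docs, " + c[1] + " fields"); // 1 docs, 5 fields
    }
}
```

The counts say nothing about how much time those allocations actually cost
relative to indexing; they only show which objects each strategy avoids
creating.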

On Wed, May 20, 2009 at 11:01 PM, Yonik Seeley wrote:

> On Wed, May 20, 2009 at 3:27 PM, Shai Erera <> wrote:
> > I noticed Document does not have a clear() method, to remove all the
> > Fields set on it.
> Document's state is so simple (a List and a boost), reuse doesn't seem
> worth it.
> What if, instead, we allowed the List to be passed in via Document's
> constructor?
> To put it into perspective, the Document object then becomes lighter
> weight than the String object (provided the user is caching the List
> of fields).  And really, I think caching the list of fields is even
> overboard for pretty much all of the applications out there - I doubt
> it would ever be significant given how much relative work is needed to
> index a document.
> -Yonik