On Mon, Mar 8, 2010 at 1:18 PM, Christopher Laux <ctlaux@googlemail.com> wrote:
> I'm not sure if this is the right list, as it's sort of a development
> question too, but I don't want to bother them over there. Anyway, I'm
> curious as to the reason for using "manual memory management" a la
> ByteBlockPool and consorts in Java. Is it for performance reasons
> alone, to avoid the allocation and garbage collection of many small
> objects or is there some residue of C-style thinking in the early
> years?
This was done for performance (to remove alloc/init/GC load).
There are two parts to it -- first, consolidating what used to be lots
of little objects into shared byte[]/int[] blocks. Second, reusing
those blocks.
I think the biggest perf gains were from the first (consolidating tiny
objs together), but we probably still have some gains from the second.
A simple test would be to change the pools to not re-use and then
measure indexing throughput.
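To make the two parts concrete, here is a minimal sketch of the idea in Java. It is not Lucene's ByteBlockPool; the class name SimpleBlockPool, the 32 KB block size, and the reuse flag are all assumptions made up for illustration. Many small values are packed into shared byte[] blocks (part one), and reset() optionally hands those blocks back for reuse (part two).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; this is not Lucene's ByteBlockPool. */
final class SimpleBlockPool {
    // Block size is an arbitrary choice for this sketch.
    static final int BLOCK_SIZE = 1 << 15;

    private final boolean reuse;                                  // recycle blocks on reset()?
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();   // recycled blocks
    private final List<byte[]> used = new ArrayList<>();          // blocks holding live data
    private byte[] current;                                       // block being written to
    private int upto;                                             // next free position in current
    int allocations;                                              // count of fresh byte[] allocations

    SimpleBlockPool(boolean reuse) {
        this.reuse = reuse;
    }

    private void nextBlock() {
        byte[] block = free.poll();
        if (block == null) {
            block = new byte[BLOCK_SIZE];
            allocations++;
        }
        // Note: a recycled block still holds stale bytes; callers only read
        // offsets they have written, so the sketch does not zero it.
        used.add(block);
        current = block;
        upto = 0;
    }

    /** Copies data into the pool and returns its global start offset. */
    long append(byte[] data) {
        if (data.length > BLOCK_SIZE) {
            throw new IllegalArgumentException("value larger than one block");
        }
        // For simplicity a value never spans blocks; the tail of a block is wasted.
        if (current == null || upto + data.length > BLOCK_SIZE) {
            nextBlock();
        }
        long start = (long) (used.size() - 1) * BLOCK_SIZE + upto;
        System.arraycopy(data, 0, current, upto, data.length);
        upto += data.length;
        return start;
    }

    /** Reads one byte at a global offset previously returned by append(). */
    byte byteAt(long offset) {
        return used.get((int) (offset / BLOCK_SIZE))[(int) (offset % BLOCK_SIZE)];
    }

    /** Discards all data; with reuse enabled the blocks go back to the free list. */
    void reset() {
        if (reuse) {
            free.addAll(used);
        }
        used.clear();
        current = null;
        upto = 0;
    }
}
```

Constructing the pool with reuse set to false and comparing throughput against reuse set to true is a rough analogue of the experiment suggested above, though a real measurement would have to be done inside Lucene's own indexing code.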
> Even then, shouldn't there be a more Java-ish solution using the
> existing streams classes? Would that be the way to go if one started
> over? I realize this is not very realistic, I'm asking out of
> curiosity.
Actually that's how Lucene used to work, and then (in 2.3 I think) we
cut over to the current reused-blocks RAM writing. If we were to start
over I don't think I'd change much over where we are now, at least on
this aspect of Lucene. There are plenty of other things I'd change ;)
But... one can always make a custom indexing chain (it's a package
private API now, but possible) to do something totally different. E.g.,
I think a chain dedicated to inverting tiny docs could show sizable
gains over the default chain Lucene uses today.
Mike