lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shai Erera <>
Subject Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/
Date Fri, 01 Jul 2011 16:07:14 GMT
About the encoders package - there are several encoders there besides
VInt, so I wouldn't dispose of it so quickly. That said, I think we
should definitely explore consolidating VInt with the core classes,
and maybe write an encoder which delegate to them.

Or, come up w/ a different approach for allowing to plug in different
Encoders. I don't rule out anything, as long as we preserve
functionality and capabilities.


On Friday, July 1, 2011, Michael McCandless <> wrote:
> On Fri, Jul 1, 2011 at 2:33 AM, Uwe Schindler <> wrote:
>> Hi,
>> I don't understand the whole discussion here, so please compare these two implementations
and tell me which one is faster. Please don't hurt me, if you don't want to see src.jar code
from OpenJDK Java6 - just delete this mail if you don’t want to (the code here is licensed
under GPL):
> This is the source code for a specific version of one specific Java
> impl.  If we knew all Java impls simply implemented the primitive case
> using System.arraycopy (admittedly it's hard to imagine that they
> wouldn't!) then we are fine.
>> This is our implementation, simon replaced and Robert reverted (UnsafeByteArrayOutputStream):
>>  private void grow(int newLength) {
>>    // It actually should be: (Java 1.7, when its intrinsic on all machines)
>>    // buffer = Arrays.copyOf(buffer, newLength);
>>    byte[] newBuffer = new byte[newLength];
>>    System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
>>    buffer = newBuffer;
>>  }
>> So please look at the code, where is a difference that could slow down, except the
Math.min() call which is an intrinsic in almost every JDK on earth?
> Right, in this case (if you used OpenJDK 6) we are obviously OK.  Not
> sure about other cases...
>> The problem we are talking about here is only about the generic Object[] copyOf method
and also affects e.g. *all* Collection.toArray() methods - they all use this code, so whenever
you use ArrayList.toArray() or similar, the slow code is executed. This is why we replaced
Collections.sort() by CollectionUtil.sort, that does no array copy. Simon & me were not
willing to replace the reallocations in FST code (Mike you remember, we reverted that on your
GIT repo when we did perf tests) and other parts in Lucene (there are only few of them). The
idea was only to replace primitive type code to make it easier readable. And with later JDK
code it could even get faster (not slower), if Oracle starts to add intrinsics for those new
methods (and that’s Dawid and mine reason to change to copyTo for primitive types). In general,
if you use Java SDK methods, that are as fast as ours, they always have a chance to get faster
in later JDKs. So we should always prefer Java SDK methods, unless they are slower because
their default impl is too generic or has too much safety checks or uses reflection.
> OK I'm convinced (I think!) that for primitive types only, let's use
> Arrays.copyOf!
>> To come back to UnsafeByteArrayOutputStream:
>> I would change the whole code, as I don’t like the allocation strategy in it (it's
exponential, on every grow it doubles its size). We should change that to use ArrayUtils.grow()
and ArrayUtils.oversize(), to have a similar allocation strategy like in trunk. Then we can
discuss about this problem again when Simon & me wants to change ArrayUtils.grow methods
to use Arrays.copyOf... *g* [just joking, I will never ask again, because this discussion
here is endless and does not bring us forward].
> Well, it sounds like for primitive types, we can cutover
> ArrayUtils.grow methods.  Then we can look @ the nightly bench the
> next day ;)
> But I agree we should fix UnsafeByteArrayOutputStream... or, isn't it
> (almost) a dup of ByteArrayDataOutput?
>> The other thing I don’t like in the new faceting module is duplication of vint
code. Why don’t we change it to use DataInput/DataOutput and use Dawid's new In/OutStream
wrapper for DataOutput everywhere. This would be much more streamlined with all the code we
currently have. Then we can encode the payloads (or later docvalues) using the new UnsafeByteArrayOutputStream,
wrapped with a OutputStreamDataOutput wrapper? Or maybe add a ByteArrayDataOutput class.
> That sounds good!
> Uwe can you commit TODOs to the code w/ these ideas?
> Mike
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message