lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java
Date Fri, 01 Jul 2011 13:13:13 GMT
On Fri, Jul 1, 2011 at 2:33 AM, Uwe Schindler <uwe@thetaphi.de> wrote:
> Hi,
>
> I don't understand the whole discussion here, so please compare these two implementations
and tell me which one is faster. Please don't hurt me, if you don't want to see src.jar code
from OpenJDK Java6 - just delete this mail if you don’t want to (the code here is licensed
under GPL):

This is the source code for a specific version of one specific Java
impl.  If we knew all Java impls simply implemented the primitive case
using System.arraycopy (admittedly it's hard to imagine that they
wouldn't!) then we are fine.

> This is our implementation, simon replaced and Robert reverted (UnsafeByteArrayOutputStream):
>
>  private void grow(int newLength) {
>    // It actually should be: (Java 1.7, when its intrinsic on all machines)
>    // buffer = Arrays.copyOf(buffer, newLength);
>    byte[] newBuffer = new byte[newLength];
>    System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
>    buffer = newBuffer;
>  }
>
> So please look at the code, where is a difference that could slow down, except the Math.min()
call which is an intrinsic in almost every JDK on earth?

Right, in this case (if you used OpenJDK 6) we are obviously OK.  Not
sure about other cases...

> The problem we are talking about here is only about the generic Object[] copyOf method
and also affects e.g. *all* Collection.toArray() methods - they all use this code, so whenever
you use ArrayList.toArray() or similar, the slow code is executed. This is why we replaced
Collections.sort() by CollectionUtil.sort, that does no array copy. Simon & me were not
willing to replace the reallocations in FST code (Mike you remember, we reverted that on your
GIT repo when we did perf tests) and other parts in Lucene (there are only few of them). The
idea was only to replace primitive type code to make it easier readable. And with later JDK
code it could even get faster (not slower), if Oracle starts to add intrinsics for those new
methods (and that’s Dawid and mine reason to change to copyTo for primitive types). In general,
if you use Java SDK methods, that are as fast as ours, they always have a chance to get faster
in later JDKs. So we should always prefer Java SDK methods, unless they are slower because
their default impl is too generic or has too much safety checks or uses reflection.

OK I'm convinced (I think!) that for primitive types only, let's use
Arrays.copyOf!

> To come back to UnsafeByteArrayOutputStream:
>
> I would change the whole code, as I don’t like the allocation strategy in it (it's
exponential, on every grow it doubles its size). We should change that to use ArrayUtils.grow()
and ArrayUtils.oversize(), to have a similar allocation strategy like in trunk. Then we can
discuss about this problem again when Simon & me wants to change ArrayUtils.grow methods
to use Arrays.copyOf... *g* [just joking, I will never ask again, because this discussion
here is endless and does not bring us forward].

Well, it sounds like for primitive types, we can cutover
ArrayUtils.grow methods.  Then we can look @ the nightly bench the
next day ;)

But I agree we should fix UnsafeByteArrayOutputStream... or, isn't it
(almost) a dup of ByteArrayDataOutput?

> The other thing I don’t like in the new faceting module is duplication of vint code.
Why don’t we change it to use DataInput/DataOutput and use Dawid's new In/OutStream wrapper
for DataOutput everywhere. This would be much more streamlined with all the code we currently
have. Then we can encode the payloads (or later docvalues) using the new UnsafeByteArrayOutputStream,
wrapped with a OutputStreamDataOutput wrapper? Or maybe add a ByteArrayDataOutput class.

That sounds good!

Uwe can you commit TODOs to the code w/ these ideas?

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message