Return-Path: X-Original-To: apmail-lucene-dev-archive@www.apache.org Delivered-To: apmail-lucene-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 268D0488D for ; Fri, 1 Jul 2011 06:52:15 +0000 (UTC) Received: (qmail 88773 invoked by uid 500); 1 Jul 2011 06:34:49 -0000 Delivered-To: apmail-lucene-dev-archive@lucene.apache.org Received: (qmail 86874 invoked by uid 500); 1 Jul 2011 06:33:59 -0000 Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@lucene.apache.org Delivered-To: mailing list dev@lucene.apache.org Received: (qmail 85921 invoked by uid 99); 1 Jul 2011 06:33:55 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 01 Jul 2011 06:33:55 +0000 X-ASF-Spam-Status: No, hits=0.7 required=5.0 tests=SPF_NEUTRAL X-Spam-Check-By: apache.org Received-SPF: neutral (athena.apache.org: local policy) Received: from [85.25.71.29] (HELO mail.troja.net) (85.25.71.29) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 01 Jul 2011 06:33:49 +0000 Received: from localhost (localhost.localdomain [127.0.0.1]) by mail.troja.net (Postfix) with ESMTP id 5435F45F45A; Fri, 1 Jul 2011 08:33:26 +0200 (CEST) X-Virus-Scanned: Debian amavisd-new at mail.troja.net Received: from mail.troja.net ([127.0.0.1]) by localhost (megaira.troja.net [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id u-J1s+ihliQj; Fri, 1 Jul 2011 08:33:18 +0200 (CEST) Received: from VEGA (port-92-196-126-42.dynamic.qsc.de [92.196.126.42]) (using TLSv1 with cipher AES128-SHA (128/128 bits)) (No client certificate requested) by mail.troja.net (Postfix) with ESMTPSA id D068F45ED51; Fri, 1 Jul 2011 08:33:14 +0200 (CEST) From: "Uwe Schindler" To: , References: <20110630124217.92CD923888FD@eris.apache.org>

In-Reply-To: Subject: RE: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java Date: Fri, 1 Jul 2011 08:33:39 +0200 Message-ID: <00cc01cc37b8$d2e93230$78bb9690$@thetaphi.de> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable X-Mailer: Microsoft Outlook 14.0 Thread-Index: AQI4yX09VmArTNYykmFlUSqZWirdtgIBPugPAakSttADQlgzWwJS1xzDAcpROh8BbCXeZQLafUHaAcZFaMABmvSuKpNoS6Yw Content-Language: de Hi, I don't understand the whole discussion here, so please compare these = two implementations and tell me which one is faster. Please don't hurt = me, if you don't want to see src.jar code from OpenJDK Java6 - just = delete this mail if you don=E2=80=99t want to (the code here is licensed = under GPL): Arrays.java: /** * Copies the specified array, truncating or padding with zeros (if = necessary) * so the copy has the specified length. For all indices that are * valid in both the original array and the copy, the two arrays = will * contain identical values. For any indices that are valid in the * copy but not the original, the copy will contain = (byte)0. * Such indices will exist if and only if the specified length * is greater than that of the original array. * * @param original the array to be copied * @param newLength the length of the copy to be returned * @return a copy of the original array, truncated or padded with = zeros * to obtain the specified length * @throws NegativeArraySizeException if newLength is = negative * @throws NullPointerException if original is null * @since 1.6 */ public static byte[] copyOf(byte[] original, int newLength) { byte[] copy =3D new byte[newLength]; System.arraycopy(original, 0, copy, 0, Math.min(original.length, newLength)); return copy; } This is our implementation, simon replaced and Robert reverted = (UnsafeByteArrayOutputStream): private void grow(int newLength) { // It actually should be: (Java 1.7, when its intrinsic on all = machines) // buffer =3D Arrays.copyOf(buffer, newLength); byte[] newBuffer =3D new byte[newLength]; System.arraycopy(buffer, 0, newBuffer, 0, buffer.length); buffer =3D newBuffer; } So please look at the code, where is a difference that could slow down, = except the Math.min() call which is an intrinsic in almost every JDK on = earth? The problem we are talking about here is only about the generic Object[] = copyOf method and also affects e.g. *all* Collection.toArray() methods - = they all use this code, so whenever you use ArrayList.toArray() or = similar, the slow code is executed. This is why we replaced = Collections.sort() by CollectionUtil.sort, that does no array copy. = Simon & me were not willing to replace the reallocations in FST code = (Mike you remember, we reverted that on your GIT repo when we did perf = tests) and other parts in Lucene (there are only few of them). The idea = was only to replace primitive type code to make it easier readable. And = with later JDK code it could even get faster (not slower), if Oracle = starts to add intrinsics for those new methods (and that=E2=80=99s Dawid = and mine reason to change to copyTo for primitive types). In general, if = you use Java SDK methods, that are as fast as ours, they always have a = chance to get faster in later JDKs. So we should always prefer Java SDK = methods, unless they are slower because their default impl is too = generic or has too much safety checks or uses reflection. To come back to UnsafeByteArrayOutputStream: I would change the whole code, as I don=E2=80=99t like the allocation = strategy in it (it's exponential, on every grow it doubles its size). We = should change that to use ArrayUtils.grow() and ArrayUtils.oversize(), = to have a similar allocation strategy like in trunk. Then we can discuss = about this problem again when Simon & me wants to change ArrayUtils.grow = methods to use Arrays.copyOf... *g* [just joking, I will never ask = again, because this discussion here is endless and does not bring us = forward]. The other thing I don=E2=80=99t like in the new faceting module is = duplication of vint code. Why don=E2=80=99t we change it to use = DataInput/DataOutput and use Dawid's new In/OutStream wrapper for = DataOutput everywhere. This would be much more streamlined with all the = code we currently have. Then we can encode the payloads (or later = docvalues) using the new UnsafeByteArrayOutputStream, wrapped with a = OutputStreamDataOutput wrapper? Or maybe add a ByteArrayDataOutput = class. Uwe (getting crazy) ----- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: uwe@thetaphi.de > -----Original Message----- > From: Simon Willnauer [mailto:simon.willnauer@googlemail.com] > Sent: Friday, July 01, 2011 7:47 AM > To: Michael McCandless > Cc: dev@lucene.apache.org; Dawid Weiss > Subject: Re: svn commit: r1141510 - > /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/Unsafe > ByteArrayOutputStream.java >=20 > On Fri, Jul 1, 2011 at 12:19 AM, Michael McCandless > wrote: > > On Thu, Jun 30, 2011 at 4:45 PM, Simon Willnauer > > wrote: > >> On Thu, Jun 30, 2011 at 8:50 PM, Dawid Weiss > >> wrote: > >>>> I don't seen any evidence that this is any slower though. > >>> > >>> You need to run with -client (if the machine is a beast this is > >>> tricky because x64 will pick -server regardless of the = command-line > >>> setting) and you need to be copying generic arrays. I think this = can > >>> be shown > >>> -- a caliper benchmark would be perfect to demonstrate this in > >>> isolation; I may write one if I find a spare moment. > >> > >> this is what I want to see. I don't want to discuss based on some = bug > >> reported for a non-primitive version of copyOf thats all. > >> its pointless to discuss if there is no evidence which I don't see. = I > >> am happy with arraycopy I would just have appreciated a discussion > >> before backing the change out. > > > > I think the burden of proof here is on Arrays.copyOf. > > > > Ie, until we can prove (through benchmarking in different envs) that > > it can be trusted, we should just stick with System.arraycopy to > > reduce the risk. >=20 > Mike I think we should do that but the real issue here is what if = somebody > comes up with any arbitrary method in the future claiming its slow we = back > out and use the "we think safe way" what if it is actually the other = way > around and copyOf is optimized by new VMs and the copyarray is = slightly > slower. If robert would not have reverted this commit we would have = not > discussed that her at all. I just want to prevent the "we should not = do this" it > might be a problem in the big picture while the microbenchmark doesn't > show a difference. At some point we have to rely on the JVM. > Even if we benchmark on all OS with various JVMs we can't prevent jvm = bug > based perf hits. While there was no bug reported for primitives here = we > don't have to be afraid of it either. I don't think its saver to use = arraycopy at > all its just a level of indirection safer but makes development more = of a pain > IMO. >=20 > Just my $0.05 > > > > Mike > > >=20 > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For = additional > commands, e-mail: dev-help@lucene.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org For additional commands, e-mail: dev-help@lucene.apache.org