Return-Path: X-Original-To: apmail-avro-dev-archive@www.apache.org Delivered-To: apmail-avro-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 59FBD109D2 for ; Tue, 7 Jan 2014 00:28:52 +0000 (UTC) Received: (qmail 92844 invoked by uid 500); 7 Jan 2014 00:28:51 -0000 Delivered-To: apmail-avro-dev-archive@avro.apache.org Received: (qmail 92761 invoked by uid 500); 7 Jan 2014 00:28:51 -0000 Mailing-List: contact dev-help@avro.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@avro.apache.org Delivered-To: mailing list dev@avro.apache.org Received: (qmail 92703 invoked by uid 99); 7 Jan 2014 00:28:51 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 07 Jan 2014 00:28:51 +0000 Date: Tue, 7 Jan 2014 00:28:51 +0000 (UTC) From: "Doug Cutting (JIRA)" To: dev@avro.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (AVRO-1428) Schema.computeHash() to add if check to avoid unnecessary hashcode computation MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/AVRO-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863703#comment-13863703 ] Doug Cutting commented on AVRO-1428: ------------------------------------ Your patch does not apply cleanly. Can you please provide a proper patch file, generated with 'svn diff'? Thanks! Also, can you please provide your benchmark program? > Schema.computeHash() to add if check to avoid unnecessary hashcode computation > ------------------------------------------------------------------------------ > > Key: AVRO-1428 > URL: https://issues.apache.org/jira/browse/AVRO-1428 > Project: Avro > Issue Type: Improvement > Components: java > Reporter: Tie Liu > Attachments: AVRO-1428.patch > > > In current Schma.java we have following implementation: > public final int hashCode() { > if (hashCode == NO_HASHCODE) > hashCode = computeHash(); > return hashCode; > } > int computeHash() { return getType().hashCode() + props.hashCode(); } > While hashCode is doing the checking of "if (hashCode == NO_HASHCODE)", the computeHash method is not. But the computeHash method is being called from Schema$Field.hashCode and the subclasses hashCode implementation like following: > public int hashCode() { return name.hashCode() + schema.computeHash(); } //this is from Schema$Field class > This is causing the the calculation of hashCode getting called unnecessarily extensively. The proposed changed is to add the "if" check inside the computeHash method instead: > int computeHash() > { > if (hashCode == NO_HASHCODE) > { > hashCode = getType().hashCode() + props.hashCode(); > } > return hashCode; > } > We did a simple test to compare the performance difference, below is a summary of the heap snapshot of comparing the difference: > As a test I wrote a small program that creates a HashMap() and enters a loop simply identifying whether various Schema.Field instances are keys in the map. Obviously this is a pathological test case, but when running with the current implementation of Schema.Field it has (in about 30 seconds) used up nearly 8 GBytes of heap in instantiating intermediate objects associated with calling Schema.computeHash(): > Heap > PSYoungGen total 17432576K, used 8666481K [0x0000000340000000, 0x0000000800000000, 0x0000000800000000) > eden space 14942208K, 58% used [0x0000000340000000,0x0000000550f5c650,0x00000006d0000000) > from space 2490368K, 0% used [0x0000000768000000,0x0000000768000000,0x0000000800000000) > to space 2490368K, 0% used [0x00000006d0000000,0x00000006d0000000,0x0000000768000000) > ParOldGen total 1048576K, used 0K [0x0000000300000000, 0x0000000340000000, 0x0000000340000000) > object space 1048576K, 0% used [0x0000000300000000,0x0000000300000000,0x0000000340000000) > PSPermGen total 21504K, used 5782K [0x00000002fae00000, 0x00000002fc300000, 0x0000000300000000) > object space 21504K, 26% used [0x00000002fae00000,0x00000002fb3a5818,0x00000002fc300000) > When running with the modified implementation (and no other change) all the object allocation vanishes: > Heap > PSYoungGen total 17432576K, used 896532K [0x0000000340000000, 0x0000000800000000, 0x0000000800000000) > eden space 14942208K, 6% used [0x0000000340000000,0x0000000376b852d0,0x00000006d0000000) > from space 2490368K, 0% used [0x0000000768000000,0x0000000768000000,0x0000000800000000) > to space 2490368K, 0% used [0x00000006d0000000,0x00000006d0000000,0x0000000768000000) > ParOldGen total 1048576K, used 0K [0x0000000300000000, 0x0000000340000000, 0x0000000340000000) > object space 1048576K, 0% used [0x0000000300000000,0x0000000300000000,0x0000000340000000) > PSPermGen total 21504K, used 5768K [0x00000002fae00000, 0x00000002fc300000, 0x0000000300000000) > object space 21504K, 26% used [0x00000002fae00000,0x00000002fb3a2240,0x00000002fc300000) > As a side-effect the test runs x3 faster with the modified hashCode() implementation. -- This message was sent by Atlassian JIRA (v6.1.5#6160)