Mailing-List: contact dev-help@crunch.apache.org; run by ezmlm
Reply-To: dev@crunch.apache.org
MIME-Version: 1.0
From: Chao Shi
Date: Fri, 22 Nov 2013 10:46:36 +0800
Subject: TupleWritable is very slow
To: crunch-dev@apache.org
Content-Type: multipart/alternative; boundary="047d7b5d8d45093ec404ebbb072d"

--047d7b5d8d45093ec404ebbb072d
Content-Type: text/plain; charset="ISO-8859-1"

Hi guys,

I just found that TupleWritable is very slow when a huge number of small key-value pairs are compared in the pipeline. Here are the stack traces. I ran jstack a few times, and most of the samples were in this method.

I guess the problem is that we serialize the full class name with *every record*, which is costly. I understand the underlying difficulty is that we don't know the types inside a tuple at runtime. Do we have any better approaches?

"main" prio=10 tid=0x00007f372c01b800 nid=0x4342 runnable [0x00007f3730e3c000] java.lang.Thread.State: RUNNABLE at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:169) at org.apache.crunch.types.writable.TupleWritable.readFields(TupleWritable.java:157) at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:122) at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:120) at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175) at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127) at org.apache.hadoop.mapred.Child.main(Child.java:264) "main" prio=10 tid=0x00007f372c01b800 nid=0x4342 runnable [0x00007f3730e3c000] java.lang.Thread.State: RUNNABLE at java.io.DataInputStream.readByte(DataInputStream.java:248) at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:299) at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:320) at org.apache.hadoop.io.Text.readString(Text.java:400) at org.apache.crunch.types.writable.TupleWritable.readFields(TupleWritable.java:157) at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:122) at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373) at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:144) at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103) at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335) at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350) at 
org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:546) at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117) at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175) at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127) at org.apache.hadoop.mapred.Child.main(Child.java:264) --047d7b5d8d45093ec404ebbb072d--
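
To make the cost concrete, here is a rough sketch of one possible direction. It is not what TupleWritable does today and not a proposed patch. It compares a per-record header that carries full class names (resolved with Class.forName on every read, as in the first trace) with a header that carries one-byte type codes looked up in a fixed table. Everything in it is hypothetical: the class TypeCodeSketch, the KNOWN_TYPES table, and the assumption that the set of field types can be registered up front. DataOutputStream.writeUTF stands in for Hadoop's Text.writeString so the example runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

/** Hypothetical sketch only; this is not Crunch's TupleWritable. */
class TypeCodeSketch {
    // Assumption: the field types are known in advance.
    // A type's index in this array is its one-byte type code.
    static final Class<?>[] KNOWN_TYPES = { Integer.class, Long.class, String.class };

    static int codeOf(Class<?> type) {
        for (int i = 0; i < KNOWN_TYPES.length; i++) {
            if (KNOWN_TYPES[i] == type) {
                return i;
            }
        }
        throw new IllegalArgumentException("unregistered type: " + type.getName());
    }

    /** Header in the style the email describes: one full class name per field. */
    static byte[] headerWithClassNames(Object... fields) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(fields.length);
        for (Object field : fields) {
            out.writeUTF(field.getClass().getName());
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Alternative header: one byte per field, indexing into KNOWN_TYPES. */
    static byte[] headerWithTypeCodes(Object... fields) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeByte(fields.length);
        for (Object field : fields) {
            out.writeByte(codeOf(field.getClass()));
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Decodes a class-name header; pays a string read and Class.forName per field. */
    static Class<?>[] readClassNames(byte[] header) throws IOException, ClassNotFoundException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(header));
        Class<?>[] types = new Class<?>[in.readUnsignedByte()];
        for (int i = 0; i < types.length; i++) {
            types[i] = Class.forName(in.readUTF());
        }
        return types;
    }

    /** Decodes a type-code header; a single array lookup per field. */
    static Class<?>[] readTypeCodes(byte[] header) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(header));
        Class<?>[] types = new Class<?>[in.readUnsignedByte()];
        for (int i = 0; i < types.length; i++) {
            types[i] = KNOWN_TYPES[in.readUnsignedByte()];
        }
        return types;
    }

    public static void main(String[] args) throws Exception {
        byte[] names = headerWithClassNames(42, "key");
        byte[] codes = headerWithTypeCodes(42, "key");
        System.out.println("class-name header bytes: " + names.length);
        System.out.println("type-code header bytes:  " + codes.length);
        System.out.println("same types decoded: "
                + Arrays.equals(readClassNames(names), readTypeCodes(codes)));
    }
}
```

The trade-off is that a code table only works for types registered ahead of time. Arbitrary user-defined Writables would still need a fallback such as the class name, or a table shipped with the job configuration. Independently of the encoding, a comparator that works on the serialized bytes would avoid calling readFields during the merge at all.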