crunch-dev mailing list archives

From Micah Whitacre <mkw...@gmail.com>
Subject Re: TupleWritable is very slow
Date Fri, 22 Nov 2013 03:11:11 GMT
Looks like Josh had an idea on this in 2011.
https://issues.apache.org/jira/browse/CRUNCH-173


On Thu, Nov 21, 2013 at 8:46 PM, Chao Shi <stepinto@live.com> wrote:

> Hi guys,
>
> I just found that TupleWritable is very slow when a huge number of small
> key-value pairs are compared in the pipeline. Here are the stack traces. I
> ran jstack a few times, and most of the time the thread is in this method.
>
> I guess the problem is that we serialize the full class name for *every
> record*, which is costly. I understand this is because we don't know the
> types inside tuples at runtime. Do we have any better approaches?
>
> "main" prio=10 tid=0x00007f372c01b800 nid=0x4342 runnable [0x00007f3730e3c000]
>    java.lang.Thread.State: RUNNABLE
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:169)
>         at org.apache.crunch.types.writable.TupleWritable.readFields(TupleWritable.java:157)
>         at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:122)
>         at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:120)
>         at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>         at org.apache.hadoop.mapred.Child.main(Child.java:264)
>
>
> "main" prio=10 tid=0x00007f372c01b800 nid=0x4342 runnable [0x00007f3730e3c000]
>    java.lang.Thread.State: RUNNABLE
>         at java.io.DataInputStream.readByte(DataInputStream.java:248)
>         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:299)
>         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:320)
>         at org.apache.hadoop.io.Text.readString(Text.java:400)
>         at org.apache.crunch.types.writable.TupleWritable.readFields(TupleWritable.java:157)
>         at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:122)
>         at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
>         at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:144)
>         at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
>         at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
>         at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
>         at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:546)
>         at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117)
>         at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
>         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
>         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>         at org.apache.hadoop.mapred.Child.main(Child.java:264)
>
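
Editor's illustration of the cost described above. This is a minimal sketch and not Crunch code: it does not use TupleWritable or any Hadoop class, and it is not necessarily what CRUNCH-173 proposes. The class TypeCodeSketch, its registry, and its method names are all hypothetical. It contrasts writing the full class name in front of every field with writing a one-byte code that is resolved through a fixed lookup table, which would also avoid a Class.forName call on each read.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

public class TypeCodeSketch {
    // Hypothetical registry: a fixed table of field classes, indexed by a one-byte code.
    private static final Class<?>[] CODE_TO_CLASS = { String.class, Integer.class, Long.class };
    private static final Map<Class<?>, Integer> CLASS_TO_CODE = new HashMap<>();
    static {
        for (int i = 0; i < CODE_TO_CLASS.length; i++) {
            CLASS_TO_CODE.put(CODE_TO_CLASS[i], i);
        }
    }

    // Writes the full class name before every field, as the quoted message describes.
    static byte[] encodeWithClassNames(Object[] fields) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (Object field : fields) {
            out.writeUTF(field.getClass().getName());
            writeValue(out, field);
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Writes a one-byte type code before every field instead of the class name.
    static byte[] encodeWithTypeCodes(Object[] fields) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (Object field : fields) {
            Integer code = CLASS_TO_CODE.get(field.getClass());
            if (code == null) {
                throw new IllegalArgumentException("unregistered type: " + field.getClass());
            }
            out.writeByte(code);
            writeValue(out, field);
        }
        out.flush();
        return bytes.toByteArray();
    }

    // A reader would resolve the code with an array lookup, not Class.forName.
    static Class<?> classForCode(int code) {
        return CODE_TO_CLASS[code];
    }

    private static void writeValue(DataOutputStream out, Object field) throws IOException {
        if (field instanceof String) {
            out.writeUTF((String) field);
        } else if (field instanceof Integer) {
            out.writeInt((Integer) field);
        } else if (field instanceof Long) {
            out.writeLong((Long) field);
        } else {
            throw new IllegalArgumentException("unregistered type: " + field.getClass());
        }
    }

    public static void main(String[] args) throws IOException {
        Object[] tuple = { "k", 42 };
        System.out.println("with class names: " + encodeWithClassNames(tuple).length + " bytes");
        System.out.println("with type codes:  " + encodeWithTypeCodes(tuple).length + " bytes");
    }
}
```

For small records the class names dominate the encoded size, so a comparator that deserializes both sides on every comparison spends most of its time on them. The trade-off in this sketch is that every field type must be registered up front.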
