crunch-dev mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: TupleWritable is very slow
Date Fri, 22 Nov 2013 03:18:49 GMT
Yeah, let me see if I can update it for the modern era. :)


On Thu, Nov 21, 2013 at 7:11 PM, Micah Whitacre <mkwhit@gmail.com> wrote:

> Looks like Josh had an idea on this in 2011.
> https://issues.apache.org/jira/browse/CRUNCH-173
>
>
> On Thu, Nov 21, 2013 at 8:46 PM, Chao Shi <stepinto@live.com> wrote:
>
> > Hi guys,
> >
> > I just found that TupleWritable is very slow when a huge number of small
> > key-value pairs are compared in the pipeline. Here are the stack traces.
> > I've jstack-ed a few times, and most of the time the thread is in this
> > method.
> >
> > I guess the problem is that we serialize the full class name for *every
> > record*, which is costly. I understand the underlying issue is that we
> > don't know the types inside tuples at runtime. Do we have any better
> > approaches?
> >
> > "main" prio=10 tid=0x00007f372c01b800 nid=0x4342 runnable [0x00007f3730e3c000]
> >    java.lang.Thread.State: RUNNABLE
> >         at java.lang.Class.forName0(Native Method)
> >         at java.lang.Class.forName(Class.java:169)
> >         at org.apache.crunch.types.writable.TupleWritable.readFields(TupleWritable.java:157)
> >         at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:122)
> >         at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:120)
> >         at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
> >         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
> >         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
> >         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
> >         at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:396)
> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> >         at org.apache.hadoop.mapred.Child.main(Child.java:264)
> >
> >
> > "main" prio=10 tid=0x00007f372c01b800 nid=0x4342 runnable [0x00007f3730e3c000]
> >    java.lang.Thread.State: RUNNABLE
> >         at java.io.DataInputStream.readByte(DataInputStream.java:248)
> >         at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:299)
> >         at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:320)
> >         at org.apache.hadoop.io.Text.readString(Text.java:400)
> >         at org.apache.crunch.types.writable.TupleWritable.readFields(TupleWritable.java:157)
> >         at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:122)
> >         at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
> >         at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:144)
> >         at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
> >         at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
> >         at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
> >         at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:546)
> >         at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117)
> >         at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
> >         at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
> >         at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
> >         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
> >         at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at javax.security.auth.Subject.doAs(Subject.java:396)
> >         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> >         at org.apache.hadoop.mapred.Child.main(Child.java:264)
> >
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
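The cost Chao describes can be illustrated outside Hadoop with a small, self-contained sketch. It is not Crunch's TupleWritable code and not the approach tracked in CRUNCH-173. It only contrasts two encodings: one writes each field's fully qualified class name and resolves it with Class.forName on every read, matching the pattern in the stack traces above. The other writes a one-byte index into a class registry that the writer and reader are assumed to share. The class TypeTagDemo, its methods, and the fixed three-type registry are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT Crunch's TupleWritable.
// It contrasts two ways of tagging each serialized tuple field with its type.
public class TypeTagDemo {

    // Strategy 1: write the fully qualified class name before every field.
    static byte[] writeWithClassNames(List<Object> fields) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(fields.size());
        for (Object f : fields) {
            out.writeUTF(f.getClass().getName()); // e.g. "java.lang.Integer", repeated per record
            writeValue(out, f);
        }
        out.flush();
        return bytes.toByteArray();
    }

    static List<Object> readWithClassNames(byte[] data) throws IOException, ClassNotFoundException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int n = in.readInt();
        List<Object> fields = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            // A string decode plus a Class.forName lookup for every field of every record.
            Class<?> cls = Class.forName(in.readUTF());
            fields.add(readValue(in, cls));
        }
        return fields;
    }

    // Strategy 2 (assumption): writer and reader agree on this registry up front,
    // so each field carries only a one-byte index into it.
    static final List<Class<?>> REGISTRY = List.of(Integer.class, Long.class, String.class);

    static byte[] writeWithTypeCodes(List<Object> fields) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(fields.size());
        for (Object f : fields) {
            int code = REGISTRY.indexOf(f.getClass());
            if (code < 0) {
                throw new IllegalArgumentException("unregistered type: " + f.getClass());
            }
            out.writeByte(code);
            writeValue(out, f);
        }
        out.flush();
        return bytes.toByteArray();
    }

    static List<Object> readWithTypeCodes(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int n = in.readInt();
        List<Object> fields = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            // A list lookup replaces the string decode and the Class.forName call.
            Class<?> cls = REGISTRY.get(in.readUnsignedByte());
            fields.add(readValue(in, cls));
        }
        return fields;
    }

    // Value encoding shared by both strategies; only three types are supported.
    static void writeValue(DataOutputStream out, Object f) throws IOException {
        if (f instanceof Integer) {
            out.writeInt((Integer) f);
        } else if (f instanceof Long) {
            out.writeLong((Long) f);
        } else if (f instanceof String) {
            out.writeUTF((String) f);
        } else {
            throw new IllegalArgumentException("unsupported type: " + f.getClass());
        }
    }

    static Object readValue(DataInputStream in, Class<?> cls) throws IOException {
        if (cls == Integer.class) return in.readInt();
        if (cls == Long.class) return in.readLong();
        if (cls == String.class) return in.readUTF();
        throw new IllegalArgumentException("unsupported type: " + cls);
    }

    public static void main(String[] args) throws Exception {
        List<Object> tuple = List.of(42, 7L, "key");
        byte[] named = writeWithClassNames(tuple);
        byte[] coded = writeWithTypeCodes(tuple);
        System.out.println("round trip (class names): " + readWithClassNames(named).equals(tuple));
        System.out.println("round trip (type codes):  " + readWithTypeCodes(coded).equals(tuple));
        System.out.println(named.length + " bytes vs " + coded.length + " bytes");
    }
}
```

For the three-field tuple in main, the class-name encoding takes 74 bytes and the type-code encoding 24, and the type-code reader skips both the per-field string decode and the class lookup. The trade-off is that the registry must be fixed and identical on both sides, which is exactly the runtime type information Chao points out is missing.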
