crunch-dev mailing list archives

From Chao Shi <stepi...@live.com>
Subject TupleWritable is very slow
Date Fri, 22 Nov 2013 02:46:36 GMT
Hi guys,

I just found that TupleWritable is very slow when a huge number of small keys
and values are compared in the pipeline. Here are the stack traces. I've run
jstack a few times, and most of the samples are in TupleWritable.readFields.

I guess the problem is that we serialize the full class name for *every record*,
which is costly. I understand the underlying issue is that we don't know the
types inside tuples at runtime. Do we have any better approaches?
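One possible direction, sketched under assumptions: if the set of classes that can appear inside a tuple were known when the job is planned, the writer and the reader could share a registry and tag each field with a small integer instead of its class name. The `TupleTypeCodes` class below is hypothetical (it is not part of Crunch), uses `DataOutput.writeUTF` as a stand-in for Hadoop's `Text.writeString`, and assumes both sides register classes in the same order.

```java
import java.io.*;
import java.util.*;

/**
 * Hypothetical sketch, not Crunch's TupleWritable: contrasts tagging each field
 * with its full class name against tagging it with a one-byte code from a
 * registry that writer and reader are ASSUMED to build identically.
 */
final class TupleTypeCodes {
  private final List<Class<?>> codeToClass = new ArrayList<>();
  private final Map<Class<?>, Integer> classToCode = new HashMap<>();

  /** Assigns the next free code to a class; a one-byte tag allows at most 256 classes. */
  int register(Class<?> clazz) {
    Integer existing = classToCode.get(clazz);
    if (existing != null) {
      return existing;
    }
    if (codeToClass.size() == 256) {
      throw new IllegalStateException("one-byte codes exhausted");
    }
    int code = codeToClass.size();
    codeToClass.add(clazz);
    classToCode.put(clazz, code);
    return code;
  }

  /** Naive tag: the class name as a length-prefixed string, written for every field of every record. */
  static void writeByName(DataOutput out, Class<?> clazz) throws IOException {
    out.writeUTF(clazz.getName());
  }

  /** Naive read: parse the string, then resolve it with Class.forName on every call. */
  static Class<?> readByName(DataInput in) throws IOException, ClassNotFoundException {
    return Class.forName(in.readUTF());
  }

  /** Compact tag: a single byte taken from the shared registry. */
  void writeByCode(DataOutput out, Class<?> clazz) throws IOException {
    Integer code = classToCode.get(clazz);
    if (code == null) {
      throw new IOException("unregistered class " + clazz.getName());
    }
    out.writeByte(code);
  }

  /** Compact read: an array lookup, with no string parsing and no Class.forName. */
  Class<?> readByCode(DataInput in) throws IOException {
    int code = in.readUnsignedByte();
    if (code >= codeToClass.size()) {
      throw new IOException("unknown type code " + code);
    }
    return codeToClass.get(code);
  }
}
```

For `java.lang.String`, `writeUTF` spends 18 bytes on the tag (a 2-byte length plus 16 characters) versus 1 byte for the code, and reading it back is a list lookup rather than a `Class.forName` call. The catch is that the registry has to be identical on both sides of the shuffle, which is exactly the runtime type knowledge that is missing today.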

"main" prio=10 tid=0x00007f372c01b800 nid=0x4342 runnable [0x00007f3730e3c000]
   java.lang.Thread.State: RUNNABLE
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:169)
	at org.apache.crunch.types.writable.TupleWritable.readFields(TupleWritable.java:157)
	at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:122)
	at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:120)
	at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)


"main" prio=10 tid=0x00007f372c01b800 nid=0x4342 runnable [0x00007f3730e3c000]
   java.lang.Thread.State: RUNNABLE
	at java.io.DataInputStream.readByte(DataInputStream.java:248)
	at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:299)
	at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:320)
	at org.apache.hadoop.io.Text.readString(Text.java:400)
	at org.apache.crunch.types.writable.TupleWritable.readFields(TupleWritable.java:157)
	at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:122)
	at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
	at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:144)
	at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
	at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
	at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
	at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:546)
	at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117)
	at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:572)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
	at org.apache.hadoop.mapred.Child.main(Child.java:264)
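Both traces pass through `WritableComparator.compare`, whose default byte-level implementation calls `readFields` on both serialized keys before comparing them. That means the class-name cost is paid twice per comparison during the sort and merge, not just once per record. The simulation below is hypothetical plain Java (no Hadoop classes; `DeserializingCompareDemo` and its methods are invented for illustration): it sorts serialized records with a comparator that fully deserializes both sides on every call, including a `Class.forName` lookup, and counts the decodes.

```java
import java.io.*;
import java.util.*;

/**
 * Hypothetical simulation, not Hadoop code: a comparator over serialized
 * records that, like WritableComparator's default byte-level compare,
 * deserializes both sides on every call.
 */
final class DeserializingCompareDemo {
  /** Number of full deserializations performed so far. */
  static int decodes = 0;

  /** Serializes one record as (class name, int value); the name mimics a per-field type tag. */
  static byte[] encode(int value) {
    try {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(buf);
      out.writeUTF(Integer.class.getName());
      out.writeInt(value);
      out.flush();
      return buf.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Full deserialization: read the tag, resolve it with Class.forName, then read the value. */
  static int decode(byte[] record) {
    decodes++;
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
      Class<?> type = Class.forName(in.readUTF());
      if (type != Integer.class) {
        throw new IllegalStateException("unexpected type " + type);
      }
      return in.readInt();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Decodes both records on every comparison, as the default byte-level compare does. */
  static final Comparator<byte[]> DESERIALIZING =
      (a, b) -> Integer.compare(decode(a), decode(b));

  static List<byte[]> sortedCopy(List<byte[]> records) {
    List<byte[]> copy = new ArrayList<>(records);
    copy.sort(DESERIALIZING);
    return copy;
  }
}
```

With a comparison sort, the number of decodes grows roughly as n log n rather than n, which fits jstack repeatedly landing in `readFields` under `Merger$MergeQueue.lessThan`.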
