crunch-dev mailing list archives

From "Chao Shi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-329) Re-add type info to TupleWritable to make fields sort correctly
Date Mon, 17 Feb 2014 06:24:19 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902993#comment-13902993 ]

Chao Shi commented on CRUNCH-329:
---------------------------------

An example stack trace:

{code}
"SpillThread" daemon prio=10 tid=0x00007f1db4e62800 nid=0x1f97 runnable [0x00007f1dab5e1000]
   java.lang.Thread.State: RUNNABLE
	at java.io.DataInputStream.readInt(DataInputStream.java:372)
	at org.apache.hadoop.io.BytesWritable.readFields(BytesWritable.java:153)
	at org.apache.crunch.types.writable.TupleWritable.readFields(TupleWritable.java:171)
	at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:125)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:968)
	at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:95)
	at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:122)
	at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1254)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:712)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1199)
{code}

> Re-add type info to TupleWritable to make fields sort correctly
> ---------------------------------------------------------------
>
>                 Key: CRUNCH-329
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-329
>             Project: Crunch
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.10.0, 0.8.3
>            Reporter: Josh Wills
>            Assignee: Josh Wills
>             Fix For: 0.10.0, 0.8.3
>
>         Attachments: CRUNCH-329.patch, CRUNCH-329b.patch, fix-ss-writables.patch
>
>
> Secondary sorts aren't currently working correctly for Writable types after we hacked
> the TupleWritable impl to make all of the fields BytesWritables (e.g., secondary IntWritable
> values will no longer be sorted correctly, even though everything is still grouped correctly).
> The least-bad way that I came up with to fix this is to use integer codes for each possible
> WritableComparable type in a pipeline that we can use to decode what Writable type each tuple
> field corresponds to. This allows us to keep the various fields sortable while still doing
> a reasonable job of minimizing the serialization required to pass the type information along.
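A minimal sketch, in plain Java with no Hadoop dependency, of the problem and the proposed fix described above. It assumes a field stored as raw bytes is ordered by unsigned lexicographic byte comparison (as Hadoop's `BytesWritable` comparison does), and that an `IntWritable` serializes as 4 big-endian bytes. Under those assumptions, one way the misordering shows up is with negative ints: their sign bit makes them sort after all non-negative values. Grouping still works because equal values have identical bytes. `Field`, `INT_CODE`, and all helper methods are hypothetical illustrations, not the code from the CRUNCH-329 patches.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TypeCodedFieldSketch {

  // Hypothetical type code for IntWritable-style fields; the real patch's
  // numbering scheme is not shown in this thread.
  static final int INT_CODE = 1;

  // A tuple field as it might travel: a small type code plus the serialized bytes.
  static final class Field {
    final int typeCode;
    final byte[] bytes;

    Field(int typeCode, byte[] bytes) {
      this.typeCode = typeCode;
      this.bytes = bytes;
    }
  }

  // Serialize an int as 4 big-endian bytes, the layout DataOutput.writeInt produces.
  static Field intField(int v) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      new DataOutputStream(bos).writeInt(v);
      return new Field(INT_CODE, bos.toByteArray());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static int readInt(byte[] bytes) {
    try {
      return new DataInputStream(new ByteArrayInputStream(bytes)).readInt();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Type-blind comparison: unsigned lexicographic order over the raw bytes.
  static int compareRaw(Field a, Field b) {
    int n = Math.min(a.bytes.length, b.bytes.length);
    for (int i = 0; i < n; i++) {
      int diff = (a.bytes[i] & 0xff) - (b.bytes[i] & 0xff);
      if (diff != 0) return diff;
    }
    return a.bytes.length - b.bytes.length;
  }

  // Type-aware comparison: use the code to decode each field, then compare values.
  static int compareTyped(Field a, Field b) {
    if (a.typeCode == INT_CODE && b.typeCode == INT_CODE) {
      return Integer.compare(readInt(a.bytes), readInt(b.bytes));
    }
    return compareRaw(a, b); // fallback for codes this sketch does not handle
  }

  static List<Integer> sortWith(int[] values, Comparator<Field> cmp) {
    List<Field> fields = new ArrayList<>();
    for (int v : values) fields.add(intField(v));
    fields.sort(cmp);
    List<Integer> out = new ArrayList<>();
    for (Field f : fields) out.add(readInt(f.bytes));
    return out;
  }

  static List<Integer> sortRaw(int[] values) {
    return sortWith(values, TypeCodedFieldSketch::compareRaw);
  }

  static List<Integer> sortTyped(int[] values) {
    return sortWith(values, TypeCodedFieldSketch::compareTyped);
  }

  public static void main(String[] args) {
    int[] values = {3, -1, 10, -20};
    System.out.println("raw bytes:  " + sortRaw(values));   // [3, 10, -20, -1]
    System.out.println("type-coded: " + sortTyped(values)); // [-20, -1, 3, 10]
  }
}
```

The type code costs only a few bits per field, which matches the stated goal of passing type information along without re-serializing full class names.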



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
