hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-334) Sorting on fields of type double does not work
Date Fri, 25 Jul 2008 22:41:31 GMT

     [ https://issues.apache.org/jira/browse/PIG-334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-334:

    Attachment: doublesort.patch

This patch takes the DoubleWritable file from HADOOP-3061 and adds it to Pig so that we can
use a double as a key.
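
For readers unfamiliar with what such a key type involves, here is a minimal sketch. It is not
the file from HADOOP-3061 or the attached patch; the class name DoubleKeySketch and the
roundTrip helper are hypothetical. To stay self-contained it does not depend on Hadoop, so it
only mirrors the method shapes of Hadoop's Writable (write, readFields) and implements plain
Comparable; a real key type would implement org.apache.hadoop.io.WritableComparable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a double-valued key; NOT the HADOOP-3061 code.
public class DoubleKeySketch implements Comparable<DoubleKeySketch> {
    private double value;

    // A no-argument constructor lets a framework create an empty instance
    // and then fill it in through readFields.
    public DoubleKeySketch() {}

    public DoubleKeySketch(double value) { this.value = value; }

    public double get() { return value; }

    // Same method shape as Hadoop's Writable.write(DataOutput).
    public void write(DataOutput out) throws IOException {
        out.writeDouble(value);
    }

    // Same method shape as Hadoop's Writable.readFields(DataInput).
    public void readFields(DataInput in) throws IOException {
        value = in.readDouble();
    }

    // Keys sort by their numeric value.
    @Override
    public int compareTo(DoubleKeySketch other) {
        return Double.compare(value, other.value);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof DoubleKeySketch
                && Double.compare(value, ((DoubleKeySketch) o).value) == 0;
    }

    @Override
    public int hashCode() { return Double.hashCode(value); }

    // Serializes a key to bytes and reads it back into a fresh instance.
    public static DoubleKeySketch roundTrip(DoubleKeySketch key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            key.write(new DataOutputStream(bytes));
            DoubleKeySketch copy = new DoubleKeySketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The sketch uses Double.compare rather than the < and > operators so that the ordering is
total, including for NaN and signed zero.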

I also, at Pi's request, moved the hadoop->pig data type translation functions from data.DataType
to backend.hadoop.HDataType.

This does not, however, fully resolve the sorting issue.  Sorting on a field of any declared
type fails with:

java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.BytesWritable,
recieved org.apache.pig.backend.hadoop.DoubleWritable
        at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:419)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:83)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:122)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:75)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:157)

From the explain plan, it looks like the project schema for the local rearrange is set to
bytearray instead of the correct type.
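
The following is a simplified model of how a stale schema type could produce the exception
above. It is a sketch only: every name in it (KeyTypeMismatchSketch, Collector, keyClassFor,
BytesKey, DoubleKey) is hypothetical and none of it is Pig or Hadoop code. It assumes that the
job's key class is chosen from the type recorded in the plan, and that the collector rejects
any key whose runtime class differs from that declared class.

```java
import java.io.IOException;

// Hypothetical, simplified model of the failure; not Pig or Hadoop APIs.
public class KeyTypeMismatchSketch {
    // Stand-ins for the two key classes named in the exception.
    static class BytesKey {}
    static class DoubleKey {}

    // Stand-in for the map output collector: it remembers the key class the
    // job declared and rejects any key of a different runtime class.
    static class Collector {
        private final Class<?> declaredKeyClass;

        Collector(Class<?> declaredKeyClass) {
            this.declaredKeyClass = declaredKeyClass;
        }

        void collect(Object key) throws IOException {
            if (key.getClass() != declaredKeyClass) {
                throw new IOException("Type mismatch in key from map: expected "
                        + declaredKeyClass.getName()
                        + ", received " + key.getClass().getName());
            }
        }
    }

    // Picks the declared key class from the type recorded in the plan's schema.
    public static Class<?> keyClassFor(String schemaType) {
        return "double".equals(schemaType) ? DoubleKey.class : BytesKey.class;
    }

    // Returns true when a job configured from schemaType accepts a DoubleKey
    // emitted by the map.
    public static boolean acceptsDoubleKey(String schemaType) {
        try {
            new Collector(keyClassFor(schemaType)).collect(new DoubleKey());
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

In this model, acceptsDoubleKey("bytearray") returns false and acceptsDoubleKey("double")
returns true, so the map side can emit the right key type and still fail if the schema used
to configure the job says bytearray.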

> Sorting on fields of type double does not work
> ----------------------------------------------
>                 Key: PIG-334
>                 URL: https://issues.apache.org/jira/browse/PIG-334
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: types_branch
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>            Priority: Critical
>             Fix For: types_branch
>         Attachments: doublesort.patch
> In the new pipeline, when possible, pig uses hadoop writable comparable types for the
> hadoop key rather than tuple.  As of hadoop 0.17 there is no DoubleWritable type.  It has
> been added for hadoop 0.18.  But it appears that we will be ready to integrate the types branch
> back into trunk before hadoop 0.18 is released.  So we need to implement a DoubleWritable
> for ourselves until that time.
> The code can be taken from HADOOP-3061.  The code where we convert to and from hadoop
> types (DataType.getWritableComparableTypes and convertToPigType) needs to be changed to use
> this type.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
