hbase-issues mailing list archives

From "Dave Beech (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7024) TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass
Date Wed, 24 Oct 2012 09:12:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483096#comment-13483096 ]

Dave Beech commented on HBASE-7024:
-----------------------------------

Thanks Ted, Stack. 

Stack - you are right that keys and values have to be serializable, but they don't have to
be Serializable in the Java interface sense. The Job/JobConf classes in Hadoop accept absolutely
any class. Map tasks use Hadoop's SerializationFactory to work out which serializer class
to use (WritableSerialization is the default, but you can register custom ones, like
AvroSerialization, through the io.serializations job setting).
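
For reference, registering a custom serialization just means adding its class to that
property in the job configuration. A sketch of what that looks like (the AvroSerialization
class name here assumes Avro's avro-mapred artifact is on the classpath):

```xml
<!-- Sketch: io.serializations lists every Serialization implementation the
     SerializationFactory may use; keep WritableSerialization and append others. -->
<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.avro.hadoop.io.AvroSerialization</value>
</property>
```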

The point is that Hadoop doesn't care at all what type your map output key and value classes
are, so long as you have provided a serializer which works with them. If you haven't, the
job dies horribly (no surprise there).
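
To illustrate the point, here is a sketch of a job set up with Avro map output types (a
sketch only - it assumes the Hadoop 2-style Job.getInstance and the AvroJob helper from
avro-mapred; the class and job names are made up, and this is not runnable without the
Hadoop and Avro jars):

```java
import org.apache.avro.Schema;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapred.AvroValue;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class AvroMapOutputSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "avro-map-output-sketch");

        // AvroJob registers AvroSerialization in io.serializations and
        // declares the Avro schemas for the map output key and value.
        AvroJob.setMapOutputKeySchema(job, Schema.create(Schema.Type.STRING));
        AvroJob.setMapOutputValueSchema(job, Schema.create(Schema.Type.LONG));

        // The underlying MapReduce setters accept any Class<?> -- Hadoop itself
        // imposes no Writable/WritableComparable bound here; it only needs the
        // SerializationFactory to find a serializer for the class at runtime.
        job.setMapOutputKeyClass(AvroKey.class);
        job.setMapOutputValueClass(AvroValue.class);
    }
}
```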

I haven't tested with Hadoop 2 yet, no, but I'd be very surprised if this patch broke anything.
If they'd changed this behaviour in Hadoop I'm sure there'd be tons of regression problems
with mapreduce jobs that need custom serializers.  

                
> TableMapReduceUtil.initTableMapperJob unnecessarily limits the types of outputKeyClass and outputValueClass
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7024
>                 URL: https://issues.apache.org/jira/browse/HBASE-7024
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Dave Beech
>            Priority: Minor
>         Attachments: HBASE-7024.patch
>
>
> The various initTableMapperJob methods in TableMapReduceUtil take outputKeyClass and outputValueClass parameters which need to extend WritableComparable and Writable respectively.
> Because of this, it is not convenient to use an alternative serialization like Avro. (I wanted to set these parameters to AvroKey and AvroValue).
> The methods in the MapReduce API to set map output key and value types do not impose this restriction, so is there a reason to do it here?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
