hadoop-common-user mailing list archives

From Ajay Srivastava <Ajay.Srivast...@guavus.com>
Subject Re: How to split a sequence file
Date Wed, 12 Sep 2012 05:35:51 GMT
Hi Jason,
I am curious about the use case for sending records to mappers based on their key. If possible,
could you please share your scenario?
Is it a map-only job? Why not distribute the records with a partitioner and do the processing in
the reducers?
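To make the partitioner route concrete, here is a minimal sketch. It assumes the group is everything before the last underscore in the key (id_A_01 → id_A); that rule, and the class and method names, are hypothetical. It is plain Java so it runs without Hadoop. In a real job the same logic would go in the getPartition(Text key, BytesWritable value, int numPartitions) method of a subclass of org.apache.hadoop.mapreduce.Partitioner<Text, BytesWritable>, registered with Job.setPartitionerClass, with key.toString() supplying the String.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class KeyPrefixPartitionDemo {

    // Hypothetical grouping rule: the prefix is everything before the last
    // underscore, so "id_A_01" -> "id_A". Keys without '_' are their own group.
    static String prefixOf(String key) {
        int cut = key.lastIndexOf('_');
        return cut < 0 ? key : key.substring(0, cut);
    }

    // Same formula HashPartitioner uses, but hashing only the prefix, so every
    // key sharing a prefix lands in the same partition (reducer).
    static int getPartition(String key, int numPartitions) {
        return (prefixOf(key).hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        String[] keys = {"id_A_01", "id_A_02", "id_A_03", "id_B_01", "id_B_02", "id_B_03"};
        int numReducers = 4;

        // Collect the keys each reducer would receive.
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String k : keys) {
            byPartition.computeIfAbsent(getPartition(k, numReducers), p -> new ArrayList<>()).add(k);
        }
        byPartition.forEach((p, ks) -> System.out.println("reducer " + p + ": " + ks));
    }
}
```

This only guarantees that one prefix never spans two reducers; two prefixes may still hash to the same reducer, and reduce() is still called once per distinct full key unless a grouping comparator (Job.setGroupingComparatorClass) is also set.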

Ajay Srivastava 

On 12-Sep-2012, at 8:45 AM, Jason Yang wrote:

> Hi, 
> I have a sequence file written by SequenceFileOutputFormat with key/value types <Text, BytesWritable>, like below:
> Text      BytesWritable
> -------------------------------------------------------------
> id_A_01   7F2B3C687F2B3C687F2B3C68
> id_A_02   2F2B3C687F2B3C687F2B3C686AB23C68D73C68D7
> id_A_03   5F2B3C68D77F2B3C687F2B3A
> ...
> id_B_01   1AB23C68D73C68D76AB23C68D73C68D7
> id_B_02   5AB23C68D73C68D76AB68D76A1
> id_B_03   F2B23C68D7B23C68D7B23C68D7
> If I want all the records with the same key prefix to be processed by the same mapper, say records with key id_A_XX processed by one mapper and records with key id_B_XX processed by another, what should I do?
> Should I implement my own InputFormat extending SequenceFileInputFormat?
> Any help would be appreciated.
> -- 
> YANG, Lin
