hadoop-common-user mailing list archives

From Shirley Cohen <sco...@cs.utexas.edu>
Subject Re: data partitioning question
Date Tue, 05 Aug 2008 02:41:19 GMT
Thanks, Qin. It sounds like you're saying that this type of
partitioning needs its own map-reduce job.

I was hoping it could be done in the InputFormat class :))

Shirley

On Aug 4, 2008, at 2:49 PM, Qin Gao wrote:

> For the first question, I think it is better to do it at the reduce stage,
> because the partitioner only considers the size of blocks in bytes.
> Instead, you can output the intermediate key/value pair like this:
>
> key: 1 if C=1,3,5,7; 0 otherwise
> value: the tuple
>
> In the reduce stage, you can then have one reducer handle all the
> tuples with C=1,3,5,7.
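
The keying scheme suggested above can be sketched in plain Java, without any Hadoop dependency. This is only an illustration under stated assumptions: the class name `PartitionByCSketch`, the helpers `mapKey`, `groupByKey`, and `exampleTuples`, and the in-memory grouping that stands in for the shuffle are all hypothetical and are not part of the Hadoop API or of the original message. In a real job, the same `mapKey` logic would presumably live inside the mapper, and the framework would deliver all tuples sharing a key to one reduce call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class PartitionByCSketch {
    // Selection set from the example in the thread: C = 1, 3, 5, 7.
    static final Set<Integer> SELECTED_C = new HashSet<>(Arrays.asList(1, 3, 5, 7));

    // The (A, B, C) tuples from the original question.
    static List<int[]> exampleTuples() {
        return Arrays.asList(
            new int[] {1, 3, 1},
            new int[] {1, 2, 2},
            new int[] {1, 2, 3},
            new int[] {12, 3, 4},
            new int[] {12, 2, 5},
            new int[] {12, 8, 6},
            new int[] {12, 2, 7});
    }

    // "Map" step: emit key 1 when the tuple's C value is selected, 0 otherwise.
    static int mapKey(int[] tuple) {
        return SELECTED_C.contains(tuple[2]) ? 1 : 0;
    }

    // Stand-in for the shuffle: collect the tuples under the key they were emitted with.
    static Map<Integer, List<int[]>> groupByKey(List<int[]> tuples) {
        Map<Integer, List<int[]>> groups = new TreeMap<>();
        for (int[] tuple : tuples) {
            groups.computeIfAbsent(mapKey(tuple), k -> new ArrayList<>()).add(tuple);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Each entry corresponds to what a single reduce call would receive.
        for (Map.Entry<Integer, List<int[]>> entry : groupByKey(exampleTuples()).entrySet()) {
            StringBuilder line = new StringBuilder("key " + entry.getKey() + ":");
            for (int[] tuple : entry.getValue()) {
                line.append(' ').append(Arrays.toString(tuple));
            }
            System.out.println(line);
        }
    }
}
```

With the seven example tuples, the four whose C value is 1, 3, 5, or 7 end up under key 1 and the remaining three under key 0, so the selection happens after the map phase rather than when input records are assigned to mappers.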
>
> On Mon, Aug 4, 2008 at 3:29 PM, Shirley Cohen <scohen@cs.utexas.edu> wrote:
>
>> Hi,
>>
>> I want to implement some data partitioning logic where a mapper is
>> assigned a specific range of values. Here is a concrete example of
>> what I have in mind:
>>
>> Suppose I have attributes A, B, C and the following tuples:
>>
>> (A, B, C)
>> (1, 3, 1)
>> (1, 2, 2)
>> (1, 2, 3)
>> (12, 3, 4)
>> (12, 2, 5)
>> (12, 8, 6)
>> (12,  2, 7)
>>
>> What I want to do is assign mapper x all the tuples where the C
>> attribute is 1, 3, 5, or 7.
>>
>> 1. Is it possible to write a smart InputFormat class that can assign
>>    a set of records to a specific mapper? If so, how?
>> 2. How will this type of partitioning logic interact with HDFS data
>>    locality?
>>
>>
>> Thanks,
>>
>> Shirley
>>
>>

