hadoop-common-user mailing list archives

From "Qin Gao" <q...@cs.cmu.edu>
Subject Re: data partitioning question
Date Mon, 04 Aug 2008 19:49:38 GMT
For the first question, I think it is better to do this at the reduce stage,
because input splits are computed only from block sizes in bytes, not from
the contents of the records. Instead, have the mapper emit intermediate
key/value pairs like this:

key:   1 if C = 1, 3, 5, or 7; 0 otherwise
value: the tuple

A single reducer then handles every tuple with C = 1, 3, 5, or 7, because
they all share key 1.
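
As a rough illustration, the keying scheme above can be simulated in plain
Python. This is only a sketch of the map/shuffle/reduce flow, not Hadoop API
code: the names map_tuple, shuffle, reduce_group, and SELECTED_C are
hypothetical, and the reducer here simply collects its tuples (sorted by C)
since the original question does not say what should be done with them.

```python
from collections import defaultdict

# Tuples (A, B, C) taken from the quoted question.
TUPLES = [
    (1, 3, 1), (1, 2, 2), (1, 2, 3), (12, 3, 4),
    (12, 2, 5), (12, 8, 6), (12, 2, 7),
]

# C values that should all end up at the same reducer.
SELECTED_C = {1, 3, 5, 7}


def map_tuple(record):
    """Map step: emit key 1 if C is in SELECTED_C, else key 0."""
    key = 1 if record[2] in SELECTED_C else 0
    return key, record


def shuffle(pairs):
    """Stand-in for the framework's shuffle: group values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_group(key, values):
    """Reduce step (placeholder logic): collect the tuples, ordered by C."""
    return key, sorted(values, key=lambda t: t[2])


grouped = shuffle(map_tuple(t) for t in TUPLES)
results = dict(reduce_group(k, v) for k, v in grouped.items())

for key in sorted(results):
    print(key, results[key])
```

With only two distinct keys, at most two reducers receive data, so this
trades parallelism for the guarantee that the selected tuples are processed
together.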

On Mon, Aug 4, 2008 at 3:29 PM, Shirley Cohen <scohen@cs.utexas.edu> wrote:

> Hi,
>
> I want to implement some data partitioning logic where a mapper is assigned
> a specific range of values. Here is a concrete example of what I have in
> mind:
>
> Suppose I have attributes A, B, C and the following tuples:
>
> (A, B, C)
> (1, 3, 1)
> (1, 2, 2)
> (1, 2, 3)
> (12, 3, 4)
> (12, 2, 5)
> (12, 8, 6)
> (12, 2, 7)
>
> What I want to do is assign mapper x all the tuples where the C attribute =
> 1, 3, 5, and 7.
>
> 1. Is it possible to write a smart InputFormat class that can assign a set
>    of records to a specific mapper? If so, how?
> 2. How will this type of partitioning logic interact with HDFS data
>    locality?
>
>
> Thanks,
>
> Shirley
>
>
