hadoop-common-user mailing list archives

From Rahul Bhattacharjee <rahul.rec....@gmail.com>
Subject Re: Hadoop sampler related query!
Date Tue, 16 Apr 2013 15:45:22 GMT
Mighty users@hadoop,

Anyone on this?


On Tue, Apr 16, 2013 at 2:19 PM, Rahul Bhattacharjee <
rahul.rec.dgp@gmail.com> wrote:

> Hi,
>
> I have a question about Hadoop's input sampler, which is used to
> investigate a data set beforehand (random selection, sampling, etc.).
> It is mainly used for total sort, and in Pig's skewed join implementation
> as well.
>
> The question is:
>
> Mapper<K,V,OK,OV>
>
> K and V are the input key and value of the mapper, essentially coming in
> from the input format. OK and OV are the output key and value emitted by
> the mapper.
>
> Looking at the input sampler's code, it looks like it creates the
> partitions based on the input key of the mapper.
>
> I think the partitions should be created considering the output key (OK)
> and the output key sort comparator should be used for sorting the samples.
>
> If partitioning is done based on the input key and the mapper emits a
> different key, then the total sort would no longer hold.
>
> Is there a condition that the input sampler may only be used with
> Mapper<K,V,K,V1>?
>
>
> Thanks,
> Rahul
>
>
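
To make the concern in the quoted question concrete, here is a minimal
plain-Java sketch. It uses no Hadoop classes: the class name, the method
names, and the evenly spaced split-point selection are all hypothetical
stand-ins, not the input sampler's actual code. It derives split points from
sampled input keys, then partitions the keys a mapper emits against those
split points, once for a mapper that keeps the key and once for a mapper
that changes it.

```java
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

// Plain-Java illustration only; every name here is hypothetical.
final class SamplerKeyMismatch {

    // Choose numPartitions - 1 split points at evenly spaced positions
    // in the sorted sample (a simplified stand-in for a real sampler).
    static int[] splitPoints(int[] sampledKeys, int numPartitions) {
        int[] sorted = sampledKeys.clone();
        Arrays.sort(sorted);
        int[] splits = new int[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            splits[i - 1] = sorted[i * sorted.length / numPartitions];
        }
        return splits;
    }

    // Range partitioning: the partition index is the number of
    // split points that are less than or equal to the key.
    static int partitionFor(int key, int[] splits) {
        int p = 0;
        while (p < splits.length && key >= splits[p]) {
            p++;
        }
        return p;
    }

    // Count how many mapper output keys land in each partition when the
    // mapper turns input key k into mapKey(k).
    static int[] partitionSizes(int[] inputKeys, IntUnaryOperator mapKey, int[] splits) {
        int[] sizes = new int[splits.length + 1];
        for (int k : inputKeys) {
            sizes[partitionFor(mapKey.applyAsInt(k), splits)]++;
        }
        return sizes;
    }

    public static void main(String[] args) {
        int[] inputKeys = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        int[] splits = splitPoints(inputKeys, 3); // sampled from INPUT keys

        // Mapper<K,V,K,V1>: output key equals input key, partitions are balanced.
        System.out.println(Arrays.toString(
                partitionSizes(inputKeys, k -> k, splits)));       // [3, 3, 3]

        // Mapper<K,V,OK,OV> with a different output key: every record
        // falls past the last split point and lands in one partition.
        System.out.println(Arrays.toString(
                partitionSizes(inputKeys, k -> k * 100, splits))); // [0, 0, 9]
    }
}
```

In this sketch the split points only describe the keys that were sampled.
When the mapper emits a key with a different distribution, the same split
points send all nine records to the last partition, so the partitioning no
longer balances the work it was sampled for.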
