hadoop-mapreduce-user mailing list archives

From "Fabio C." <anyte...@gmail.com>
Subject Re: Prune out data to a specific reduce task
Date Wed, 11 Mar 2015 12:43:39 GMT
As far as I know, every reducer runs the same code, the reduce function
you specify, so if you know in advance the features of the data you want
to ignore, you can simply have the reducers skip it.
If you can tell at the very beginning whether or not to keep an entry,
you can filter it out within the map function instead.
Think of a wordcount example where we tell the map phase to ignore
all the words starting with a specific letter...
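
To make the wordcount idea concrete, here is a minimal plain-Java sketch of
map-side filtering. It deliberately avoids the Hadoop classes so it runs on its
own; in a real job the body of mapLine would presumably sit inside
Mapper.map(), with each kept word written through context.write(). The class
name FilteringWordCount, the helper names, and the ignored letter 'a' are all
made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for a filtering word count (no Hadoop dependency).
class FilteringWordCount {
    // Hypothetical filter condition: drop words starting with this letter.
    static final char IGNORED_LETTER = 'a';

    // Returns true when the word should be passed on to the reduce phase.
    static boolean keep(String word) {
        return !word.isEmpty()
                && Character.toLowerCase(word.charAt(0)) != IGNORED_LETTER;
    }

    // "Map" step: tokenize one line and emit only the words that pass the filter.
    static List<String> mapLine(String line) {
        List<String> emitted = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (keep(word)) {
                emitted.add(word);
            }
        }
        return emitted;
    }

    // "Reduce" step: sum the occurrences of each emitted word.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            for (String word : mapLine(line)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>();
        input.add("an apple and a banana");
        input.add("banana bread");
        System.out.println(count(input));
    }
}
```

Because the filtered words are never emitted by the map step, the reduce step
never sees them at all.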
What kind of data are you processing and what is the filtering condition?
Anyway, I'm sorry I can't help with the actual code; I'm not really working
with this right now.
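
For the quoted question about dropping everything bound for one reduce task,
one possible approach is to compute the target partition in the map phase and
skip the record there. The sketch below is plain Java and only an
illustration: partitionFor uses the same arithmetic as Hadoop's default
HashPartitioner, but it hashes a String, whereas a real job would hash the
actual key type (for example Text), so the partition numbers will differ.
Partitions are numbered from 0, so "reduce task 2" is assumed here to mean
partition 1. PartitionFilter and keysToEmit are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of dropping records bound for one partition.
class PartitionFilter {
    // Same arithmetic as Hadoop's default HashPartitioner:
    // clear the sign bit, then take the remainder by the number of reducers.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Keeps only the keys that would NOT be routed to the excluded partition.
    // In a real mapper this check would guard the context.write() call.
    static List<String> keysToEmit(List<String> keys,
                                   int numReduceTasks,
                                   int excludedPartition) {
        List<String> kept = new ArrayList<>();
        for (String key : keys) {
            if (partitionFor(key, numReduceTasks) != excludedPartition) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        keys.add("a");
        keys.add("b");
        keys.add("c");
        keys.add("d");
        // Two reducers; assume partition 1 is the one to starve of input.
        System.out.println(keysToEmit(keys, 2, 1));
    }
}
```

With this kind of check the job still runs with two reducers, but the excluded
one simply receives no input, which matches what the original question asks
for.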

On Wed, Mar 11, 2015 at 12:13 PM, xeonmailinglist-gmail <
xeonmailinglist@gmail.com> wrote:

> Maybe the correct question is: how can I filter data in MapReduce in Java?
>
>
>
> On 11-03-2015 10:36, xeonmailinglist-gmail wrote:
>
> To exclude the data going to a specific reducer, should I build a
> partitioner that does this? Should I have a map function that checks which
> reduce task the output goes to?
>
> Can anyone give me some suggestions?
>
> And by the way, I really do want to exclude the data going to a reduce task.
> So, I will run more than 1 reducer, even if one of them gets no input data.
>
>
> On 11-03-2015 10:28, xeonmailinglist-gmail wrote:
>
> Hi,
>
> I have a job with 3 map tasks and 2 reduce tasks, but I want to exclude
> the data that would go to reduce task 2. This means that only reducer 1
> will produce data, and the other one will be empty, or will not even
> execute.
>
> How can I do this in MapReduce?
>
> [image: Example Job Execution]
>
>
> Thanks,
>
