hadoop-hdfs-user mailing list archives

From Drake민영근 <drake....@nexr.com>
Subject Re: Prune out data to a specific reduce task
Date Thu, 12 Mar 2015 06:36:12 GMT
In the map method, you drop a record simply by not calling output.collect()
(old API) or context.write() (new API) for it.
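As a rough sketch of that idea, the plain-Java snippet below (no Hadoop dependency, so it can be run on its own) decides per key whether a record would land on the excluded reduce task and skips it. The class and method names are made up for illustration. The partition arithmetic mirrors what Hadoop's default HashPartitioner does, but it is applied here to String.hashCode(); a Text key in a real job hashes differently, so treat the concrete key-to-partition mapping as an assumption. In an actual Mapper, "keep" would be a call to context.write() and "drop" would be returning without writing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical plain-Java stand-in for a map-side filter; not Hadoop code.
class PartitionFilterSketch {

    // Same arithmetic as Hadoop's default HashPartitioner:
    // clear the sign bit, then take the remainder by the number of reduce tasks.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Keep a key only if it is NOT bound for the excluded partition.
    // In a real Mapper, keeping means calling context.write(key, value);
    // dropping means returning from map() without writing anything.
    static List<String> keepKeys(List<String> keys, int numReduceTasks, int excludedPartition) {
        List<String> kept = new ArrayList<>();
        for (String key : keys) {
            if (partitionFor(key, numReduceTasks) != excludedPartition) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Two reduce tasks, numbered 0 and 1; exclude everything bound for task 1.
        List<String> keys = Arrays.asList("a", "b", "c", "d");
        System.out.println(keepKeys(keys, 2, 1)); // prints [b, d]
    }
}
```

Note that the excluded reduce task still runs in this scheme; it just receives no input and writes an empty part file.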

Or you can just delete reducer 2's output file at the end of the job. Part
files are numbered from zero, so with two reducers the second reducer's
result file is "part-r-00001".
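A minimal sketch of the delete-afterwards approach, using java.nio on a local directory so that it runs without a cluster. The class and method names are hypothetical. On HDFS the equivalent step would go through Hadoop's FileSystem.delete(Path, boolean) against the job's output directory. The file name pattern assumes the new-API default of "part-r-" followed by a zero-padded, zero-based partition number.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Hypothetical local-filesystem stand-in for cleaning up one reducer's output.
class PartFileCleanupSketch {

    // Build the default new-API reducer output name, e.g. partition 1 -> part-r-00001.
    static String partFileName(int partition) {
        return String.format(Locale.ROOT, "part-r-%05d", partition);
    }

    // Delete the part file of one partition; returns true if a file was removed.
    static boolean deletePartFile(Path outputDir, int partition) throws IOException {
        return Files.deleteIfExists(outputDir.resolve(partFileName(partition)));
    }

    public static void main(String[] args) throws IOException {
        // Simulate a finished job with two reducers (partitions 0 and 1).
        Path out = Files.createTempDirectory("job-output");
        Files.createFile(out.resolve(partFileName(0)));
        Files.createFile(out.resolve(partFileName(1)));

        // Drop the second reducer's result and keep the first.
        deletePartFile(out, 1);
        System.out.println(Files.exists(out.resolve("part-r-00000"))); // prints true
        System.out.println(Files.exists(out.resolve("part-r-00001"))); // prints false
    }
}
```

Unlike filtering in the map method, this still shuffles and reduces the unwanted data; it only discards the result.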

Drake 민영근 Ph.D
kt NexR

On Wed, Mar 11, 2015 at 9:43 PM, Fabio C. <anytek88@gmail.com> wrote:

> As far as I know the code running in each reducer is the same you specify
> in your reduce function, so if you know in advance the features of the data
> you want to ignore you can just instruct reducers to do so.
> If you are able to tell whether or not to keep an entry at the beginning,
> you can filter them out within the map function.
> I could think of a wordcount example where we tell the map phase to ignore
> all the words starting with a specific letter...
> What kind of data are you processing and what is the filtering condition?
> Anyway I'm sorry I can't help with the actual code, but I'm not really
> into this right now.
>
> On Wed, Mar 11, 2015 at 12:13 PM, xeonmailinglist-gmail <
> xeonmailinglist@gmail.com> wrote:
>
>>  Maybe the correct question is, how can I filter data in mapreduce in
>> Java?
>>
>>
>>
>> On 11-03-2015 10:36, xeonmailinglist-gmail wrote:
>>
>> To exclude data bound for a specific reducer, should I build a partitioner
>> that does this? Should I have a map function that checks which reduce task
>> the output goes to?
>>
>> Can anyone give me some suggestion?
>>
>> And by the way, I really do want to exclude the data for one reduce task.
>> So I will run more than 1 reducer, even if one of them does not get input
>> data.
>>
>>
>> On 11-03-2015 10:28, xeonmailinglist-gmail wrote:
>>
>> Hi,
>>
>> I have a job with 3 map tasks and 2 reduce tasks, but I want to exclude
>> the data that would go to reduce task 2. This means that only reducer 1
>> will produce data, and the other one will be empty or will not even
>> execute.
>>
>> How can I do this in MapReduce?
>>
>> [image: Example Job Execution]
>>
>>
>> Thanks,
>>
