hadoop-mapreduce-user mailing list archives

From Wellington Chevreuil <wellington.chevre...@gmail.com>
Subject Re: job taking input file, which "is being" written by its preceding job's map phase
Date Thu, 09 Feb 2012 14:19:34 GMT
Hi Harsh,

I had noticed that this ChainMapper belongs to the old version package
(org.apache.hadoop.mapred instead of org.apache.hadoop.mapreduce).
Although it takes generic Class types as its method arguments, is this
class able to work with Mappers from the new version package
(org.apache.hadoop.mapreduce)?

Thanks,
Wellington.

2012/2/9 Harsh J <harsh@cloudera.com>:
> Vamshi,
>
> What problem exactly are you trying to solve with this? If you are
> only interested in records being streamed from one mapper into
> another, why can't they be chained together? Remember that map-only
> jobs do not sort their output -- so I still see no benefit in
> consuming record-by-record from a whole new task when it could be
> done within the very same one.
>
> Btw, ChainMapper is an API abstraction to run several mapper
> implementations in sequence (chain) for each record input and
> transform them all along (helpful if you have several utility mappers
> and want to build composites). It does not touch disk.
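A minimal sketch of such a chain, written against the old org.apache.hadoop.mapred API that ChainMapper lives in (AMap and BMap here are hypothetical Mapper implementations invented for the example, not classes from the thread):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.ChainMapper;

public class ChainExample {
    public static void main(String[] args) throws Exception {
        JobConf job = new JobConf(ChainExample.class);
        job.setJobName("chain-example");
        job.setInputFormat(TextInputFormat.class);
        job.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // AMap runs first; each record it emits is handed directly (in
        // memory, by value here) to BMap -- nothing is written to disk
        // between the two mappers.
        ChainMapper.addMapper(job, AMap.class,
            LongWritable.class, Text.class,  // AMap's input key/value types
            Text.class, Text.class,          // AMap's output key/value types
            true, new JobConf(false));

        // BMap's input types must match AMap's output types.
        ChainMapper.addMapper(job, BMap.class,
            Text.class, Text.class,
            Text.class, Text.class,
            true, new JobConf(false));

        JobClient.runJob(job);
    }
}
```

This compiles and runs only against a Hadoop installation with AMap/BMap defined; it is meant to show the shape of the addMapper calls, not a drop-in job.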
>
> On Thu, Feb 9, 2012 at 12:15 PM, Vamshi Krishna <vamshi2105@gmail.com> wrote:
>> Thank you Harsh for your reply. What ChainMapper does is: only once the
>> first mapper finishes does the second mapper start, using the file written
>> by the first mapper. It is just a chain. But what I want is pipelining,
>> i.e. after the first map starts and before it finishes, the second map
>> has to start and keep on reading from the same file that is being written
>> by the first map. It is almost like a producer-consumer scenario, where
>> the first map writes to the file and the second map keeps reading that
>> same file, so that a pipelining effect is seen between the two maps.
>> Hope you got what I am trying to tell..
>>
>> please help..
>>
>>
>> On Wed, Feb 8, 2012 at 12:47 PM, Harsh J <harsh@cloudera.com> wrote:
>>>
>>> Vamshi,
>>>
>>> Is it not possible to express your M-M-R phase chain as a simple, single
>>> M-R?
>>>
>>> Perhaps look at the ChainMapper class @
>>>
>>> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/ChainMapper.html
>>>
>>> On Wed, Feb 8, 2012 at 12:28 PM, Vamshi Krishna <vamshi2105@gmail.com>
>>> wrote:
>>> > Hi all
>>> > I have an important question about MapReduce.
>>> > I have 2 Hadoop MapReduce jobs. Job1 has only a mapper, no reducer.
>>> > Job1 has started, and in its map() it is writing to "file1" using
>>> > context.write(Arg1, Arg2). I want to start job2 (immediately after
>>> > job1 starts), which should take "file1" (output still being written
>>> > by the above job's map phase) as input and process it in its own
>>> > map/reduce phases, and job2 should keep on taking the newly written
>>> > data from "file1" until job1 finishes. What should I do?
>>> >
>>> > how can i do that, Please can anybody help?
>>> >
>>> > --
>>> > Regards
>>> >
>>> > Vamshi Krishna
>>> >
>>>
>>>
>>>
>>> --
>>> Harsh J
>>> Customer Ops. Engineer
>>> Cloudera | http://tiny.cloudera.com/about
>>
>>
>>
>>
>> --
>> Regards
>>
>> Vamshi Krishna
>>
>
>
>
> --
> Harsh J
> Customer Ops. Engineer
> Cloudera | http://tiny.cloudera.com/about
