hadoop-user mailing list archives

From Russell Jurney <russell.jur...@gmail.com>
Subject Re: Accumulo and Mapreduce
Date Mon, 04 Mar 2013 18:52:10 GMT
You can chain MR jobs with Oozie, but I would suggest using Cascading, Pig, or
Hive. You can do this in a couple of lines of code, I suspect. Two MapReduce
jobs should not pose any kind of challenge with the right tools.
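For the exact flow in the original question (Mapper1 -> Mapper2 -> Mapper3 -> Reducer1), Hadoop's ChainMapper/ChainReducer can also run all three map stages inside a single job, with no workflow engine at all. A rough sketch against the Hadoop 2 `mapreduce` API (the older `mapred` API has equivalents); Mapper1/2/3, Reducer1, MyObjectId, and MyObject are the hypothetical classes from this thread, and the Accumulo Key/Value and Text/Mutation types assume Accumulo's input and output formats are configured:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
import org.apache.hadoop.mapreduce.lib.chain.ChainReducer;

public class ChainedDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "m1-m2-m3-r1");
    job.setJarByClass(ChainedDriver.class);
    // Input/output format setup (AccumuloInputFormat / AccumuloOutputFormat,
    // connector info, table names) is omitted -- it varies by Accumulo version.

    // Each mapper's output types must match the next mapper's input types.
    ChainMapper.addMapper(job, Mapper1.class,
        org.apache.accumulo.core.data.Key.class,
        org.apache.accumulo.core.data.Value.class,
        MyObjectId.class, MyObject.class, new Configuration(false));
    ChainMapper.addMapper(job, Mapper2.class,
        MyObjectId.class, MyObject.class,
        MyObjectId.class, MyObject.class, new Configuration(false));
    ChainMapper.addMapper(job, Mapper3.class,
        MyObjectId.class, MyObject.class,
        MyObjectId.class, MyObject.class, new Configuration(false));

    // AccumuloOutputFormat expects <Text table-name, Mutation> pairs.
    ChainReducer.setReducer(job, Reducer1.class,
        MyObjectId.class, MyObject.class,
        Text.class, org.apache.accumulo.core.data.Mutation.class,
        new Configuration(false));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

All three map stages run in the same task JVMs, so nothing is written to disk between M1, M2, and M3.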

On Monday, March 4, 2013, Sandy Ryza wrote:

> Hi Aji,
>
> Oozie is a mature project for managing MapReduce workflows.
> http://oozie.apache.org/
>
> -Sandy
>
>
> On Mon, Mar 4, 2013 at 8:17 AM, Justin Woody <justin.woody@gmail.com> wrote:
>
>> Aji,
>>
>> Why don't you just chain the jobs together?
>> http://developer.yahoo.com/hadoop/tutorial/module4.html#chaining
>>
>> Justin
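The simplest form of chaining from the tutorial above -- run job 1 to completion, then start job 2 on its output -- needs no extra framework, just a driver. A rough sketch (paths, class names, and types are placeholders for the pipeline in this thread, not a tested program):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class TwoStageDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path stage1Out = new Path("/tmp/stage1");  // hypothetical intermediate dir

    Job job1 = Job.getInstance(conf, "stage1");
    job1.setJarByClass(TwoStageDriver.class);
    job1.setMapperClass(Mapper1.class);        // placeholder class from the thread
    job1.setNumReduceTasks(0);                 // map-only first stage
    job1.setOutputKeyClass(MyObjectId.class);  // placeholder types
    job1.setOutputValueClass(MyObject.class);
    job1.setOutputFormatClass(SequenceFileOutputFormat.class);
    SequenceFileOutputFormat.setOutputPath(job1, stage1Out);
    if (!job1.waitForCompletion(true)) System.exit(1);

    Job job2 = Job.getInstance(conf, "stage2");
    job2.setJarByClass(TwoStageDriver.class);
    job2.setMapperClass(Mapper2.class);        // consumes <MyObjectId, MyObject>
    job2.setInputFormatClass(SequenceFileInputFormat.class);
    SequenceFileInputFormat.addInputPath(job2, stage1Out);
    // ... remaining stages and output config as needed
    System.exit(job2.waitForCompletion(true) ? 0 : 1);
  }
}
```

The SequenceFile intermediate keeps the objects in their binary Writable form between jobs, so no text parsing is needed in stage 2.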
>>
>> On Mon, Mar 4, 2013 at 11:11 AM, Aji Janis <aji1705@gmail.com> wrote:
>> > Russell thanks for the link.
>> >
>> > I am interested in finding a solution (if one is out there) where
>> > Mapper1 outputs a custom object and Mapper2 can use it as input. One way
>> > to do this, obviously, is by writing to Accumulo, in my case. But is
>> > there another solution for this:
>> >
>> > List<MyObject> ----> input to the job
>> >
>> > MyObject ----> input to Mapper1 (process MyObject) ----> output
>> > <MyObjectId, MyObject>
>> >
>> > <MyObjectId, MyObject> ----> input to Mapper2 ... and so on
>> >
>> >
>> >
>> > Ideas?
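One way to pass a custom object between chained jobs without going through Accumulo is to make MyObject a Hadoop Writable and use SequenceFileOutputFormat/SequenceFileInputFormat for the intermediate data. A sketch of the serialization half (the fields here -- an id and a payload -- are invented for illustration; a real class would declare `implements org.apache.hadoop.io.Writable`, omitted here only so the sketch compiles without Hadoop jars, but write()/readFields() have the exact Writable signatures):

```java
import java.io.*;

public class MyObject {
    private long id;
    private String payload;

    public MyObject() {}                    // Writable needs a no-arg constructor

    public MyObject(long id, String payload) {
        this.id = id;
        this.payload = payload;
    }

    // Called by Hadoop when writing the object to the intermediate file.
    public void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeUTF(payload);
    }

    // Called by Hadoop when reading the object back in the next job.
    public void readFields(DataInput in) throws IOException {
        id = in.readLong();
        payload = in.readUTF();
    }

    public long getId() { return id; }
    public String getPayload() { return payload; }

    public static void main(String[] args) throws IOException {
        // Round-trip through the same byte form Hadoop would use between jobs.
        MyObject before = new MyObject(42L, "hello");
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        before.write(new DataOutputStream(bytes));

        MyObject after = new MyObject();
        after.readFields(new DataInputStream(
            new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(after.getId() + " " + after.getPayload());
    }
}
```

With that in place, job 1 can declare MyObject as its output value class under SequenceFileOutputFormat, and Mapper2 in job 2 receives the same <MyObjectId, MyObject> pairs via SequenceFileInputFormat.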
>> >
>> >
>> > On Mon, Mar 4, 2013 at 10:00 AM, Russell Jurney
>> > <russell.jurney@gmail.com> wrote:
>> >>
>> >>
>> >>
>> http://svn.apache.org/repos/asf/accumulo/contrib/pig/trunk/src/main/java/org/apache/accumulo/pig/AccumuloStorage.java
>> >>
>> >> AccumuloStorage for Pig comes with Accumulo. The easiest way would be
>> >> to try it.
>> >>
>> >> Russell Jurney http://datasyndrome.com
>> >>
>> >> On Mar 4, 2013, at 5:30 AM, Aji Janis <aji1705@gmail.com> wrote:
>> >>
>> >> Hello,
>> >>
>> >> I have an MR job design with a flow like this: Mapper1 -> Mapper2 ->
>> >> Mapper3 -> Reducer1. Mapper1's input is an Accumulo table, M1's output
>> >> goes to M2, and so on. Finally, the Reducer writes its output to
>> >> Accumulo.
>> >>
>> >> Questions:
>> >>
>> >> 1) Has anyone tried something like this before? Are there any
>> >> workflow-control APIs (in or outside of Hadoop) that can help me set up
>> >> a job like this, or am I limited to using Quartz?
>> >> 2) If both M2 and M3 needed to write some data to the same two tables
>> >> in Accumulo, is it possible to do so? Are there any good Accumulo
>> >> MapReduce jobs you can point me to, or blogs/pages that I can use for
>> >> reference (starting points/best practices)?
>> >>
>> >> Thank you in advance for any suggestions!
>> >>
>> >> Aji
>> >>
>> >
>>
>
>

-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
