hadoop-user mailing list archives

From Sandy Ryza <sandy.r...@cloudera.com>
Subject Re: Accumulo and Mapreduce
Date Mon, 04 Mar 2013 18:14:56 GMT
Hi Aji,

Oozie is a mature project for managing MapReduce workflows.
http://oozie.apache.org/

-Sandy


On Mon, Mar 4, 2013 at 8:17 AM, Justin Woody <justin.woody@gmail.com> wrote:

> Aji,
>
> Why don't you just chain the jobs together?
> http://developer.yahoo.com/hadoop/tutorial/module4.html#chaining
>
> Justin
>
> On Mon, Mar 4, 2013 at 11:11 AM, Aji Janis <aji1705@gmail.com> wrote:
> > Russell thanks for the link.
> >
> > I am interested in finding a solution (if one exists) where Mapper1 outputs
> > a custom object and Mapper2 can use that as input. One obvious way to do
> > this, in my case, is to write to Accumulo. But is there another solution
> > for this:
> >
> > List<MyObject> ----> Input to Job
> >
> > MyObject ---> Input to Mapper1 (process MyObject) ----> Output <MyObjectId, MyObject>
> >
> > <MyObjectId, MyObject> are Input to Mapper2 ... and so on
> >
> >
> >
> > Ideas?
> >
> >
> > On Mon, Mar 4, 2013 at 10:00 AM, Russell Jurney <russell.jurney@gmail.com> wrote:
> >>
> >> http://svn.apache.org/repos/asf/accumulo/contrib/pig/trunk/src/main/java/org/apache/accumulo/pig/AccumuloStorage.java
> >>
> >> AccumuloStorage for Pig comes with Accumulo. Easiest way would be to try it.
> >>
> >> Russell Jurney http://datasyndrome.com
> >>
> >> On Mar 4, 2013, at 5:30 AM, Aji Janis <aji1705@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> I have a MR job design with a flow like this: Mapper1 -> Mapper2 ->
> >> Mapper3 -> Reducer1. Mapper1's input is an Accumulo table. M1's output
> >> goes to M2, and so on. Finally the Reducer writes output to Accumulo.
> >>
> >> Questions:
> >>
> >> 1) Has anyone tried something like this before? Are there any workflow
> >> control APIs (in or outside of Hadoop) that can help me set up the job
> >> like this? Or am I limited to using Quartz for this?
> >> 2) If both M2 and M3 needed to write some data to the same two tables in
> >> Accumulo, is it possible to do so? Are there any good Accumulo MapReduce
> >> jobs you can point me to, or blogs/pages that I can use for reference
> >> (starting point/best practices)?
> >>
> >> Thank you in advance for any suggestions!
> >>
> >> Aji
> >>
> >
>
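
On the question quoted above about handing a custom object from Mapper1 to
Mapper2 without going through Accumulo: one common approach, sketched here
under assumptions, is to make the object serializable the way Hadoop's
Writable interface expects, so that one job can write <MyObjectId, MyObject>
pairs to an intermediate file and the next job can read them back. The class
below is a hypothetical stand-in for MyObject; its two fields (id, payload)
are invented for illustration. It uses only java.io so it runs without Hadoop
on the classpath. In a real job the class would also declare
"implements org.apache.hadoop.io.Writable", whose write and readFields
methods have the same signatures as the ones shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the MyObject type mentioned in the thread.
// The fields are assumptions; a real class would carry whatever Mapper1 emits.
final class MyObject {
    private String id = "";
    private String payload = "";

    // Writable-style types need a no-argument constructor so the framework
    // can create an empty instance and then call readFields on it.
    MyObject() {}

    MyObject(String id, String payload) {
        this.id = id;
        this.payload = payload;
    }

    String getId() { return id; }
    String getPayload() { return payload; }

    // Same signature as org.apache.hadoop.io.Writable#write.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(id);
        out.writeUTF(payload);
    }

    // Same signature as org.apache.hadoop.io.Writable#readFields.
    // Fields must be read in the order they were written.
    public void readFields(DataInput in) throws IOException {
        id = in.readUTF();
        payload = in.readUTF();
    }

    // Helper for the demo: serialize to a byte array, as an output format would.
    static byte[] toBytes(MyObject obj) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream();
             DataOutputStream out = new DataOutputStream(buffer)) {
            obj.write(out);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Helper for the demo: rebuild an instance, as the next stage's input would.
    static MyObject fromBytes(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            MyObject obj = new MyObject();
            obj.readFields(in);
            return obj;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Simulate Mapper1 emitting an object and Mapper2 receiving it.
        MyObject emitted = new MyObject("doc-42", "some text");
        MyObject received = fromBytes(toBytes(emitted));
        System.out.println(received.getId() + " / " + received.getPayload());
    }
}
```

With a class like this, the first job could write its output with
SequenceFileOutputFormat and the second could read it with
SequenceFileInputFormat, which is one way to chain the jobs as Justin
suggests. Hadoop also ships ChainMapper and ChainReducer classes for running
several mappers in sequence inside a single job; whether they suit the
Accumulo input and output formats here is not something this thread confirms.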
