hadoop-user mailing list archives

From Aji Janis <aji1...@gmail.com>
Subject Re: Accumulo and Mapreduce
Date Mon, 04 Mar 2013 16:11:30 GMT
Russell, thanks for the link.

I am interested in finding a solution (if one exists) where Mapper1 outputs
a custom object and Mapper2 can use that as input. One way to do this is,
obviously, by writing to Accumulo, in my case. But is there another
solution for this:

List<MyObject> ----> Input to Job

MyObject ---> Input to Mapper1 (process MyObject) ----> Output <MyObjectId,
MyObject>

<MyObjectId, MyObject> are Input to Mapper2 ... and so on
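
To make the intended data flow concrete, here is a minimal sketch in plain Java. It is not Hadoop code and uses no Hadoop or Accumulo API: MyObject, mapper1, mapper2, and run are hypothetical names, and the processing done in each stage (trimming, upper-casing) is a placeholder assumption. It only shows the shape of the hand-off, where stage 1 emits <MyObjectId, MyObject> pairs and stage 2 consumes them directly without an intermediate table.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class MapperChainSketch {

    // Hypothetical stand-in for the custom object described in the message.
    static final class MyObject {
        final String id;
        final String payload;

        MyObject(String id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    // Stage 1: takes a MyObject, processes it, emits <MyObjectId, MyObject>.
    // The trim() call is placeholder processing, not anything from the thread.
    static Map.Entry<String, MyObject> mapper1(MyObject in) {
        MyObject processed = new MyObject(in.id, in.payload.trim());
        return new SimpleEntry<>(processed.id, processed);
    }

    // Stage 2: takes the <MyObjectId, MyObject> pair emitted by stage 1.
    // Upper-casing is again only placeholder processing.
    static Map.Entry<String, MyObject> mapper2(Map.Entry<String, MyObject> in) {
        MyObject v = in.getValue();
        MyObject processed = new MyObject(v.id, v.payload.toUpperCase(Locale.ROOT));
        return new SimpleEntry<>(in.getKey(), processed);
    }

    // Feeds each input object through stage 1 and then stage 2, in memory.
    static List<Map.Entry<String, MyObject>> run(List<MyObject> input) {
        List<Map.Entry<String, MyObject>> out = new ArrayList<>();
        for (MyObject o : input) {
            out.add(mapper2(mapper1(o)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<MyObject> input = new ArrayList<>();
        input.add(new MyObject("a1", "  first "));
        input.add(new MyObject("b2", "second"));
        for (Map.Entry<String, MyObject> e : run(input)) {
            System.out.println(e.getKey() + " -> " + e.getValue().payload);
        }
    }
}
```

In a real job the custom type would also need to be serializable by the framework (for Hadoop, typically by implementing Writable), which this sketch leaves out. Hadoop's ChainMapper class targets this kind of mapper-to-mapper hand-off within one job and may be worth a look.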



Ideas?


On Mon, Mar 4, 2013 at 10:00 AM, Russell Jurney <russell.jurney@gmail.com> wrote:

>
> http://svn.apache.org/repos/asf/accumulo/contrib/pig/trunk/src/main/java/org/apache/accumulo/pig/AccumuloStorage.java
>
> AccumuloStorage for Pig comes with Accumulo. Easiest way would be to try
> it.
>
> Russell Jurney http://datasyndrome.com
>
> On Mar 4, 2013, at 5:30 AM, Aji Janis <aji1705@gmail.com> wrote:
>
> Hello,
>
> I have an MR job design with a flow like this: Mapper1 -> Mapper2 ->
> Mapper3 -> Reducer1. Mapper1's input is an Accumulo table. M1's output goes
> to M2, and so on. Finally, the Reducer writes its output to Accumulo.
>
> Questions:
>
> 1) Has anyone tried something like this before? Are there any workflow
> control APIs (in or outside of Hadoop) that can help me set up the job like
> this? Or am I limited to using Quartz for this?
> 2) If both M2 and M3 needed to write some data to the same two tables in
> Accumulo, is it possible to do so? Are there any good Accumulo MapReduce
> jobs you can point me to, or blogs/pages that I can use for reference
> (starting point/best practices)?
>
> Thank you in advance for any suggestions!
>
> Aji
>
>
