hadoop-common-user mailing list archives

From Aji Janis <aji1...@gmail.com>
Subject Java jars and MapReduce
Date Fri, 01 Mar 2013 19:48:25 GMT

Current design: I have a Java object, MyObjectA. MyObjectA goes through
three processors (jars) that run in sequence and do a lot of processing
to enrich A with additional data (think ETL). The final result is
MyObjectD (note: MyObjectD is essentially A with more fields added, but I
use a separate name to make clear that the two end up very different).
When ready, MyObjectD is saved to my non-relational database (Accumulo).
Currently, all of this is driven by Quartz Scheduler: a
List<MyObjectA> is submitted for processing every N minutes. Everything is
written in Java, and there is a lot of back-and-forth with Accumulo
(to access tables that help convert A to D).

We split the processing into three processors just because it was more
convenient and we didn't want everything rolled up in one processor. Having
said that, I can definitely merge the three into ONE processor. But my
question is: what are the things (generically speaking) I need to be
concerned about or look into to make this a MapReduce job? I am
asking for pointers on where to even start here.
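
As a rough sketch of what merging the three processors could look like, assuming each one can be expressed as a function on a single record with no shared state between records: the fields of MyObjectA and MyObjectD, the stage labels, and the Pipeline class below are all hypothetical stand-ins, since the real processors are not shown here. No Hadoop classes are used; the point is only that the chain becomes one per-record function, which is the shape a map step needs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the input type; the real fields are not known.
final class MyObjectA {
    final String id;
    MyObjectA(String id) { this.id = id; }
}

// Hypothetical stand-in for the enriched type: A plus extra data.
final class MyObjectD {
    final String id;
    final List<String> enrichments;
    MyObjectD(String id, List<String> enrichments) {
        this.id = id;
        this.enrichments = enrichments;
    }
}

final class Pipeline {
    // Lift an A into an (initially empty) D so every stage has the same signature.
    static MyObjectD start(MyObjectA a) {
        return new MyObjectD(a.id, new ArrayList<>());
    }

    // Placeholder for one processor: returns a new D with one more enrichment.
    static Function<MyObjectD, MyObjectD> stage(String label) {
        return d -> {
            List<String> next = new ArrayList<>(d.enrichments);
            next.add(label);
            return new MyObjectD(d.id, next);
        };
    }

    // The three processors merged into a single A -> D function.
    static final Function<MyObjectA, MyObjectD> A_TO_D =
        ((Function<MyObjectA, MyObjectD>) Pipeline::start)
            .andThen(stage("processor1"))
            .andThen(stage("processor2"))
            .andThen(stage("processor3"));

    // Each record is converted independently of the others in the batch.
    static List<MyObjectD> processBatch(List<MyObjectA> batch) {
        List<MyObjectD> out = new ArrayList<>();
        for (MyObjectA a : batch) {
            out.add(A_TO_D.apply(a));
        }
        return out;
    }
}
```

If the real processors keep state across records or depend on the order of the list, that would need to be untangled before the work can be split across mappers.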

Let's say all my processing is done in mappers. So my input will be
MyObjectA and my output will be MyObjectD from each mapper. And then my
reducer simply writes all MyObjectD objects to Accumulo. Is achieving this
as easy as just submitting the jar to Hadoop?
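
One concern with this mapper-to-reducer layout, sketched under assumptions: anything a mapper emits has to be serialized before it reaches a reducer, and Hadoop's default mechanism for that is the Writable interface, which has a write(DataOutput) method and a readFields(DataInput) method. The class below mirrors those two method shapes in plain Java without depending on Hadoop. MyObjectDRecord and its two fields are hypothetical, since the real fields of MyObjectD are not listed here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical record standing in for MyObjectD; fields are illustrative only.
final class MyObjectDRecord {
    String id = "";
    long score;

    // Same shape as Writable.write: emit every field in a fixed order.
    void write(DataOutput out) throws IOException {
        out.writeUTF(id);
        out.writeLong(score);
    }

    // Same shape as Writable.readFields: read fields back in the same order.
    void readFields(DataInput in) throws IOException {
        id = in.readUTF();
        score = in.readLong();
    }

    // Serialize to bytes and back, as would happen between map and reduce.
    static MyObjectDRecord roundTrip(MyObjectDRecord original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            original.write(new DataOutputStream(bytes));
            MyObjectDRecord copy = new MyObjectDRecord();
            copy.readFields(new DataInputStream(
                new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A round trip like this is a cheap way to check that no field is silently lost before any cluster is involved. In a real job the class would implement Hadoop's Writable rather than just copying its method shapes.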

I guess overall, I want to know how one goes about blindly submitting a
.jar (a Java app) and making it a MapReduce task.
We are going this route, because multi-threading won't solve our problem.
We have to process objects in batch now and they are coming in every

Thank you in advance for any and all help.
