hadoop-common-user mailing list archives

From Hari Sreekumar <hsreeku...@clickable.com>
Subject Re: Question regarding a System good candidate for Hadoop?
Date Mon, 03 Jan 2011 04:33:32 GMT
> This tax calculating system has a pre-process which reads all the data from
> the database and creates comma-separated flat files. These files are then
> passed through a map of jobs (a series of jobs) where some intermediate
> output is generated which is fed to the next job in the map. Finally, all
> the generated tax data is updated back into the database into different
> tables. In this predefined map, some of the independent processes run in
> parallel. Currently this system runs on a single machine with 64 cores and
> does not fully utilize distributed parallel processing, which I am hoping
> Hadoop will solve.

As Harsh said, from the looks of it, this seems to be a great candidate for
MapReduce (MR).

> They are definitely not easily changeable to MapReduce :(. Since the
> current logic has multiple jobs (in the map) to produce intermediate outputs
> (for the next job in line), I do not think a single Map step will be able to
> produce the required output that is currently produced by, say, 10 processes
> in the map. Can I feed one map's output to the next map's input? Or can one
> Map step have multiple stages to arrive at the right output?

From your description, I get the feeling that they are, in fact, easily
changeable to MR jobs.
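
On the question of feeding one job's output into the next: here is a minimal sketch of the idea in plain Python. It is not Hadoop code, just an in-memory illustration of two chained map/reduce passes. The record layout ("account,amount"), the function names and the 10% rate are all hypothetical, made up for the example. In a real Hadoop setup each pass would be a separate job, and the second job would read the files the first job wrote.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    """Run one map/reduce pass in memory: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Visit keys in sorted order so the output is deterministic.
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]

# Job 1: sum the amounts per account from CSV lines (hypothetical layout).
def map_amounts(line):
    account, amount = line.split(",")
    yield account, int(amount)

def reduce_total(account, amounts):
    yield account, sum(amounts)

# Job 2: takes job 1's output records as its input.
TAX_RATE_PERCENT = 10  # hypothetical rate, for illustration only

def map_tax(record):
    account, total = record
    yield account, total * TAX_RATE_PERCENT // 100

def pass_through(account, taxes):
    for tax in taxes:
        yield account, tax

def run_pipeline(lines):
    totals = run_job(lines, map_amounts, reduce_total)  # intermediate output
    return run_job(totals, map_tax, pass_through)       # fed to the next job

if __name__ == "__main__":
    for account, tax in run_pipeline(["a,100", "b,50", "a,200"]):
        print(account, tax)
```

The point is only that a pipeline of 10 processes does not have to be squeezed into a single Map step. Each stage can be its own job, and jobs that do not depend on each other can run side by side.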

> In either case, will I be able to use my existing C job logic? Currently
> their individual outputs are not similar to what Map steps generate.
> I would also appreciate it if anyone has links to any case studies that I
> can go over to learn more about how real-world projects got converted to
> Hadoop.

The official documentation is a good place to start. The Cloudera website
has a lot of useful articles. The Hadoop book
<http://oreilly.com/catalog/0636920010388> is great too.

