hadoop-common-user mailing list archives

From Enis Soztutar <enis.soz.nu...@gmail.com>
Subject Re: Questions about Hadoop
Date Wed, 24 Sep 2008 09:27:00 GMT
Hi,

Arijit Mukherjee wrote:
> Hi
>
> We've been thinking of using Hadoop for a decision making system which
> will analyze telecom-related data from various sources to take certain
> decisions. The data can be huge, of the order of terabytes, and can be
> stored as CSV files, which I understand will fit into Hadoop as Tom
> White mentions in the Rough Cut Guide that Hadoop is well suited for
> records. The question I want to ask is whether it is possible to perform
> statistical analysis on the data using Hadoop and MapReduce. If anyone
> has done such a thing, we'd be very interested to know about it. Is it
> also possible to create a workflow like functionality with MapReduce?
>   
Hadoop can handle terabyte-scale data, and statistical data analysis is one of
the things that fits the MapReduce computation model best. You can check what
people are doing with Hadoop at http://wiki.apache.org/hadoop/PoweredBy.
I think the best way to see whether your requirements can be met by
Hadoop/MapReduce is to read the MapReduce paper by Dean et al. You might also
be interested in checking out Mahout, a subproject of Lucene that does
machine learning on top of Hadoop.
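
To give a rough idea of why simple statistics fit the model: the mean and
variance per key can be computed from partial sums (count, sum, sum of
squares) that add up in any order, so they can be merged across mappers.
The sketch below only simulates this in plain Python and does not use the
Hadoop API. The CSV layout (region,call_duration) and all function names
are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV input: region,call_duration_seconds
SAMPLE = """north,10
north,20
south,5
north,30
south,15
"""

def map_record(row):
    """Map step: emit (key, partial statistics) for one CSV row."""
    region, duration = row[0], float(row[1])
    return region, (1, duration, duration * duration)

def combine(a, b):
    """Reduce step: partial statistics add component-wise, in any order."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def finalize(stats):
    """Turn (count, sum, sum of squares) into (mean, population variance)."""
    n, total, total_sq = stats
    mean = total / n
    return mean, total_sq / n - mean * mean

def run_job(text):
    """Simulate map, shuffle and reduce in memory for a small input."""
    groups = defaultdict(lambda: (0, 0.0, 0.0))
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        key, value = map_record(row)
        groups[key] = combine(groups[key], value)
    return {key: finalize(stats) for key, stats in groups.items()}

result = run_job(SAMPLE)
```

Because combine() is associative, the same function could in principle
serve as a combiner on the map side to cut down the data shuffled to the
reducers.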

Hadoop is mostly suited to batch jobs; however, these jobs can be chained
together to form a workflow. I can try to be more helpful if you could
elaborate on what you mean by workflow.
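
In case it helps, here is a minimal sketch of what I mean by chaining:
the output of one map/reduce pass becomes the input of the next. Again
this is an in-memory Python simulation, not Hadoop code, and the record
layout, the 100-second threshold and every function name are assumptions
for the sake of the example.

```python
from collections import defaultdict

def run_mapreduce(records, mapper, reducer):
    """Run one in-memory map/shuffle/reduce pass over the records."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [(key, reducer(key, values)) for key, values in sorted(groups.items())]

def run_workflow(records, stages):
    """Chain stages: each stage reads the previous stage's output."""
    for mapper, reducer in stages:
        records = run_mapreduce(records, mapper, reducer)
    return records

def map_calls(record):
    """Stage 1 map: key each call by subscriber."""
    subscriber, seconds = record
    yield subscriber, seconds

def map_band(record):
    """Stage 2 map: classify each subscriber total into a usage band."""
    subscriber, total = record
    yield ("heavy" if total >= 100 else "light"), 1

def sum_values(key, values):
    """Reducer shared by both stages: add up the values for a key."""
    return sum(values)

# Hypothetical (subscriber, call_seconds) records
calls = [("a", 60), ("b", 30), ("a", 50), ("c", 120), ("b", 10)]

stage_one = run_mapreduce(calls, map_calls, sum_values)
result = run_workflow(calls, [(map_calls, sum_values), (map_band, sum_values)])
```

In a real cluster each stage would be a separate job writing its output
to HDFS, with the next job configured to read that directory.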

Enis Soztutar

> Regards
> Arijit
>
> Dr. Arijit Mukherjee
> Principal Member of Technical Staff, Level-II
> Connectiva Systems (I) Pvt. Ltd.
> J-2, Block GP, Sector V, Salt Lake
> Kolkata 700 091, India
> Phone: +91 (0)33 23577531/32 x 107
> http://www.connectivasystems.com
>
>
>   

