hadoop-common-user mailing list archives

From Enis Soztutar <enis.soz.nu...@gmail.com>
Subject Re: Questions about Hadoop
Date Wed, 24 Sep 2008 11:22:45 GMT


Arijit Mukherjee wrote:
> Thanx Enis.
>
> By workflow, I was trying to mean something like a chain of MapReduce
> jobs - the first one will extract a certain amount of data from the
> original set and do some computation resulting in a smaller summary,
> which will then be the input to a further MR job, and so on...somewhat
> similar to a workflow as in the SOA world.
>
>   
Yes, you can always chain jobs together to produce a final summary.
org.apache.hadoop.mapred.jobcontrol.JobControl might be of interest to you.
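
To make the chaining idea concrete, here is a minimal in-memory sketch in
Python; it is not Hadoop code and does not use JobControl. It assumes CSV
records of the form subscriber,region,seconds. That layout, the sample rows,
and the helper names (run_job, map_calls, and so on) are all made up for
illustration. On a real cluster each run_job call would roughly correspond to
one MapReduce job, with the output of the first job used as the input of the
second.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Run one in-memory map -> group-by-key -> reduce pass.

    Hypothetical stand-in for a single MapReduce job.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Process keys in sorted order, as a reducer would see them.
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


# Job 1: reduce the raw CSV lines to one total per subscriber.
def map_calls(line):
    subscriber, region, seconds = line.split(",")
    yield (subscriber, region), int(seconds)


def reduce_calls(key, values):
    subscriber, region = key
    yield (subscriber, region, sum(values))


# Job 2: take job 1's smaller summary and aggregate it per region.
def map_region(summary):
    subscriber, region, total = summary
    yield region, total


def reduce_region(region, totals):
    # Emit (region, number of subscribers, total seconds).
    yield (region, len(totals), sum(totals))


raw = [
    "alice,east,120",
    "bob,west,30",
    "alice,east,60",
    "carol,east,45",
    "bob,west,90",
]

per_subscriber = run_job(raw, map_calls, reduce_calls)
per_region = run_job(per_subscriber, map_region, reduce_region)
print(per_region)
```

The point is only the shape of the pipeline: the second job never sees the
raw data, just the summary written by the first.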
> Is it possible to use statistical analysis tools such as R (or say PL/R)
> within MapReduce on Hadoop? As far as I've heard, Greenplum is working
> on a custom MapReduce engine over their Greenplum database which will
> also support PL/R procedures.
>   
Using R on Hadoop might require some level of custom coding. If you are
looking for an ad hoc tool for data mining, then check out Pig and Hive.
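
As a rough idea of what such custom coding can look like for simple
statistics, the Python sketch below computes a mean and a population variance
from per-split partial sums. It is an in-memory illustration under assumed
data, not R and not Hadoop code; the function names and the sample splits are
hypothetical. Each split is collapsed to (count, sum, sum of squares), and
those triples can be merged in any order, which is what makes this kind of
statistic fit the map and reduce phases.

```python
from functools import reduce


def map_partition(values):
    """Map side: collapse one input split into (count, sum, sum of squares)."""
    n, s, ss = 0, 0.0, 0.0
    for v in values:
        n += 1
        s += v
        ss += v * v
    return (n, s, ss)


def merge(a, b):
    """Reduce side: partial results combine by element-wise addition."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def finalize(stats):
    """Turn the merged triple into (mean, population variance)."""
    n, s, ss = stats
    mean = s / n
    variance = ss / n - mean * mean
    return mean, variance


# Three hypothetical input splits of the same data set.
splits = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]

partials = [map_partition(split) for split in splits]
mean, variance = finalize(reduce(merge, partials))
print(mean, variance)
```

The sum-of-squares formula is the simplest one to merge, but it can lose
precision on large values; treat it as a starting point only.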

Enis
> Arijit
>
> Dr. Arijit Mukherjee
> Principal Member of Technical Staff, Level-II
> Connectiva Systems (I) Pvt. Ltd.
> J-2, Block GP, Sector V, Salt Lake
> Kolkata 700 091, India
> Phone: +91 (0)33 23577531/32 x 107
> http://www.connectivasystems.com
>
>
> -----Original Message-----
> From: Enis Soztutar [mailto:enis.soz.nutch@gmail.com] 
> Sent: Wednesday, September 24, 2008 2:57 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Questions about Hadoop
>
>
> Hi,
>
> Arijit Mukherjee wrote:
>> Hi
>>
>> We've been thinking of using Hadoop for a decision making system which
>> will analyze telecom-related data from various sources to take certain
>> decisions. The data can be huge, of the order of terabytes, and can be
>> stored as CSV files, which I understand will fit into Hadoop as Tom
>> White mentions in the Rough Cut Guide that Hadoop is well suited for
>> records. The question I want to ask is whether it is possible to
>> perform statistical analysis on the data using Hadoop and MapReduce.
>> If anyone has done such a thing, we'd be very interested to know about
>> it. Is it also possible to create a workflow like functionality with
>> MapReduce?
>>
> Hadoop can handle TB data sizes, and statistical data analysis is one of
> the things that fits the MapReduce computation model perfectly. You can
> check what people are doing with Hadoop at
> http://wiki.apache.org/hadoop/PoweredBy.
> I think the best way to see if your requirements can be met by
> Hadoop/MapReduce is to read the MapReduce paper by Dean et al. Also, you
> might be interested in checking out Mahout, which is a subproject of
> Lucene. They are doing ML on top of Hadoop.
>
> Hadoop is mostly suitable for batch jobs; however, these jobs can be
> chained together to form a workflow. I will try to be more helpful if you
> could expand on what you mean by workflow.
>
> Enis Soztutar
>
>   
>> Regards
>> Arijit
>>
>> Dr. Arijit Mukherjee
>> Principal Member of Technical Staff, Level-II
>> Connectiva Systems (I) Pvt. Ltd.
>> J-2, Block GP, Sector V, Salt Lake
>> Kolkata 700 091, India
>> Phone: +91 (0)33 23577531/32 x 107
>> http://www.connectivasystems.com
