hadoop-common-user mailing list archives

From "Shravan Mahankali" <shravan.mahank...@catalytic.com>
Subject RE: how to use hadoop in real life?
Date Thu, 09 Jul 2009 09:27:10 GMT
Hi Group,

I have data to be analyzed, and I would like to dump this data to Hadoop from
machine.X, whereas Hadoop is running on machine.Y. After dumping the data to
Hadoop, I would like to initiate a job, get the data analyzed, and get the
output back to machine.X.

I would like to do all of this programmatically, and I am going through the
Hadoop API for that purpose. I remember Alex suggesting the other day that I
install Hadoop on machine.X, but I was not sure why that is necessary.

My plan is to simply write a Java program that includes the Hadoop-core jar,
use "FsUrlStreamHandlerFactory" to connect to Hadoop on machine.Y, and then
use "org.apache.hadoop.fs.shell" to copy data to the Hadoop machine, initiate
the job, and fetch the results.
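
Roughly, the kind of client code I have in mind is sketched below (the host
name, port, and paths are placeholders, and it assumes the hadoop-core jar and
the cluster configuration are available on machine.X):

    // Rough sketch only - host, port, paths and the job step are placeholders.
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RemoteHdfsClient {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Point the client at the NameNode running on machine.Y (placeholder address).
            conf.set("fs.default.name", "hdfs://machine.y.example.com:9000");
            FileSystem fs = FileSystem.get(
                URI.create("hdfs://machine.y.example.com:9000"), conf);

            // Copy the local data from machine.X into HDFS on machine.Y.
            fs.copyFromLocalFile(new Path("/local/data/actions.log"),
                                 new Path("/user/shravan/input/actions.log"));

            // ... submit the job here, e.g. via JobClient.runJob(jobConf) ...

            // Copy the job output back to machine.X.
            fs.copyToLocalFile(new Path("/user/shravan/output/part-00000"),
                               new Path("/local/results/part-00000"));
            fs.close();
        }
    }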

Please advise.

Thank You,
Shravan Kumar. M 
Catalytic Software Ltd. [SEI-CMMI Level 5 Company]
-----------------------------
This email and any files transmitted with it are confidential and intended
solely for the use of the individual or entity to whom they are addressed.
If you have received this email in error please notify the system
administrator - netopshelpdesk@catalytic.com

-----Original Message-----
From: Shravan Mahankali [mailto:shravan.mahankali@catalytic.com] 
Sent: Thursday, July 09, 2009 10:35 AM
To: common-user@hadoop.apache.org
Cc: 'Alex Loddengaard'
Subject: RE: how to use hadoop in real life?

Thanks for the information Ted.

Regards,
Shravan Kumar. M 
Catalytic Software Ltd. [SEI-CMMI Level 5 Company]

-----Original Message-----
From: Ted Dunning [mailto:ted.dunning@gmail.com] 
Sent: Wednesday, July 08, 2009 10:48 PM
To: common-user@hadoop.apache.org; shravan.mahankali@catalytic.com
Cc: Alex Loddengaard
Subject: Re: how to use hadoop in real life?

In general, Hadoop is simpler than you might imagine.

Yes, you need to create directories to store data.  This is much lighter
weight than creating a table in SQL.
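
Just to illustrate how light it is (the path here is made up), creating a
per-job directory from the Java API is about one call:

    // Sketch only: the "schema" for a job run is just a directory.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    fs.mkdirs(new Path("/user/shravan/actions/2009-07-09"));
    fs.close();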

But the key question is volume.  Hadoop makes some things easier and Pig
queries are generally easier to write than SQL (for programmers ... not for
those raised on SQL), but, overall, map-reduce programs really are more work
to write than SQL queries until you get to really large scale problems.

If your database has less than 10 million rows or so, I would recommend that
you consider doing all analysis in SQL augmented by procedural languages.
Only as your data grows beyond 100 million to a billion rows do the clear
advantages of the map-reduce formulation become apparent.
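
To make the verbosity point concrete, here is roughly what the SQL query
"SELECT action, COUNT(*) FROM user_actions GROUP BY action" looks like as a
minimal map-reduce job (org.apache.hadoop.mapred API; the class names and
paths are made up for the example):

    // Illustrative sketch: count occurrences of each action, one action name per input line.
    import java.io.IOException;
    import java.util.Iterator;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class ActionCount {
        public static class Map extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            public void map(LongWritable key, Text line,
                            OutputCollector<Text, IntWritable> out,
                            Reporter reporter) throws IOException {
                // Emit (action, 1) for each input line.
                out.collect(new Text(line.toString().trim()), ONE);
            }
        }

        public static class Reduce extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text action, Iterator<IntWritable> counts,
                               OutputCollector<Text, IntWritable> out,
                               Reporter reporter) throws IOException {
                // Sum the counts for each action.
                int sum = 0;
                while (counts.hasNext()) sum += counts.next().get();
                out.collect(action, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(ActionCount.class);
            conf.setJobName("action-count");
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            conf.setMapperClass(Map.class);
            conf.setReducerClass(Reduce.class);
            FileInputFormat.setInputPaths(conf, new Path("/user/shravan/input"));
            FileOutputFormat.setOutputPath(conf, new Path("/user/shravan/output"));
            JobClient.runJob(conf);
        }
    }

All of that replaces one line of SQL, which is why the trade-off only pays off
at scale.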

On Tue, Jul 7, 2009 at 11:35 PM, Shravan Mahankali <
shravan.mahankali@catalytic.com> wrote:

> Use Case: We have a web app where users perform some actions; we have to
> track these actions and various parameters related to the action initiator,
> and we currently store this information in the database. But our manager has
> suggested evaluating Hadoop for this scenario. However, I am not clear
> whether every time I run a job in Hadoop I have to create a directory, and
> how I can track that later to read the data analyzed by Hadoop. Even though
> I drop the user action information into Hadoop, I still have to put it into
> our database so that it knows the trend and responds to various requests
> accordingly.
>

