hadoop-common-user mailing list archives

From "Shravan Mahankali" <shravan.mahank...@catalytic.com>
Subject RE: how to use hadoop in real life?
Date Thu, 09 Jul 2009 05:05:12 GMT
Thanks for the information Ted.

Shravan Kumar. M 
Catalytic Software Ltd. [SEI-CMMI Level 5 Company]
This email and any files transmitted with it are confidential and intended
solely for the use of the individual or entity to whom they are addressed.
If you have received this email in error please notify the system
administrator - netopshelpdesk@catalytic.com

-----Original Message-----
From: Ted Dunning [mailto:ted.dunning@gmail.com] 
Sent: Wednesday, July 08, 2009 10:48 PM
To: common-user@hadoop.apache.org; shravan.mahankali@catalytic.com
Cc: Alex Loddengaard
Subject: Re: how to use hadoop in real life?

In general, Hadoop is simpler than you might imagine.

Yes, you need to create directories to store data.  This is much lighter
weight than creating a table in SQL.
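
One way to keep track of those directories is a fixed naming convention, so
that the location of a job's output can be recomputed later instead of being
remembered. The sketch below is only an illustration: the helper name
job_output_dir, the base path /data/user-actions and the job name
daily-trends are all made up, and it only builds the path string without
touching HDFS.

```python
from datetime import date
from pathlib import PurePosixPath


def job_output_dir(base, job_name, run_date):
    """Build a predictable output path of the form <base>/<job_name>/<YYYY-MM-DD>.

    Hypothetical convention: one directory per job per day.
    """
    return PurePosixPath(base) / job_name / run_date.isoformat()


if __name__ == "__main__":
    # Assumed base directory and job name, purely for illustration.
    out = job_output_dir("/data/user-actions", "daily-trends", date(2009, 7, 8))
    print(out)
```

A later step that loads results back into the database can call the same
helper with the same job name and date to find the directory to read.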

But the key question is volume.  Hadoop makes some things easier, and Pig
queries are generally easier to write than SQL (for programmers ... not for
those raised on SQL).  Overall, though, map-reduce programs really are more
work to write than SQL queries until you get to really large-scale problems.

If your database has fewer than 10 million rows or so, I would recommend that
you consider doing all analysis in SQL augmented by procedural languages.
Only as your data goes beyond 100 million to a billion rows do the clear
advantages of a map-reduce formulation become apparent.
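
To make the difference in effort concrete, here is a small sketch that counts
user actions two ways: once with a single SQL GROUP BY, and once with explicit
map and reduce steps written by hand. It is plain Python using the standard
sqlite3 module, not Hadoop code, and the table name, column names and sample
events are invented for the example.

```python
import sqlite3
from collections import defaultdict

# Invented sample data: (user, action) pairs as a web app might log them.
EVENTS = [
    ("alice", "login"),
    ("bob", "search"),
    ("alice", "search"),
    ("carol", "login"),
]


def count_with_sql(rows):
    """Count actions with one GROUP BY query against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE actions (user TEXT, action TEXT)")
    conn.executemany("INSERT INTO actions VALUES (?, ?)", rows)
    cursor = conn.execute("SELECT action, COUNT(*) FROM actions GROUP BY action")
    result = dict(cursor.fetchall())
    conn.close()
    return result


def map_phase(rows):
    """Map step: emit an (action, 1) pair for every logged event."""
    for _user, action in rows:
        yield action, 1


def reduce_phase(pairs):
    """Group the emitted pairs by key, then sum the values for each key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}


def count_with_map_reduce(rows):
    return reduce_phase(map_phase(rows))


if __name__ == "__main__":
    print(count_with_sql(EVENTS))
    print(count_with_map_reduce(EVENTS))
```

Both functions return the same counts. The SQL version is a single query,
while the map-reduce version needs the map, grouping and reduce steps spelled
out, which is the extra work that only pays off at large scale.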

On Tue, Jul 7, 2009 at 11:35 PM, Shravan Mahankali <
shravan.mahankali@catalytic.com> wrote:

> Use Case: We have a web app where users perform some actions. We have to
> track these actions and various parameters related to the action initiator,
> and we currently store this information in the database. Our manager has
> suggested evaluating Hadoop for this scenario. However, I am not clear
> whether I have to create a directory every time I run a job in Hadoop, and
> how I can track it later to read the data analyzed by Hadoop. Even if I drop
> user action information into Hadoop, I still have to put this information in
> our database so that it knows the trend and responds to various requests
> accordingly.
