hadoop-common-user mailing list archives

From "tim robertson" <timrobertson...@gmail.com>
Subject Re: Hadoop also applicable in a web app environment?
Date Tue, 05 Aug 2008 18:32:00 GMT
I am a newbie too, so this is not an expert's answer by any means.
That said:

This is not what MapReduce (MR) is designed for...

If, for example, you have a reporting tool whose database queries
take a very long time to answer, so long that you can't expect a
user to hang around waiting for the HTTP response, you might use
Hadoop to churn through the data and produce the report, and respond
to the user with "your data is being processed, please check back at
this_URL soon".

It is not designed as the thing that answers real-time synchronous
requests, though (e.g. users clicking on links), nor to handle high
traffic load. For that you need more servers and a load balancer,
like you say, and scaling out your DB to have multiple read only

Consider a search engine: Yahoo crawls all the web sites and uses MR
to process the data to create indexes of the words on pages. But when
you search on Yahoo as a user, it is not an MR job that runs to
provide the answers. Here you could say MR plays the role of
generating the index "offline", which is then loaded into something
that can answer the query immediately. You might consider Lucene or
Solr or something similar for that... (Solr especially, I would say)
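
To make the offline/online split concrete, here is a toy sketch in
plain Python. It is not Hadoop or Solr code: map_page, reduce_word,
build_index and search are made-up names, and the in-memory dict only
stands in for whatever actually serves the queries.

```python
from collections import defaultdict


def map_page(url, text):
    """Map step: emit (word, url) for each distinct word on a page."""
    for word in set(text.lower().split()):
        yield word, url


def reduce_word(word, urls):
    """Reduce step: collect all pages that contain the word."""
    return word, sorted(urls)


def build_index(pages):
    """Offline batch job: run map, group by word, then reduce."""
    grouped = defaultdict(list)
    for url, text in pages.items():
        for word, page_url in map_page(url, text):
            grouped[word].append(page_url)
    return dict(reduce_word(w, urls) for w, urls in grouped.items())


def search(index, word):
    """Online query: a plain lookup in the prebuilt index, no batch job."""
    return index.get(word.lower(), [])
```

build_index is the slow part you would run as a batch job; search is
the only thing a user request touches.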

You might find http://highscalability.com/ interesting...



On Tue, Aug 5, 2008 at 8:11 PM, Mork0075 <mork0075@googlemail.com> wrote:
> Hello,
> I just discovered the Hadoop project and it looks really interesting to me.
> As far as I can see at the moment, Hadoop is really useful for data intensive
> computations. Is there a Hadoop scenario for scaling web applications too?
> Normally web applications are not that computation heavy. The need to
> scale them arises from a growing number of users, who perform (every user in
> his session) simple operations like querying some data from the database.
> So distributing this scenario, a Hadoop job would be to "map" the requests
> to a certain server in the cluster and "reduce" it. But this is what load
> balancers normally do; this doesn't solve the scalability problem so far.
> So my question: is there a Hadoop scenario for "non computation heavy but
> heavy load" web applications?
> Thanks a lot
