hadoop-common-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Using Hadoop for near real-time processing of log data
Date Wed, 25 Feb 2009 19:09:09 GMT
On Wed, Feb 25, 2009 at 1:13 PM, Mikhail Yakshin
<greycat.na.kor@gmail.com> wrote:
> Hi,
>
>> Is anyone using Hadoop as more of a near/almost real-time processing
>> of log data for their systems to aggregate stats, etc?
>
> We do, although "near realtime" is a pretty relative term and your
> mileage may vary. For example, startup and shutdown of Hadoop jobs are
> pretty expensive: it can take anywhere from 5-10 seconds up to
> several minutes to get a job started, and almost the same goes for
> job finalization. Generally, if your "near realtime" can tolerate
> a lag of 3-5 minutes, it's possible to use Hadoop.
>
> --
> WBR, Mikhail Yakshin
>

I was thinking about this. Assuming your datasets are small, would
running a local jobtracker, or even running the MiniMR cluster from
the test cases, be an interesting way to run small jobs confined to one
CPU?
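
To put the startup and finalization cost quoted above in perspective,
here is a rough back-of-the-envelope sketch. The function names and the
example numbers (a job every 60 seconds, 10 seconds each for startup and
finalization, 30 seconds of actual processing) are my own assumptions
for illustration, not measurements from any cluster.

```python
def worst_case_lag(batch_interval_s, startup_s, processing_s, finalization_s):
    """Rough upper bound, in seconds, on how stale a log record can be.

    Assumes a record that just misses one batch waits a full interval
    for the next one, then waits for that job to start, run and finalize.
    """
    job_duration_s = startup_s + processing_s + finalization_s
    return batch_interval_s + job_duration_s


def overhead_fraction(startup_s, processing_s, finalization_s):
    """Share of a job's wall-clock time spent on startup and finalization."""
    job_duration_s = startup_s + processing_s + finalization_s
    return (startup_s + finalization_s) / job_duration_s


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the arithmetic.
    print(worst_case_lag(60, 10, 30, 10))   # seconds of lag in the worst case
    print(overhead_fraction(10, 30, 10))    # fraction of job time that is overhead
```

With small datasets the processing term shrinks and the fixed overhead
dominates, which is why avoiding the full job launch looks attractive.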
