hadoop-common-user mailing list archives

From "Ian Holsman (Lists)" <li...@holsman.net>
Subject Re: realtime hadoop
Date Tue, 24 Jun 2008 12:10:24 GMT
Matt Kent wrote:
> We use Hadoop in a similar manner, to process batches of data in
> real-time every few minutes. However, we do substantial amounts of
> processing on that data, so we use Hadoop to distribute our computation.
> Unless you have a significant amount of work to be done, I wouldn't
> recommend using Hadoop because it's not worth the overhead of launching
> the jobs and moving the data around.

Thanks Matt.

we are boiling the ocean with the data, so to speak... so that's cool.
we are also looking at supplementing the m/r jobs with data coming in 
from Spread to get the 'instant' analysis parts of our feedback systems.
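The "will the namespace explode" worry in the quoted thread below can be put in rough numbers. A back-of-envelope sketch, assuming ~150 bytes of NameNode heap per namespace object (a commonly cited ballpark for that era, not a measured figure):

```python
# Back-of-envelope: NameNode heap pressure from closing 80 files
# every 5 minutes, before the merge job compacts and removes them.
# ASSUMPTION: ~150 bytes of NameNode heap per namespace object
# (file or block) -- a commonly cited ballpark, not a measured figure.

BYTES_PER_OBJECT = 150          # assumed heap cost per file/block entry
FILES_PER_BATCH = 80            # one file per log machine
BATCHES_PER_DAY = 24 * 60 // 5  # a batch of files closes every 5 minutes

files_per_day = FILES_PER_BATCH * BATCHES_PER_DAY
# each small file carries at least one block entry as well
objects_per_day = files_per_day * 2
heap_per_day_mb = objects_per_day * BYTES_PER_OBJECT / 1024 / 1024

print(files_per_day)              # 23040 small files per day
print(round(heap_per_day_mb, 2))  # ~6.59 MB of heap per day
```

At that rate the namespace grows only a few MB per day, so the merge job keeps things in check as long as it also deletes the small input files; the "explosion" risk is mainly in letting them accumulate.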

> Matt
> On Tue, 2008-06-24 at 13:34 +1000, Ian Holsman (Lists) wrote:
>> Interesting.
>> we are planning on using hadoop to provide 'near' real-time log
>> analysis. we plan on having files close every 5 minutes (1 per log
>> machine, so 80 files every 5 minutes) and then have an m/r job to merge
>> them into a single file that will get processed by other jobs later on.
>> do you think the namespace will explode?
>> I wasn't thinking of clouddb... it might be an interesting alternative
>> once it is a bit more stable.
>> regards
>> Ian
>> Stefan Groschupf wrote:
>>> Hadoop might be the wrong technology for you.
>>> Map Reduce is a batch processing mechanism. HDFS might also be a
>>> critical point, since to access your data you need to close the file
>>> first - meaning you might end up with many small files, a situation
>>> where HDFS is not very strong (the namespace is held in memory).
>>> HBase might be an interesting tool for you, also ZooKeeper if you want
>>> to do something home-grown...
>>> On Jun 23, 2008, at 11:31 PM, Vadim Zaliva wrote:
>>>> Hi!
>>>> I am considering using Hadoop for (almost) realtime data processing.
>>>> I have data coming in every second and I would like to use a hadoop
>>>> cluster to process it as fast as possible. I need to be able to
>>>> maintain some guaranteed max. processing time, for example under 3
>>>> minutes.
>>>> Does anybody have experience with using Hadoop in such a manner? I
>>>> would appreciate it if you could share your experience or give me
>>>> pointers to some articles or pages on the subject.
>>>> Vadim
>>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> 101tec Inc.
>>> Menlo Park, California, USA
>>> http://www.101tec.com
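The merge step described above (many per-machine chunks in every 5 minutes, one file out) can be mocked outside Hadoop. A minimal stand-in sketch, with invented file names, playing the role of the identity map/reduce merge job:

```python
# Minimal stand-in for the merge job: concatenate the 5-minute
# per-machine log chunks into one output file, sorted by name so
# the result is deterministic. File names here are invented.
import os
import tempfile

def merge_chunks(chunk_dir, out_path):
    # list chunks first, then write, so the output file is not
    # swept up as an input
    chunks = sorted(
        os.path.join(chunk_dir, name) for name in os.listdir(chunk_dir)
    )
    with open(out_path, "w") as out:
        for chunk in chunks:
            with open(chunk) as f:
                out.write(f.read())
    return len(chunks)

# demo with three fake per-machine chunks
tmp = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(tmp, f"log-machine{i}.txt"), "w") as f:
        f.write(f"line from machine {i}\n")

merged = os.path.join(tmp, "merged.txt")
n = merge_chunks(tmp, merged)
print(n)  # 3 chunks merged
```

In the real pipeline the same shape holds, just with HDFS paths and a map/reduce job doing the concatenation; the key point either way is that the small inputs get removed once the merged file exists.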
