hadoop-common-user mailing list archives

From Saumitra <saumitra.offic...@gmail.com>
Subject Re: Queue support from HDFS
Date Sat, 25 Jun 2011 20:05:20 GMT
Thanks for the reply, Jakob,

As far as I understand, Kafka's Hadoop consumer is an MR job whose mappers
read from a shared Kafka queue and dump the data to HDFS, but the mappers
are not created dynamically when the queue starts filling up in bursts.

Is there a way to have new mappers created when a job's input queue
grows or when its input HDFS source gets updated?

On Saturday 25 June 2011 01:01 AM, Jakob Homan wrote:
> Not directly, but you may wish to take a look at the Kafka project
> (http://sna-projects.com/kafka/), which we use as a queue and then
> bring the data periodically into HDFS via an MR job.  See this
> presentation: http://www.slideshare.net/ydn/hug-january-2011-kafka-presentation
> -Jakob
> On Fri, Jun 24, 2011 at 10:12 AM, Saumitra Shahapure
> <saumitra.official@gmail.com>  wrote:
>> Hi,
>> Is a queue-like structure supported on HDFS, where a stream of data is
>> processed as it is generated?
>> Specifically, I will have a stream of data coming in, and a data-independent
>> operation needs to be applied to it (so only a Map function; the reducer is
>> identity).
>> I wish to distribute the data among nodes using HDFS and start processing it
>> as it arrives, preferably in a single MR job.
>> I agree that it can be done by starting a new MR job for each batch of data,
>> but is starting many MR jobs frequently for small data chunks a good idea?
>> (Consider that a new batch arrives every few seconds and processing one
>> batch takes a few minutes.)
>> Thanks,
>> --
>> Saumitra S. Shahapure

Saumitra Shahapure
