hadoop-common-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: queues in haddop
Date Fri, 11 Jan 2013 15:06:00 GMT
He's got two different queues. 

1) a queue in the Capacity Scheduler, so he can have a set of M/R tasks running in the background
to pull data off of...

2) a durable queue that receives the inbound JSON files to be processed.

You can have a custom-written listener that pulls data from the queue and puts it either
in HDFS or HBase, depending on the access patterns and the content of the files.
Then you would write an M/R job that actually processes the data for use by ancillary processes
not mentioned in the OP's question.
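
As a rough illustration of the listener idea, here is a minimal Python sketch. Everything in it is an assumption made for illustration: `InMemoryQueue` is a hypothetical stand-in for a real durable broker queue, and `store` is a hypothetical callable standing in for whatever writes to HDFS or HBase. The point it tries to show is the ordering: a message is acknowledged only after it has been stored, and a message that fails is set aside rather than dropped, so nothing is lost.

```python
import json
from collections import deque


class InMemoryQueue:
    """Hypothetical stand-in for a durable broker queue (not a real client)."""

    def __init__(self, messages):
        self._ready = deque(messages)
        self.unacked = {}        # delivered but not yet acknowledged
        self.dead_letters = []   # failed messages, kept for inspection
        self._next_tag = 0

    def get(self):
        """Deliver the next message as (tag, body), or None when empty."""
        if not self._ready:
            return None
        self._next_tag += 1
        body = self._ready.popleft()
        self.unacked[self._next_tag] = body
        return self._next_tag, body

    def ack(self, tag):
        """Confirm the message was stored; the queue may now forget it."""
        del self.unacked[tag]

    def reject(self, tag):
        """Set a failed message aside instead of dropping it."""
        self.dead_letters.append(self.unacked.pop(tag))


def drain(q, store):
    """Pull messages until the queue is empty; return how many were stored.

    `store` is a hypothetical sink (e.g. a wrapper around an HDFS or HBase
    write). The ack happens only after `store` returns without raising.
    """
    stored = 0
    while (item := q.get()) is not None:
        tag, body = item
        try:
            store(json.loads(body))
        except (ValueError, OSError):
            # Malformed JSON or a failed write: keep the message, do not ack.
            q.reject(tag)
            continue
        q.ack(tag)
        stored += 1
    return stored


if __name__ == "__main__":
    q = InMemoryQueue(['{"id": 1}', "not json", '{"id": 2}'])
    sink = []
    print(drain(q, sink.append), "stored;", len(q.dead_letters), "set aside")
```

A real listener would replace `InMemoryQueue` with the broker's client library and would likely retry failed writes before setting a message aside; the sketch only shows the ack-after-store ordering.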

This is why he asked about RabbitMQ, which is one option; there are others, like ActiveMQ or
something else.
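
For a sense of scale, the figures in the original question (24 million files of about 5 KB each per 24 hours) can be turned into an average rate. This is only back-of-the-envelope arithmetic and it assumes, hypothetically, that files arrive evenly over the day; real traffic is probably bursty, so peak rates would be higher.

```python
# Figures taken from the original question.
FILES_PER_DAY = 24_000_000
FILE_SIZE_KB = 5
SECONDS_PER_DAY = 24 * 60 * 60

# Assumption: uniform arrival over the day (peaks will be higher in practice).
files_per_second = FILES_PER_DAY / SECONDS_PER_DAY
kb_per_second = files_per_second * FILE_SIZE_KB

# Daily volume in decimal gigabytes (1 GB = 1,000,000 KB).
gb_per_day = FILES_PER_DAY * FILE_SIZE_KB / 1_000_000

print(f"{files_per_second:.0f} files/s")   # about 278 files/s
print(f"{kb_per_second:.0f} KB/s")         # about 1389 KB/s
print(f"{gb_per_day:.0f} GB/day")          # 120 GB/day
```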

On Jan 11, 2013, at 12:04 AM, Harsh J <harsh@cloudera.com> wrote:

> Your question is unclear: HDFS has no queues for ingesting data (it is
> a simple, distributed file system). The Hadoop M/R and Hadoop YARN
> components have queues for data-processing purposes.
> On Fri, Jan 11, 2013 at 8:42 AM, Panshul Whisper <ouchwhisper@gmail.com> wrote:
>> Hello,
>> I have a Hadoop cluster of 10 nodes and I need to implement
>> queues in the cluster for receiving high volumes of data.
>> Please suggest which would be more efficient for receiving
>> 24 million JSON files, approx. 5 KB each, every 24 hours:
>> 1. Using Capacity Scheduler
>> 2. Implementing RabbitMQ and receiving data from it using Spring Integration
>> data pipelines.
>> I cannot afford to lose any of the JSON files received.
>> Thanking You,
>> --
>> Regards,
>> Ouch Whisper
>> 010101010101
> -- 
> Harsh J
