hadoop-user mailing list archives

From Bertrand Dechoux <decho...@gmail.com>
Subject Re: queues in hadoop
Date Fri, 11 Jan 2013 13:03:29 GMT
There is also Kafka: http://kafka.apache.org
"A high-throughput, distributed, publish-subscribe messaging system."

But it does not push data into HDFS itself; you need to run a job that pulls the data in.
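
A minimal sketch of such a pull job, assuming the newer Kafka consumer client
and the Hadoop FileSystem API (broker address, topic name, output path and the
batch count are illustrative; offset management and error handling are omitted):

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class KafkaToHdfsJob {
    public static void main(String[] args) throws Exception {
        // Consumer settings (broker address and group id are illustrative).
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");
        props.put("group.id", "hdfs-ingest");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        FileSystem fs = FileSystem.get(new Configuration());
        Path out = new Path("/data/json/batch-" + System.currentTimeMillis());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             FSDataOutputStream stream = fs.create(out)) {
            consumer.subscribe(Collections.singletonList("json-events"));
            // Pull a bounded number of batches and write each record as one
            // line to the HDFS file; a real job would track offsets and roll files.
            for (int i = 0; i < 10; i++) {
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofSeconds(5));
                for (ConsumerRecord<String, String> record : records) {
                    stream.write((record.value() + "\n").getBytes("UTF-8"));
                }
            }
        }
    }
}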

Regards

Bertrand

On Fri, Jan 11, 2013 at 1:52 PM, Mirko Kämpf <mirko.kaempf@gmail.com> wrote:

> I would suggest working with Flume, in order to collect a certain number
> of files and store them to HDFS in larger chunks, or to write them directly to
> HBase, which allows random access later on (if needed); otherwise HBase could
> be overkill. You could also collect data in a MySQL DB and then import it
> regularly via Sqoop.
>
> Best
> Mirko
>
>
> "Every dat flow goes to Hadoop"
> citation from an unkown source
>
> 2013/1/11 Hemanth Yamijala <yhemanth@thoughtworks.com>
>
>> Queues in the capacity scheduler are logical data structures into which
>> MapReduce jobs are placed to be picked up by the JobTracker / Scheduler
>> framework, according to some capacity constraints that can be defined for a
>> queue.
>>
>> So, given your use case, I don't think the Capacity Scheduler is going to
>> directly help you (since you only spoke about getting data in, not about
>> processing it).
>>
>> So yes, something like Flume or Scribe.
>>
>> Thanks
>> Hemanth
>>
>> On Fri, Jan 11, 2013 at 11:34 AM, Harsh J <harsh@cloudera.com> wrote:
>>
>>> Your question is unclear: HDFS has no queues for ingesting data (it is
>>> a simple, distributed FileSystem). The Hadoop M/R and Hadoop YARN
>>> components have queues for data-processing purposes.
>>>
>>> On Fri, Jan 11, 2013 at 8:42 AM, Panshul Whisper <ouchwhisper@gmail.com>
>>> wrote:
>>> > Hello,
>>> >
>>> > I have a Hadoop cluster of 10 nodes and I am in need of implementing
>>> > queues in the cluster for receiving high volumes of data.
>>> > Please suggest which will be more efficient to use in the case of
>>> > receiving 24 million JSON files (approx. 5 KB each) every 24 hours:
>>> > 1. Using the Capacity Scheduler
>>> > 2. Implementing RabbitMQ and receiving data from it using Spring
>>> > Integration data pipelines.
>>> >
>>> > I cannot afford to lose any of the JSON files received.
>>> >
>>> > Thanking You,
>>> >
>>> > --
>>> > Regards,
>>> > Ouch Whisper
>>> > 010101010101
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>
>>
>
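
For the Flume-based collection Mirko suggests above, a minimal sketch of an
agent configuration, assuming a spooling-directory source, a file channel and
an HDFS sink (agent name, directories and roll settings are illustrative):

# Name the agent's components (agent name "a1" is illustrative)
a1.sources  = jsonDir
a1.channels = fileCh
a1.sinks    = toHdfs

# Spooling-directory source: watches a local directory for completed
# (immutable) JSON files dropped in by the receiving application
a1.sources.jsonDir.type     = spooldir
a1.sources.jsonDir.spoolDir = /var/incoming/json
a1.sources.jsonDir.channels = fileCh

# File channel: buffers events durably on local disk until the sink
# has written them
a1.channels.fileCh.type          = file
a1.channels.fileCh.checkpointDir = /var/flume/checkpoint
a1.channels.fileCh.dataDirs      = /var/flume/data

# HDFS sink: rolls many small events into larger files on HDFS
a1.sinks.toHdfs.type              = hdfs
a1.sinks.toHdfs.channel           = fileCh
a1.sinks.toHdfs.hdfs.path         = hdfs://namenode:8020/data/json
a1.sinks.toHdfs.hdfs.fileType     = DataStream
a1.sinks.toHdfs.hdfs.rollSize     = 134217728
a1.sinks.toHdfs.hdfs.rollCount    = 0
a1.sinks.toHdfs.hdfs.rollInterval = 300

The durable file channel is the relevant design choice here, since the
requirement is that no JSON file may be lost on the way into HDFS.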


-- 
Bertrand Dechoux
