hadoop-common-user mailing list archives

From Hemanth Yamijala <yhema...@thoughtworks.com>
Subject Re: queues in hadoop
Date Fri, 11 Jan 2013 10:30:20 GMT
Queues in the capacity scheduler are logical data structures into which
MapReduce jobs are placed to be picked up by the JobTracker / Scheduler
framework, according to some capacity constraints that can be defined for a
queue.
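To make the idea of a "logical queue with capacity constraints" concrete, here is a toy model in Python. It is only an illustrative sketch and not how Hadoop implements the Capacity Scheduler: the class name, the queue names, the percentages and the selection rule (pick the queue furthest below its share) are all assumptions made for the example.

```python
from collections import deque


class ToyCapacityScheduler:
    """Toy model (hypothetical): named queues, each capped at a share of slots."""

    def __init__(self, total_slots, capacity_percent):
        # capacity_percent maps queue name -> integer percentage of total slots.
        self.limits = {q: total_slots * pct // 100
                       for q, pct in capacity_percent.items()}
        self.pending = {q: deque() for q in capacity_percent}
        self.running = {q: 0 for q in capacity_percent}

    def submit(self, queue, job):
        # Jobs are only *placed* in a queue; nothing runs until next_job().
        self.pending[queue].append(job)

    def next_job(self):
        # Consider queues that have waiting jobs and are below their limit.
        candidates = [q for q in self.pending
                      if self.pending[q] and self.running[q] < self.limits[q]]
        if not candidates:
            return None
        # Pick the queue using the smallest fraction of its own capacity.
        q = min(candidates, key=lambda name: self.running[name] / self.limits[name])
        self.running[q] += 1
        return q, self.pending[q].popleft()


sched = ToyCapacityScheduler(10, {"etl": 70, "adhoc": 30})
sched.submit("etl", "job-1")
sched.submit("etl", "job-2")
sched.submit("adhoc", "job-3")
order = [sched.next_job() for _ in range(4)]
```

Note that the queues here hold jobs waiting for compute slots; they do not buffer incoming data.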

So, given your use case, I don't think the Capacity Scheduler is going to
directly help you, since you only spoke about getting data in and not about
processing it.

So yes, something like Flume or Scribe would be a better fit.
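For a sense of scale, the figures from the original question (24 million JSON files of about 5 KB each per 24 hours) can be turned into a sustained ingest rate. This is a back-of-the-envelope sketch; it assumes a uniform arrival rate and takes "5 KB" as 5,000 bytes, neither of which is stated in the thread.

```python
FILES_PER_DAY = 24_000_000
AVG_FILE_BYTES = 5 * 1000        # assumption: "approx 5 KB" taken as 5,000 bytes
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

# Assumes files arrive evenly over the day; real traffic is likely burstier.
files_per_second = FILES_PER_DAY / SECONDS_PER_DAY
bytes_per_day = FILES_PER_DAY * AVG_FILE_BYTES
bytes_per_second = bytes_per_day / SECONDS_PER_DAY

print(f"{files_per_second:.0f} files/s")
print(f"{bytes_per_day / 1e9:.0f} GB/day")
print(f"{bytes_per_second / 1e6:.2f} MB/s")
```

Under these assumptions the load is roughly 278 files per second and 120 GB per day, so the challenge is the number of small files rather than raw bandwidth.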

Thanks
Hemanth

On Fri, Jan 11, 2013 at 11:34 AM, Harsh J <harsh@cloudera.com> wrote:

> Your question is unclear: HDFS has no queues for ingesting data (it is
> a simple, distributed FileSystem). The Hadoop M/R and Hadoop YARN
> components have queues for data-processing purposes.
>
> On Fri, Jan 11, 2013 at 8:42 AM, Panshul Whisper <ouchwhisper@gmail.com>
> wrote:
> > Hello,
> >
> > I have a Hadoop cluster of 10 nodes and I need to implement
> > queues in the cluster for receiving high volumes of data.
> > Please suggest which will be more efficient for receiving
> > 24 million JSON files, approx. 5 KB each, every 24 hours:
> > 1. Using the Capacity Scheduler
> > 2. Implementing RabbitMQ and receiving data from it using Spring
> > Integration data pipelines.
> >
> > I cannot afford to lose any of the JSON files received.
> >
> > Thanking You,
> >
> > --
> > Regards,
> > Ouch Whisper
> > 010101010101
>
>
>
> --
> Harsh J
>
