From: Hemanth Yamijala <hemanty@thoughtworks.com>
To: user@hadoop.apache.org
Date: Fri, 11 Jan 2013 16:00:20 +0530
Subject: Re: queues in hadoop

Queues in the capacity scheduler are logical data structures into which MapReduce jobs are placed to be picked up by the JobTracker / Scheduler framework, according to some capacity constraints that can be defined for a queue.
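For example, just a sketch of how a job gets pointed at a queue at submission time (this assumes the old mapred API from that timeframe, and the queue name "etl" is made up; the queue itself must already be defined by the admin in capacity-scheduler.xml):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SubmitToQueue {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SubmitToQueue.class);
    conf.setJobName("queue-demo");
    // Ask the scheduler to run this job in a particular queue. "etl" is a
    // hypothetical name; submission fails if the queue is not configured.
    conf.setQueueName("etl");
    // Mapper/reducer default to identity, so this just copies input to output.
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf); // blocks until the scheduler has run the job
  }
}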

So, given your use case, I don't think the Capacity Scheduler is going to directly help you (since you only spoke about data-in, and not processing).

So yes, something like Flume or Scribe would be a better fit.
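(24 million files of ~5 KB a day works out to roughly 280 files per second and on the order of 120 GB/day, which is exactly the kind of continuous collection those tools are built for.) Under the hood the ingestion itself is just writes into HDFS through the FileSystem API; a minimal sketch of that raw path, with a made-up destination layout:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsIngestSketch {
  public static void main(String[] args) throws Exception {
    // Picks up fs.default.name from core-site.xml on the classpath.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Hypothetical layout: one file per incoming message. A collector like
    // Flume batches these so you are not doing millions of create() calls.
    Path dest = new Path("/ingest/2013-01-11/event-00000001.json");
    FSDataOutputStream out = fs.create(dest, false); // fail if it already exists
    try {
      out.write("{\"example\": true}".getBytes("UTF-8"));
    } finally {
      out.close();
    }
  }
}

A tool like Flume adds the batching, retries and fan-in on top of this, which matters if you cannot afford to drop any of the incoming files.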

Thanks
Hemanth

On Fri, Jan 11, 2013 at 11:34 AM, Harsh J <harsh@cloudera.com> wrote:
> Your question is unclear: HDFS has no queues for ingesting data (it is
> a simple, distributed FileSystem). The Hadoop M/R and Hadoop YARN
> components have queues for data-processing purposes.
>
> On Fri, Jan 11, 2013 at 8:42 AM, Panshul Whisper <ouchwhisper@gmail.com> wrote:
> > Hello,
> >
> > I have a Hadoop cluster setup of 10 nodes and I am in need of implementing
> > queues in the cluster for receiving high volumes of data.
> > Please suggest which will be more efficient to use for receiving
> > 24 million JSON files (approx. 5 KB each) every 24 hours:
> > 1. Using the Capacity Scheduler
> > 2. Implementing RabbitMQ and receiving data from it using Spring Integration
> > data pipelines.
> >
> > I cannot afford to lose any of the JSON files received.
> >
> > Thanking you,
> >
> > --
> > Regards,
> > Ouch Whisper
> > 010101010101
>
> --
> Harsh J
