Date: Thu, 14 Mar 2013 22:42:24 -0400
Subject: Re: Writing to HDFS from multiple HDFS agents (separate machines)
From: Gary Malouf
To: user
Reply-To: user@flume.apache.org

Thanks for the pointer, Mike. Any thoughts on how you choose the number of
consumers per channel? I will eventually find the optimal number via perf
testing, but it would be good to start with a sensible default.

Thanks,

Gary

On Thu, Mar 14, 2013 at 10:30 PM, Gary Malouf wrote:

> Paul, I interpreted the host property as identifying the host that an
> event originates from, rather than the host of the sink that writes the
> event to HDFS. Is my understanding correct?
>
> What happens if I am using the NettyAvroRpcClient to feed events from a
> different server, round-robin style, to two HDFS-writing agents? Should I
> then NOT set the host property on the client side and rely on the
> interceptor instead?
>
> On Thu, Mar 14, 2013 at 6:34 PM, Gary Malouf wrote:
>
>> To be clear, I am referring to segregating data from different Flume
>> sinks, as opposed to segregating by the original source of the event.
>> Having said that, it sounds like your approach is the easiest.
>>
>> -Gary
>>
>> On Thu, Mar 14, 2013 at 5:54 PM, Gary Malouf wrote:
>>
>>> Hi guys,
>>>
>>> I'm new to Flume (and to HDFS, for that matter), using the version
>>> packaged with CDH4 (1.3.0), and was wondering how others keep the file
>>> names written by each HDFS sink distinct.
>>>
>>> My initial thought is to create a separate sub-directory in HDFS for
>>> each sink - though I feel like the better way is to somehow prefix each
>>> file with a unique sink id. Are there any patterns that others are
>>> following for this?
>>>
>>> -Gary