From: Tzur Turkenitz
To: user@flume.apache.org
Date: Fri, 1 Feb 2013 10:44:22 -0500
Subject: Re: SpoolDir marks item as completed, when sink fails

Mike, so when the data is committed to the channel, and the channel is of type "File", will the data continue to flow to the sink once the agent is restarted?
And if only 20% of the data had passed to the sink before it crashed, will a "Replay" resend all of the data?

Just trying to grasp the basics....

On Fri, Feb 1, 2013 at 4:56 AM, Mike Percy wrote:

> Tzur, that is expected, because the data is committed by the source onto
> the channel. Sources and sinks are decoupled; they only interact via the
> channel, which buffers the data and serves to mitigate impedance mismatches.
>
> On Thu, Jan 31, 2013 at 2:35 PM, Tzur Turkenitz wrote:
>
>> Hello all,
>>
>> I am running HDP 1.2 and Flume 1.3. I have a Flume setup which includes:
>> (1) - A Load Balancer that uses the SpoolDir adapter and sends events to
>> Avro sinks
>> (2) - Agents which consume the data using an Avro source and write to
>> HDFS.
>>
>> During testing I noticed that there's a dissonance between the Load
>> Balancer and the Consumers...
>> When the Load Balancer processes a file, it marks it as COMPLETED, even if
>> the consumer has crashed while writing to HDFS.
>>
>> A preferred behavior would be for the Load Balancer to wait until the
>> consumer commits its transaction and reports it as successful before the
>> file is marked as COMPLETED. The current behavior does not allow me to
>> verify which files have been loaded successfully if an agent has crashed
>> and recovery is in process.
>>
>> Have I misconfigured my Agents, or is this actually the desired behavior?
>>
>>
>> Kind Regards,
>> Tzur
>>
>
>

--
Regards,
Tzur Turkenitz
Vision.BI
http://www.vision.bi/
"*Facts are stubborn things, but statistics are more pliable*"
-Mark Twain
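
Mike's point that sources and sinks interact only through the channel can be sketched with a toy model. The code below is not Flume's API: `ToyChannel`, `drain`, and `demo` are hypothetical names, the "channel" is an in-memory queue standing in for a durable one, and the batch size and crash point are arbitrary. It only illustrates the idea that the source's commit to the channel is independent of sink delivery, and that a failed sink batch is rolled back into the channel rather than the whole input being resent.

```python
from collections import deque


class ToyChannel:
    """Toy stand-in for a durable channel; NOT Flume's actual API."""

    def __init__(self):
        self._events = deque()

    def put_batch(self, events):
        # Source side: once this returns, the source considers the data committed.
        self._events.extend(events)

    def take_batch(self, size):
        # Sink side: remove up to `size` events from the head of the queue.
        count = min(size, len(self._events))
        return [self._events.popleft() for _ in range(count)]

    def rollback(self, batch):
        # Failed delivery: return the batch to the head in its original order.
        self._events.extendleft(reversed(batch))

    def __len__(self):
        return len(self._events)


def drain(channel, deliver, batch_size):
    """Deliver batches until the channel is empty or one delivery fails."""
    while len(channel):
        batch = channel.take_batch(batch_size)
        try:
            deliver(batch)
        except IOError:
            channel.rollback(batch)
            return False
    return True


def demo():
    channel = ToyChannel()
    channel.put_batch(list(range(10)))  # the source commits the whole "file"

    delivered = []

    def flaky_sink(batch):
        # Hypothetical sink that crashes after four events have been written.
        if len(delivered) >= 4:
            raise IOError("sink crashed")
        delivered.extend(batch)

    first_ok = drain(channel, flaky_sink, batch_size=2)
    left_in_channel = len(channel)

    # "Restart": a healthy sink resumes from what is still in the channel.
    second_ok = drain(channel, delivered.extend, batch_size=2)
    return first_ok, left_in_channel, second_ok, delivered
```

In this model only the events still held by the channel are delivered after the restart. In a real deployment a batch that was partly written before a crash may be delivered again, so downstream consumers should be prepared for duplicates.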