Date: Thu, 13 Sep 2012 09:08:52 -0700
Subject: Re: HDFS sink leaves .tmp files
From: Kathleen Ting
To: user@flume.apache.org

Chris, glad to hear and glad to be of help. Thanks for letting us know
that it worked.

Regards, Kathleen

On Thu, Sep 13, 2012 at 7:38 AM, Chris Neal wrote:
> Just to follow up, the .tmp file problem did go away using 1.3.0-SNAPSHOT on
> the HDFS sink agent.
>
> Thanks again Kathleen :)
>
>
> On Mon, Sep 10, 2012 at 8:38 PM, Chris Neal wrote:
>>
>> Thanks Kathleen!
>> I'll download that build tomorrow morning and give it a whirl.
>>
>> Chris
>>
>>
>> On Mon, Sep 10, 2012 at 5:09 PM, Kathleen Ting wrote:
>>>
>>> [Moving to cdh-user@cloudera.org |
>>> https://groups.google.com/a/cloudera.org/group/cdh-user/topics since
>>> this is getting to be CDH specific]
>>> bcc: user@flume.apache.org
>>>
>>> Chris,
>>>
>>> When the file has not been closed by the client, the file size may be
>>> shown as 0. The NameNode will not update the metadata about the file
>>> until the block is completed or the file handle is closed. Even if it
>>> updates at a block boundary, the size won't be accurate until the file
>>> is closed.
>>>
>>> The metadata takes some time to populate even though the files may
>>> contain data. The CDH4.1 version of Flume includes FLUME-1238, which
>>> will do auto-rolling of files and helps lower the period where these
>>> files appear to be 0 size.
>>>
>>> Since the CDH3u5 version of Flume is compatible with CDH3* Hadoop and
>>> the CDH4 Flume is compatible with CDH4* Hadoop, you can download the
>>> nightly build of flume-ng-1.2.0-cdh4.1.0 from
>>> http://nightly.cloudera.com/cdh4/cdh/4/
>>>
>>> Regards, Kathleen
>>>
>>> On Mon, Sep 10, 2012 at 1:08 PM, Bhaskar V. Karambelkar wrote:
>>> > Don't know about RPM, but there's a 1.2.x tarball of the 1.2 build @
>>> > http://archive.cloudera.com/cdh/3/flume-ng-1.2.0-cdh3u5.tar.gz
>>> >
>>> >
>>> > On Mon, Sep 10, 2012 at 3:01 PM, Chris Neal wrote:
>>> >>
>>> >> Just checked, and from Cloudera, 1.1.0+121-1.cdh4.0.1.p0.1.el6 is
>>> >> still the latest from their yum repo.
>>> >>
>>> >>
>>> >> On Mon, Sep 10, 2012 at 1:59 PM, Chris Neal wrote:
>>> >>>
>>> >>> I'm using a combination :)
>>> >>>
>>> >>> The application tier is 1.3.0-SNAPSHOT
>>> >>> The HDFS tier is CentOS, and I grabbed the latest (at the time) from
>>> >>> the CDH repo.
>>> >>> Its version is: 1.1.0+121-1.cdh4.0.1.p0.1.el6
>>> >>>
>>> >>> If the issue is on the HDFS sink side, then it could definitely be in
>>> >>> my version!
>>> >>> I'll check if Cloudera has a more recent version to update to.
>>> >>>
>>> >>> Thanks!
>>> >>> Chris
>>> >>>
>>> >>>
>>> >>> On Mon, Sep 10, 2012 at 12:37 PM, Kathleen Ting wrote:
>>> >>>>
>>> >>>> Chris, Eran, this appears to be FLUME-1238, which was fixed in
>>> >>>> Flume-1.2.0. Can you let me know if you are using Flume-1.2.0?
>>> >>>>
>>> >>>> Thanks, Kathleen
>>> >>>>
>>> >>>> On Mon, Sep 10, 2012 at 8:21 AM, Chris Neal wrote:
>>> >>>> > Glad to know it's not just me :)
>>> >>>> >
>>> >>>> >
>>> >>>> > On Mon, Sep 10, 2012 at 10:16 AM, Eran Kutner wrote:
>>> >>>> >>
>>> >>>> >> I have the same problem. I roll every 1 minute so I have tons of
>>> >>>> >> those .tmp files.
>>> >>>> >>
>>> >>>> >> -eran
>>> >>>> >>
>>> >>>> >>
>>> >>>> >>
>>> >>>> >> On Mon, Sep 10, 2012 at 6:02 PM, Chris Neal wrote:
>>> >>>> >>>
>>> >>>> >>> I'm still seeing this consistently every 24-hour period. Does
>>> >>>> >>> this sound like a configuration issue, an issue with the Exec
>>> >>>> >>> source, or an issue with the HDFS sink?
>>> >>>> >>>
>>> >>>> >>> Thanks!
>>> >>>> >>>
>>> >>>> >>>
>>> >>>> >>> On Wed, Aug 29, 2012 at 9:18 AM, Chris Neal wrote:
>>> >>>> >>>>
>>> >>>> >>>> Hi all,
>>> >>>> >>>>
>>> >>>> >>>> I have an Exec Source running a tail -F on a log4J-generated
>>> >>>> >>>> log file that gets rolled once a day. It seems that when log4J
>>> >>>> >>>> rolls the file to the new date, the HDFS sink ends up with a
>>> >>>> >>>> .tmp file. I haven't figured out if there is any data loss yet,
>>> >>>> >>>> but was curious if this is expected behavior?
>>> >>>> >>>>
>>> >>>> >>>> Thanks for your time.
>>> >>>> >>>> Chris
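
For anyone checking whether they are hitting the same symptom, here is a
minimal sketch that scans the text output of a directory listing for
leftover .tmp files and reports the size shown for each. It assumes the
usual eight-column layout of `hadoop fs -ls` (permissions, replication,
owner, group, size, date, time, path). The function name find_tmp_files and
the sample paths are hypothetical and used only for illustration. As noted
in the thread, a size of 0 on an unclosed file does not by itself mean the
file is empty.

```python
def find_tmp_files(ls_lines, suffix=".tmp"):
    """Return (path, size) pairs for file entries whose path ends in `suffix`.

    `ls_lines` is assumed to be the text output of a listing in the
    eight-column layout: perms, replication, owner, group, size, date,
    time, path.
    """
    leftovers = []
    for line in ls_lines:
        fields = line.strip().split(None, 7)
        # Skip headers such as "Found 3 items" and directory entries.
        if len(fields) < 8 or not fields[0].startswith("-"):
            continue
        size, path = int(fields[4]), fields[7]
        if path.endswith(suffix):
            leftovers.append((path, size))
    return leftovers


# Hypothetical listing, for illustration only.
sample = [
    "Found 3 items",
    "-rw-r--r--   3 flume supergroup       1048 2012-09-09 23:59 /flume/events/FlumeData.1347235140000",
    "-rw-r--r--   3 flume supergroup          0 2012-09-10 00:00 /flume/events/FlumeData.1347235200000.tmp",
    "drwxr-xr-x   - flume supergroup          0 2012-09-10 00:00 /flume/events/archive",
]

for path, size in find_tmp_files(sample):
    print(f"{path}\t{size} bytes reported")
```

A file that still carries the .tmp suffix long after the sink should have
rolled it is the case discussed above; the reported size only becomes
reliable once the file has been closed.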