From: Mohit Anchlia
To: user@flume.apache.org
Date: Tue, 20 Nov 2012 15:24:44 -0800
Subject: Re: .tmp in hdfs sink

that's awesome!

On Tue, Nov 20, 2012 at 3:11 PM, Mike Percy wrote:
> Mohit,
> No problem, but Juhani did all the work. :)
>
> The behavior is that you can configure an HDFS sink to close a file if
> it hasn't gotten any writes in some time. After it's been idle for 5
> minutes or something, it gets closed. If you get a "late" event that
> goes to the same path after the file is closed, it will just create a
> new file in the same path as usual.
>
> Regards,
> Mike
>
> On Tue, Nov 20, 2012 at 12:56 PM, Brock Noland wrote:
>> We are currently voting on a 1.3.0 RC on the dev@ list:
>>
>> http://s.apache.org/OQ0W
>>
>> You don't have to be a committer to vote! :)
>>
>> Brock
>>
>> On Tue, Nov 20, 2012 at 2:53 PM, Mohit Anchlia wrote:
>>> Thanks a lot!! Now with this what should be the expected behaviour?
>>> After file is closed a new file is created for writes that come
>>> after closing the file?
>>>
>>> Thanks again for committing this change. Do you know when 1.3.0 is
>>> out? I am currently using the snapshot version of 1.3.0
>>>
>>> On Tue, Nov 20, 2012 at 11:16 AM, Mike Percy wrote:
>>>> Mohit,
>>>> FLUME-1660 is now committed and it will be in 1.3.0. In the case
>>>> where you are using 1.2.0, I suggest running with hdfs.rollInterval
>>>> set so the files will roll normally.
>>>>
>>>> Regards,
>>>> Mike
>>>>
>>>> On Thu, Nov 15, 2012 at 11:23 PM, Juhani Connolly wrote:
>>>>> I am actually working on a patch for exactly this, refer to
>>>>> FLUME-1660
>>>>>
>>>>> The patch is on review board right now, I fixed a corner case
>>>>> issue that came up with unit testing, but the implementation is
>>>>> not really to my satisfaction. If you are interested please have a
>>>>> look and add your opinion.
>>>>>
>>>>> https://issues.apache.org/jira/browse/FLUME-1660
>>>>> https://reviews.apache.org/r/7659/
>>>>>
>>>>> On 11/16/2012 01:16 PM, Mohit Anchlia wrote:
>>>>>
>>>>> Another question I had was about rollover. What's the best way to
>>>>> rollover files in reasonable timeframe? For instance our path is
>>>>> YY/MM/DD/HH so every hour there is new file and the -1 hr is just
>>>>> sitting with .tmp and it takes sometimes even hour before .tmp is
>>>>> closed and renamed to .snappy. In this situation is there a way to
>>>>> tell flume to rollover files sooner based on some idle time limit?
>>>>>
>>>>> On Thu, Nov 15, 2012 at 8:14 PM, Mohit Anchlia
>>>>> <mohitanchlia@gmail.com> wrote:
>>>>>> Thanks Mike it makes sense. Anyway I can help?
>>>>>>
>>>>>> On Thu, Nov 15, 2012 at 11:54 AM, Mike Percy wrote:
>>>>>>> Hi Mohit, this is a complicated issue. I've filed
>>>>>>> https://issues.apache.org/jira/browse/FLUME-1714 to track it.
>>>>>>>
>>>>>>> In short, it would require a non-trivial amount of work to
>>>>>>> implement this, and it would need to be done carefully. I agree
>>>>>>> that it would be better if Flume handled this case more
>>>>>>> gracefully than it does today. Today, Flume assumes that you
>>>>>>> have some job that would go and clean up the .tmp files as
>>>>>>> needed, and that you understand that they could be partially
>>>>>>> written if a crash occurred.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Mike
>>>>>>>
>>>>>>> On Sun, Nov 11, 2012 at 8:32 AM, Mohit Anchlia
>>>>>>> <mohitanchlia@gmail.com> wrote:
>>>>>>>> What we are seeing is that if flume gets killed either because
>>>>>>>> of server failure or other reasons, it keeps around the .tmp
>>>>>>>> file. Sometimes for whatever reasons .tmp file is not readable.
>>>>>>>> Is there a way to rollover .tmp file more gracefully?
>>
>> --
>> Apache MRUnit - Unit testing MapReduce -
>> http://incubator.apache.org/mrunit/
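
For readers landing on this thread, the two settings discussed in it might look like the following in an agent configuration. This is a minimal sketch, not something posted by anyone in the thread: the agent name `agent1`, sink name `hdfsSink`, channel name `memChannel`, the path, and both numeric values are hypothetical. The thread never names the idle-close property; it is assumed here to be `hdfs.idleTimeout` (in seconds), which is how the FLUME-1660 change is listed in the Flume user guide, so verify it against the documentation for your version.

```properties
# Hypothetical agent, sink, and channel names; adjust to your setup.
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.channel = memChannel

# Hourly buckets, matching the YY/MM/DD/HH layout described in the thread.
# These escape sequences rely on a timestamp header being present on each event.
agent1.sinks.hdfsSink.hdfs.path = /flume/events/%y/%m/%d/%H

# Time-based roll, the workaround suggested for 1.2.0.
# The value (seconds) is an illustrative assumption.
agent1.sinks.hdfsSink.hdfs.rollInterval = 3600

# Assumed property name for the FLUME-1660 idle close (1.3.0 and later):
# close the open .tmp file after 300 seconds without writes.
agent1.sinks.hdfsSink.hdfs.idleTimeout = 300
```

With an idle timeout set, the previous hour's file should be closed and renamed shortly after writes to it stop, and a late event for that hour opens a new file in the same directory. Files left as .tmp by a killed agent are a separate matter (FLUME-1714) and would still need an external cleanup job.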
