flume-user mailing list archives

From Kathleen Ting <kathl...@apache.org>
Subject Re: HDFS sink leaves .tmp files
Date Thu, 13 Sep 2012 16:08:52 GMT
Chris, glad to hear and glad to be of help. Thanks for letting us know
that it worked.

Regards, Kathleen

On Thu, Sep 13, 2012 at 7:38 AM, Chris Neal <cwneal@gmail.com> wrote:
> Just to follow up, the .tmp file problem did go away using 1.3.0-SNAPSHOT on
> the HDFS sink agent.
>
> Thanks again Kathleen :)
>
>
> On Mon, Sep 10, 2012 at 8:38 PM, Chris Neal <cwneal@gmail.com> wrote:
>>
>> Thanks Kathleen!
>> I'll download that build tomorrow morning and give it a whirl.
>>
>> Chris
>>
>>
>> On Mon, Sep 10, 2012 at 5:09 PM, Kathleen Ting <kathleen@apache.org>
>> wrote:
>>>
>>> [Moving to cdh-user@cloudera.org |
>>> https://groups.google.com/a/cloudera.org/group/cdh-user/topics since
>>> this is getting to be CDH specific]
>>> bcc: user@flume.apache.org
>>>
>>> Chris,
>>>
>>> When the file has not been closed by the client, the file size may be
>>> shown as 0. The NameNode will not update the metadata about the file
>>> until the block is completed or the file handle is closed. Even if it
>>> updates at a block boundary, the size won't be accurate until the file
>>> is closed.
>>>
>>> The metadata takes some time to populate even though the files may
>>> contain data. The CDH4.1 version of Flume includes FLUME-1238, which
>>> will do auto-rolling of files and helps lower the period where these
>>> files appear to be 0 size.
>>>
>>> Since the CDH3u5 version of Flume is compatible with CDH3* Hadoop and
>>> the CDH4 Flume is compatible with CDH4* Hadoop, you can download the
>>> nightly build of flume-ng-1.2.0-cdh4.1.0 from
>>> http://nightly.cloudera.com/cdh4/cdh/4/
>>>
>>> Regards, Kathleen
>>>
>>> On Mon, Sep 10, 2012 at 1:08 PM, Bhaskar V. Karambelkar
>>> <bhaskarvk@gmail.com> wrote:
>>> > Don't know about RPM, but there's a 1.2.x tarball of the 1.2 build @
>>> > http://archive.cloudera.com/cdh/3/flume-ng-1.2.0-cdh3u5.tar.gz
>>> >
>>> >
>>> > On Mon, Sep 10, 2012 at 3:01 PM, Chris Neal <cwneal@gmail.com> wrote:
>>> >>
>>> >> Just checked, and from Cloudera, 1.1.0+121-1.cdh4.0.1.p0.1.el6 is still
>>> >> the latest from their yum repo.
>>> >>
>>> >>
>>> >> On Mon, Sep 10, 2012 at 1:59 PM, Chris Neal <cwneal@gmail.com> wrote:
>>> >>>
>>> >>> I'm using a combination :)
>>> >>>
>>> >>> The application tier is 1.3.0-SNAPSHOT
>>> >>> The HDFS tier is CentOS, and I grabbed the latest (at the time) from the
>>> >>> CDH repo.  Its version is:  1.1.0+121-1.cdh4.0.1.p0.1.el6
>>> >>>
>>> >>> If the issue is on the HDFS sink side, then it could definitely be in my
>>> >>> version!
>>> >>> I'll check if Cloudera has a more recent version to update to.
>>> >>>
>>> >>> Thanks!
>>> >>> Chris
>>> >>>
>>> >>>
>>> >>> On Mon, Sep 10, 2012 at 12:37 PM, Kathleen Ting <kathleen@apache.org>
>>> >>> wrote:
>>> >>>>
>>> >>>> Chris, Eran, this appears to be FLUME-1238, which was fixed in
>>> >>>> Flume-1.2.0. Can you let me know if you are using Flume-1.2.0?
>>> >>>>
>>> >>>> Thanks, Kathleen
>>> >>>>
>>> >>>> On Mon, Sep 10, 2012 at 8:21 AM, Chris Neal <cwneal@gmail.com>
>>> >>>> wrote:
>>> >>>> > Glad to know it's not just me :)
>>> >>>> >
>>> >>>> >
>>> >>>> > On Mon, Sep 10, 2012 at 10:16 AM, Eran Kutner <eran@gigya.com>
>>> >>>> > wrote:
>>> >>>> >>
>>> >>>> >> I have the same problem. I roll every 1 minute so I have tons of
>>> >>>> >> those .tmp files.
>>> >>>> >>
>>> >>>> >> -eran
>>> >>>> >>
>>> >>>> >>
>>> >>>> >>
>>> >>>> >> On Mon, Sep 10, 2012 at 6:02 PM, Chris Neal <cwneal@gmail.com>
>>> >>>> >> wrote:
>>> >>>> >>>
>>> >>>> >>> I'm still seeing this consistently every 24 hour period.  Does this
>>> >>>> >>> sound like a configuration issue, an issue with the Exec source, or an
>>> >>>> >>> issue with the HDFS sink?
>>> >>>> >>>
>>> >>>> >>> Thanks!
>>> >>>> >>>
>>> >>>> >>>
>>> >>>> >>> On Wed, Aug 29, 2012 at 9:18 AM, Chris Neal <cwneal@gmail.com>
>>> >>>> >>> wrote:
>>> >>>> >>>>
>>> >>>> >>>> Hi all,
>>> >>>> >>>>
>>> >>>> >>>> I have an Exec Source running a tail -F on a log4J-generated log file
>>> >>>> >>>> that gets rolled once a day.  It seems that when log4J rolls the file
>>> >>>> >>>> to the new date, the hdfs sink ends up with a .tmp file.  I haven't
>>> >>>> >>>> figured out if there is any data loss yet, but was curious if this is
>>> >>>> >>>> expected behavior?
>>> >>>> >>>>
>>> >>>> >>>> Thanks for your time.
>>> >>>> >>>> Chris
>>> >>>> >>>
>>> >>>> >>>
>>> >>>> >>
>>> >>>> >
>>> >>>
>>> >>>
>>> >>
>>> >
>>
>>
>
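
For readers trying to picture the setup discussed in this thread, here is a rough sketch of a single Flume NG agent with an Exec source tailing a log file into an HDFS sink that rolls once a minute. It is not anyone's actual configuration from the thread: the agent name, component names, log path, and HDFS path are all made up for illustration, and Chris's real deployment used two tiers rather than one agent. Only the property keys (exec `command`, `hdfs.path`, `hdfs.rollInterval`, `hdfs.rollSize`, `hdfs.rollCount`) are standard Flume settings.

```properties
# Hypothetical names: "agent", "tailSrc", "memCh", "hdfsSink" are illustrative only.
agent.sources = tailSrc
agent.channels = memCh
agent.sinks = hdfsSink

# Exec source running tail -F on a daily-rolled log (path is an assumption).
agent.sources.tailSrc.type = exec
agent.sources.tailSrc.command = tail -F /var/log/app/app.log
agent.sources.tailSrc.channels = memCh

agent.channels.memCh.type = memory

# HDFS sink: roll purely on time, every 60 seconds.
agent.sinks.hdfsSink.type = hdfs
agent.sinks.hdfsSink.channel = memCh
agent.sinks.hdfsSink.hdfs.path = hdfs://namenode/flume/events
agent.sinks.hdfsSink.hdfs.rollInterval = 60
# 0 disables size-based and event-count-based rolling.
agent.sinks.hdfsSink.hdfs.rollSize = 0
agent.sinks.hdfsSink.hdfs.rollCount = 0
```

While the sink has a file open it carries the .tmp suffix; the suffix is dropped when the file is rolled and closed.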

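To check whether .tmp files are still being left behind after an upgrade, one option is to scan a directory listing for them. The following is a minimal sketch, assuming the usual eight-column output of `hadoop fs -ls` (permissions, replication, owner, group, size, date, time, path). The function name, the sample listing, and the paths in it are invented for illustration. As noted in the thread, a size of 0 on an unclosed file does not by itself mean the file is empty.

```python
from typing import List, NamedTuple


class HdfsEntry(NamedTuple):
    path: str
    size: int


def find_tmp_files(ls_output: str, suffix: str = ".tmp") -> List[HdfsEntry]:
    """Return files in `hadoop fs -ls` style output whose path ends with suffix."""
    entries = []
    for line in ls_output.splitlines():
        # Split into at most 8 fields so a path containing spaces stays intact.
        fields = line.split(None, 7)
        # Skip the "Found N items" header, blank lines, and directories.
        if len(fields) < 8 or fields[0].startswith("d"):
            continue
        if not fields[4].isdigit():
            continue
        path = fields[7]
        if path.endswith(suffix):
            entries.append(HdfsEntry(path, int(fields[4])))
    return entries


# Invented sample listing; real output would come from the hadoop CLI.
SAMPLE = """Found 3 items
-rw-r--r--   3 flume supergroup       1024 2012-09-09 23:59 /flume/events/app.1347235140000
-rw-r--r--   3 flume supergroup          0 2012-09-10 00:00 /flume/events/app.1347235200000.tmp
drwxr-xr-x   - flume supergroup          0 2012-09-10 00:00 /flume/events/archive.tmp
"""

if __name__ == "__main__":
    for entry in find_tmp_files(SAMPLE):
        print(entry.path, entry.size)
```

In practice the listing text could be captured from the command line and passed to `find_tmp_files`; any .tmp file that persists well past its expected roll time is worth a closer look.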