flume-user mailing list archives

From Brock Noland <br...@cloudera.com>
Subject Re: HDFS file rolling behaviour
Date Tue, 18 Sep 2012 14:37:01 GMT
If you have not increased the OS number of open files limit, you should.
The default limit of 1024 is too low for nearly every modern application.
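
As a rough illustration of the limit mentioned above, the following sketch reads the open-file limit of the current process using Python's standard `resource` module (Unix only). The helper names `open_file_limits` and `raise_soft_limit` are hypothetical and are not part of Flume. It only affects the Python process it runs in; for a Flume agent the limit would normally be raised in the environment that starts the JVM, for example with `ulimit -n` or in `/etc/security/limits.conf`.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file-descriptor limits of this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Try to raise the soft limit to `target`, capped at the hard limit.

    Hypothetical helper for illustration. Returns the soft limit in effect
    afterwards. An unprivileged process cannot exceed its hard limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    # Nothing to do if the soft limit is already unlimited or high enough.
    if soft == resource.RLIM_INFINITY or target <= soft:
        return soft
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print("soft limit:", soft, "hard limit:", hard)
```

A soft limit of 1024 here would match the default described above.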

Regarding the rolling, can you paste your config and describe in more
detail the unexpected behavior you are seeing?

Brock

On Tue, Sep 18, 2012 at 7:08 AM, Jagadish Bihani <
jagadish.bihani@pubmatic.com> wrote:

> Hi,
>
> Does anybody know about the issue mentioned in the following mail?
>
>
> Update: I have now seen the following behaviour even for time-based rolling.
> With time-based rolling I would expect a single file to be created
> every x seconds.
> But in my case some number n of files are created every x seconds.
> Does this have something to do with the HDFS batch size?
>
> Regards,
> Jagadish
>
>
>
> -------- Original Message --------
> Subject: HDFS file rolling behaviour
> Date: Thu, 13 Sep 2012 14:26:56 +0530
> From: Jagadish Bihani <jagadish.bihani@pubmatic.com>
> To: user@flume.apache.org
>
> Hi
>
> I use two Flume agents:
> 1. flume_agent 1, the source side (exec source -> file channel -> avro sink)
> 2. flume_agent 2, the destination side (avro source -> file channel -> HDFS
> sink)
>
> I have observed that when the HDFS sink rolls by *file size/number of
> events*, it creates a lot of simultaneous connections to the source's avro
> sink. But when rolling by *time interval* it works *one by one*, i.e. it
> opens one HDFS file, writes to it, and then closes it. I would expect the
> same for the other rolling criteria too, i.e. first open a file, and once
> x events have been written to it, roll it and open another, and so on.
>
> In my case data ingestion works fine with time-based rolling, but in the
> other cases, due to the above behaviour, I get exceptions such as:
> -- too many open files
> -- timeout-related exceptions for the file channel, and a few more exceptions.
>
> I can increase the values of the parameters that trigger the exceptions, but
> I don't know what adverse effects that may have.
>
> Can somebody shed some light on rolling based on file size/number of
> events?
>
> Regards,
> Jagadish
>
>
>
>


-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
