flume-user mailing list archives

From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: File Channel Exception "Failed to obtain lock for writing to the log.Try increasing the log write timeout value"
Date Thu, 27 Feb 2014 19:18:39 GMT
See https://issues.apache.org/jira/browse/FLUME-2307  

This jira removed the write-timeout, but that only makes sure that no transaction is left in limbo. The real reason, like I said, is slow IO. Try using provisioned IOPS for better throughput.
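
For anyone still on a pre-FLUME-2307 release, a minimal sketch of the knob in question (the agent name is hypothetical; c2 is the channel named in the stack trace below):

    # write-timeout is in seconds; FLUME-2307 removes this setting entirely
    collector.channels.c2.type = file
    collector.channels.c2.write-timeout = 30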
 


Thanks,
Hari


On Thursday, February 27, 2014 at 10:48 AM, Mangtani, Kushal wrote:

> Hari,
>   
> Thanks for the prompt reply. The current file channel's write-timeout = 30 sec; the EBS drive's current capacity = 200 GB. The rate of writes is 60 events/min, where each event is approx. 40 KB (roughly 2.4 MB/min).
>   
> I am thinking of increasing the file channel write-timeout to 60 sec. What do you suggest?
> Also, one strange thing I noticed: all the flume-collectors get the same exception, even though each has a separate EBS drive. Any inputs?
>   
> Thanks,
> Kushal Mangtani
>   
> From: Hari Shreedharan [mailto:hshreedharan@cloudera.com]
> Sent: Thursday, February 27, 2014 10:35 AM
> To: user@flume.apache.org
> Subject: Re: File Channel Exception "Failed to obtain lock for writing to the log. Try increasing the log write timeout value"
>   
> For now, increase the file channel's write-timeout parameter to around 30 or so (basically, the file channel is timing out while writing to disk). But the basic problem you are seeing is that your EBS instance is very slow and IO is taking too long. You either need to increase your EBS IO capacity, or reduce the rate of writes.
>
> Thanks,
> Hari
>
> On Thursday, February 27, 2014 at 10:28 AM, Mangtani, Kushal wrote:
> >
> > From: Mangtani, Kushal
> > Sent: Wednesday, February 26, 2014 4:51 PM
> > To: 'user@flume.apache.org'; 'user-subscribe@flume.apache.org'
> > Cc: Rangnekar, Rohit; 'dev@flume.apache.org'
> > Subject: File Channel Exception "Failed to obtain lock for writing to the log. Try increasing the log write timeout value"
> >
> > Hi,
> >
> > I'm using the Flume-NG 1.4 CDH 4.4 tarball for collecting aggregated logs.
> >
> > I am running a 2-tier (agent, collector) Flume configuration with custom plugins. There are approximately 20 agent machines (receiving data) and 6 collector machines (writing to HDFS), all running independently. However, I have been facing some file channel exceptions on the collector side. The agents appear to be working fine.
> >
> > Error stacktrace:
> >
> >     org.apache.flume.ChannelException: Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=c2]
> >         at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doRollback(FileChannel.java:621)
> >         at org.apache.flume.channel.BasicTransactionSemantics.rollback(BasicTransactionSemantics.java:168)
> >         at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:421)
> >         at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
> >         at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
> >         …
> >
> > And I keep getting the same error.
> >
> > P.S.: The same exception is repeated on most of the Flume collector machines, but not at the same time; there is usually a difference of a couple of hours or more.
> >
> > 1. The HDFS sinks write to HDFS running on Amazon EC2 cloud instances.
> > 2. The datadir and checkpoint dir of the file channel in all Flume collector instances are mounted on a separate Hadoop EBS drive. This makes sure that two separate collectors do not overlap their log and checkpoint dirs. There is a symbolic link, i.e. /usr/lib/flume-ng/datasource → /hadoop/ebs/mnt-1. (A sketch of the channel configuration follows this list.)
> > 3. Flume works fine for a couple of days, and all the agents and collectors are initialized properly without exceptions.
> >
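> > For reference, each collector's file channel points at its own mount along these lines (a sketch; the agent name and the exact subdirectory paths are illustrative, not our literal config):
> >
> >     collector.channels.c2.type = file
> >     # per-collector dirs on the dedicated EBS mount, via the symlink above
> >     collector.channels.c2.checkpointDir = /usr/lib/flume-ng/datasource/checkpoint
> >     collector.channels.c2.dataDirs = /usr/lib/flume-ng/datasource/data
> >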
> > Questions:
> > Exception “Failed to obtain lock for writing to the log. Try increasing the log write timeout value. [channel=c2]”: according to the documentation, such an exception occurs only if two processes are accessing the same file/directory. However, each channel is configured separately, so no two channels should access the same dir. Hence, this exception does not indicate anything. Please correct me if I'm wrong.
> >
> > Also, hdfs.callTimeout indicates the time allowed when calling HDFS for open/write operations. If there is no response within that duration, the call times out, and when it times out the sink closes the file. Please correct me if I'm wrong. Also, is there a way to specify the number of retries before it closes the file?
> >
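> > For concreteness, the setting I mean looks like this (a sketch; the sink name and values are illustrative, not our literal config):
> >
> >     collector.sinks.s1.type = hdfs
> >     collector.sinks.s1.channel = c2
> >     collector.sinks.s1.hdfs.path = hdfs://namenode/flume/events
> >     # hdfs.callTimeout is in milliseconds
> >     collector.sinks.s1.hdfs.callTimeout = 30000
> >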
> > Your inputs/suggestions will be greatly appreciated.
> >
> > Regards,
> > Kushal Mangtani
> > Software Engineer


