flume-user mailing list archives

From Paul Chavez <pcha...@verticalsearchworks.com>
Subject RE: Take list full error after 1.3 upgrade
Date Fri, 01 Mar 2013 00:11:04 GMT
Did the default channel transaction capacity change from 1.2 to 1.3? It used to default to 1 million events, and it still looks like that according to the metrics:
 
CHANNEL.fc_WebLogs:
{
    "EventPutSuccessCount": "0",
    "ChannelFillPercentage": "99.994",
    "Type": "CHANNEL",
    "StopTime": "0",
    "EventPutAttemptCount": "0",
    "ChannelSize": "999940",
    "StartTime": "1362096361779",
    "EventTakeSuccessCount": "0",
    "ChannelCapacity": "1000000",
    "EventTakeAttemptCount": "22022"
}
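These metrics look like the output of Flume's built-in HTTP JSON monitoring. One caveat: the ChannelCapacity figure above is the channel's total "capacity" property, not its "transactionCapacity" (the "capacity 1000" in the take-list error quoted below), and transactionCapacity doesn't appear among these metrics, so they can't confirm the transaction default either way. For reference, a sketch of how such metrics are pulled, assuming the agent was started with the JSON reporter enabled (the config file name and port here are illustrative, not from this thread):

    # start the agent with the built-in HTTP JSON metrics reporter
    flume-ng agent -n tier2 -c conf -f flume.conf \
        -Dflume.monitoring.type=http -Dflume.monitoring.port=34545

    # then fetch the metrics shown above
    curl http://localhost:34545/metrics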


________________________________

From: Hari Shreedharan [mailto:hshreedharan@cloudera.com] 
Sent: Thursday, February 28, 2013 4:07 PM
To: user@flume.apache.org
Subject: Re: Take list full error after 1.3 upgrade


You need to increase the transactionCapacity of the channel to at least the batchSize of the HDFS sink. In your case the channel's transactionCapacity is 1000, while your HDFS sink's batchSize is 10000.
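A config sketch of that fix, reusing the agent and channel names from the thread (the channel type "file" is inferred from the FileChannel stack trace below; only the transactionCapacity line is the substantive change, and its value just needs to be at least the sink's batchSize):

    tier2.channels.fc_WebLogs.type = file
    # must be >= the HDFS sink's hdfs.batchSize (10000 in the config below)
    tier2.channels.fc_WebLogs.transactionCapacity = 10000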

-- 
Hari Shreedharan


On Thursday, February 28, 2013 at 4:00 PM, Paul Chavez wrote:

	I have a two-tier Flume setup, with 4 agents feeding into 2 'collector' agents that write to HDFS.
	One of the data flows is hung up after the upgrade and restart, with the following error:

	3:54:13.497 PM ERROR org.apache.flume.sink.hdfs.HDFSEventSink process failed
	org.apache.flume.ChannelException: Take list for FileBackedTransaction, capacity 1000 full, consider committing more frequently, increasing capacity, or increasing thread count. [channel=fc_WebLogs]
	at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doTake(FileChannel.java:481)
	at org.apache.flume.channel.BasicTransactionSemantics.take(BasicTransactionSemantics.java:113)
	at org.apache.flume.channel.BasicChannelSemantics.take(BasicChannelSemantics.java:95)
	at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:386)
	at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
	at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
	at java.lang.Thread.run(Thread.java:662)

	3:54:13.498 PM ERROR org.apache.flume.SinkRunner Unable to deliver event. Exception follows.
	org.apache.flume.EventDeliveryException: org.apache.flume.ChannelException: Take list for FileBackedTransaction, capacity 1000 full, consider committing more frequently, increasing capacity, or increasing thread count. [channel=fc_WebLogs]
	at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:461)
	at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
	at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
	at java.lang.Thread.run(Thread.java:662)
	Caused by: org.apache.flume.ChannelException: Take list for FileBackedTransaction, capacity 1000 full, consider committing more frequently, increasing capacity, or increasing thread count. [channel=fc_WebLogs]
	at org.apache.flume.channel.file.FileChannel$FileBackedTransaction.doTake(FileChannel.java:481)
	at org.apache.flume.channel.BasicTransactionSemantics.take(BasicTransactionSemantics.java:113)
	at org.apache.flume.channel.BasicChannelSemantics.take(BasicChannelSemantics.java:95)
	at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:386)
	... 3 more
	The relevant part of the config is here:
	tier2.sinks.hdfs_WebLogs.type = hdfs
	tier2.sinks.hdfs_WebLogs.channel = fc_WebLogs
	tier2.sinks.hdfs_WebLogs.hdfs.path = /flume/WebLogs/%Y%m%d/%H%M
	tier2.sinks.hdfs_WebLogs.hdfs.round = true
	tier2.sinks.hdfs_WebLogs.hdfs.roundValue = 15
	tier2.sinks.hdfs_WebLogs.hdfs.roundUnit = minute
	tier2.sinks.hdfs_WebLogs.hdfs.rollSize = 67108864
	tier2.sinks.hdfs_WebLogs.hdfs.rollCount = 0
	tier2.sinks.hdfs_WebLogs.hdfs.rollInterval = 30
	tier2.sinks.hdfs_WebLogs.hdfs.batchSize = 10000
	tier2.sinks.hdfs_WebLogs.hdfs.fileType = DataStream
	tier2.sinks.hdfs_WebLogs.hdfs.writeFormat = Text

	The channel is full, and the metrics page shows many take attempts with no successes. I've been in situations before where the channel is full (usually due to lease issues on HDFS files), but I've never had this particular problem; usually an agent restart gets it going again.

	Any help appreciated.

	Thanks,
	Paul Chavez
	

