flume-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Soo Kim <philipj...@gmail.com>
Subject Cause and solution for org.apache.flume.EventDeliveryException: NettyAvroRpcClient: Handshake timed out after 20000ms
Date Thu, 04 Oct 2012 02:57:33 GMT
Hi,

I'm using Flume NG (1.2.0), and having some trouble with timeout exception.

I have 2 servers. A log file is appended at server A. And server B has HBase.
So I installed Flume NG to server A as a source, and to server B as a sink.

flume.conf for server A is:
src_agent.sources = tail
src_agent.channels = memoryChannel
src_agent.sinks = avroSink

src_agent.sources.tail.type = exec
src_agent.sources.tail.command = tail -F /home/www/crash_report/log/crash.log
src_agent.sources.tail.channels = memoryChannel

src_agent.sinks.avroSink.channel = memoryChannel
src_agent.sinks.avroSink.type = avro
src_agent.sinks.avroSink.hostname = serverb.hostname.com
src_agent.sinks.avroSink.port = 10001

src_agent.channels.memoryChannel.type = memory
src_agent.channels.memoryChannel.capacity = 10000

And flume.conf for server B is:
dst_agent.sources = avro
dst_agent.channels = memoryChannel
dst_agent.sinks = hbaseSink

dst_agent.sources.avro.type = avro
dst_agent.sources.avro.bind = 0.0.0.0
dst_agent.sources.avro.port = 10001
dst_agent.sources.avro.channels = memoryChannel

dst_agent.sinks.hbaseSink.channel = memoryChannel
dst_agent.sinks.hbaseSink.type = org.apache.flume.sink.hbase.HBaseSink
dst_agent.sinks.hbaseSink.table = crash_report_flume
dst_agent.sinks.hbaseSink.columnFamily = f
dst_agent.sinks.hbaseSink.column = foo
dst_agent.sinks.hbaseSink.serializer =
org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
dst_agent.sinks.hbaseSink.serializer.payloadColumn = pcol
dst_agent.sinks.hbaseSink.serializer.incrementColumn = icol

dst_agent.channels.memoryChannel.type = memory
dst_agent.channels.memoryChannel.capacity = 10000

Flume startup script for server A is:
$FLUME_HOME/bin/flume-ng agent --conf $FLUME_HOME/conf --conf-file
$FLUME_HOME/conf/flume-hbase-src.conf --name src_agent
-Dflume.root.logger=INFO,console 1>>flume.log 2>>flume.log &

Script for server B:
$FLUME_HOME/bin/flume-ng agent --conf $FLUME_HOME/conf --conf-file
$FLUME_HOME/conf/flume-hbase-dst.conf --name dst_agent
-Dflume.root.logger=INFO,console 1>>flume.log 2>>flume.log &

All worked fine after startup. Logs were collected to HBase as
expected. But after a couple of hours I found something went wrong.

Here's the flume error logs in server A:
2012-10-02 20:28:59,641
(SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR -
org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)]
Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Failed to send events
        at org.apache.flume.sink.AvroSink.process(AvroSink.java:325)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:636)
Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient
{ host: serverb.hostname.com, port: 10001 }: Failed to send batch
        at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:236)
        at org.apache.flume.sink.AvroSink.process(AvroSink.java:309)
        ... 3 more
Caused by: org.apache.flume.EventDeliveryException: NettyAvroRpcClient
{ host: serverb.hostname.com, port: 10001 }: Handshake timed out after
20000ms
        at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:280)
        at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:224)
        ... 4 more
Caused by: java.util.concurrent.TimeoutException
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:258)
        at java.util.concurrent.FutureTask.get(FutureTask.java:119)
        at org.apache.flume.api.NettyAvroRpcClient.appendBatch(NettyAvroRpcClient.java:278)
        ... 5 more

There were lots of timeout exceptions, and the following exception
occured sparsely:
2012-10-04 09:44:06,408
(SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR -
org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)]
Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Failed to send events
        at org.apache.flume.sink.AvroSink.process(AvroSink.java:325)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:636)
Caused by: org.apache.flume.FlumeException: NettyAvroRpcClient { host:
host: serverb.hostname.com, port: 10001 }: RPC connection error
        at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:117)
        at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:93)
        at org.apache.flume.api.NettyAvroRpcClient.configure(NettyAvroRpcClient.java:507)
        at org.apache.flume.api.RpcClientFactory.getInstance(RpcClientFactory.java:88)
        at org.apache.flume.sink.AvroSink.createConnection(AvroSink.java:182)
        at org.apache.flume.sink.AvroSink.verifyConnection(AvroSink.java:222)
        at org.apache.flume.sink.AvroSink.process(AvroSink.java:282)
        ... 3 more
Caused by: java.io.IOException: Error connecting to host:
serverb.hostname.com/192.168.1.2:10001
        at org.apache.avro.ipc.NettyTransceiver.getChannel(NettyTransceiver.java:249)
        at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:198)
        at org.apache.avro.ipc.NettyTransceiver.<init>(NettyTransceiver.java:147)
        at org.apache.flume.api.NettyAvroRpcClient.connect(NettyAvroRpcClient.java:106)
        ... 9 more
Caused by: java.nio.channels.ClosedChannelException
        at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$1.operationComplete(NioClientSocketPipelineSink.java:157)
        at org.jboss.netty.channel.DefaultChannelFuture.notifyListener(DefaultChannelFuture.java:381)
        at org.jboss.netty.channel.DefaultChannelFuture.notifyListeners(DefaultChannelFuture.java:367)
        at org.jboss.netty.channel.DefaultChannelFuture.setSuccess(DefaultChannelFuture.java:316)
        at org.jboss.netty.channel.AbstractChannel$ChannelCloseFuture.setClosed(AbstractChannel.java:351)
        at org.jboss.netty.channel.AbstractChannel.setClosed(AbstractChannel.java:188)
        at org.jboss.netty.channel.socket.nio.NioSocketChannel.setClosed(NioSocketChannel.java:146)
        at org.jboss.netty.channel.socket.nio.NioWorker.close(NioWorker.java:592)
        at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.close(NioClientSocketPipelineSink.java:415)
        at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.processConnectTimeout(NioClientSocketPipelineSink.java:379)
        at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink$Boss.run(NioClientSocketPipelineSink.java:299)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:44)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        ... 1 more

I'd like to know what's the root cause for those errors, and how I can
solve them.

FYI here's error logs in server B:
2012-10-02 23:13:41,479
(SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR -
org.apache.flume.sink.hbase.HBaseSink.putEventsAndCommit(HBaseSink.java:221)]
Failed to commit transaction.Transaction rolled back.
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
Failed 100 actions: servers with issues: 192.168.1.3:60020,
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1424)
        at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:624)
        at org.apache.flume.sink.hbase.HBaseSink.putEventsAndCommit(HBaseSink.java:207)
        at org.apache.flume.sink.hbase.HBaseSink.process(HBaseSink.java:200)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:662)
2012-10-02 23:13:41,480
(SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR -
org.apache.flume.sink.hbase.HBaseSink.putEventsAndCommit(HBaseSink.java:228)]
Failed to commit transaction.Transaction rolled back.
org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
Failed 100 actions: servers with issues: 192.168.1.3:60020,
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1424)
        at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:624)
        at org.apache.flume.sink.hbase.HBaseSink.putEventsAndCommit(HBaseSink.java:207)
        at org.apache.flume.sink.hbase.HBaseSink.process(HBaseSink.java:200)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:662)
2012-10-02 23:13:41,480
(SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR -
org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:160)]
Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: Failed to commit
transaction.Transaction rolled back.
        at org.apache.flume.sink.hbase.HBaseSink.putEventsAndCommit(HBaseSink.java:230)
        at org.apache.flume.sink.hbase.HBaseSink.process(HBaseSink.java:200)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
Failed 100 actions: servers with issues: 192.168.1.3:60020,
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1424)
        at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:624)
        at org.apache.flume.sink.hbase.HBaseSink.putEventsAndCommit(HBaseSink.java:207)
        ... 4 more

-- 
Thanks,
Soo Philip Kim
Email : philipjkim@gmail.com
Phone : +82-10-7222-6078

Mime
View raw message