flume-user mailing list archives

From Roshan Naik <ros...@hortonworks.com>
Subject Re: Flume 1.4 HDFS Sink Cannot Reconnect
Date Tue, 26 Aug 2014 20:11:04 GMT
Please file a bug for this with the details provided in your email.
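In the meantime, a crude watchdog can paper over the problem by restarting the agent when the stuck-sink signature shows up in the log. This is only a sketch: the log path and the restart command are assumptions about your install, not standard Flume paths, so adjust both.

```shell
#!/bin/sh
# Hypothetical watchdog sketch -- the log path and the restart command are
# assumptions about your environment, not part of a standard Flume install.
LOG="${FLUME_LOG:-/var/log/flume-ng/flume.log}"

# Succeeds when the tail of the log shows the stuck-sink signature seen in
# this thread; only the last 200 lines are checked so old, already-recovered
# errors are ignored.
check_hdfs_stuck() {
    tail -n 200 "$1" 2>/dev/null |
        grep -q "java.net.ConnectException: Connection refused"
}

if check_hdfs_stuck "$LOG"; then
    echo "HDFS connection errors detected; restarting Flume agent"
    # Replace with your service manager's restart command, e.g.:
    # service flume-ng-agent restart
fi
```

Running it from cron every minute or so limits the outage window, but it is a workaround, not a fix, so the bug report is still worth filing.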


On Tue, Aug 26, 2014 at 9:44 AM, Gary Malouf <malouf.gary@gmail.com> wrote:

> +1 I've seen this same issue.
>
>
> On Tue, Aug 26, 2014 at 12:33 PM, Andrew O'Neill <aoneill@paytronix.com>
> wrote:
>
>> Hello all,
>>
>> My setup:
>>     - Flume 1.4
>>     - CDH 4.2.2 (2.0.0-cdh4.2.2)
>>
>>
>> I am testing a simple flume setup with a Sequence Generator Source, a
>> File Channel, and an HDFS Sink (see my flume.conf below). This
>> configuration works as expected until I reboot the cluster’s NameNode or
>> until I restart the HDFS service on the cluster. At this point, it appears
>> that the Flume Agent cannot reconnect to HDFS and must be manually
>> restarted. Since this is not an uncommon occurrence in our production
>> cluster, it is important that Flume be able to reconnect gracefully without
>> any manual intervention.
>>
>> So, how do we fix this HDFS reconnection issue?
>>
>>
>> Here is our flume.conf:
>>
>>     appserver.sources = rawtext
>>     appserver.channels = testchannel
>>     appserver.sinks = test_sink
>>
>>     appserver.sources.rawtext.type = seq
>>     appserver.sources.rawtext.channels = testchannel
>>
>>     appserver.channels.testchannel.type = file
>>     appserver.channels.testchannel.capacity = 10000000
>>     appserver.channels.testchannel.minimumRequiredSpace = 214748364800
>>     appserver.channels.testchannel.checkpointDir = /Users/aoneill/Desktop/testchannel/checkpoint
>>     appserver.channels.testchannel.dataDirs = /Users/aoneill/Desktop/testchannel/data
>>     appserver.channels.testchannel.maxFileSize = 20000000
>>
>>     appserver.sinks.test_sink.type = hdfs
>>     appserver.sinks.test_sink.channel = testchannel
>>     appserver.sinks.test_sink.hdfs.path = hdfs://cluster01:8020/user/aoneill/flumetest
>>     appserver.sinks.test_sink.hdfs.closeTries = 3
>>     appserver.sinks.test_sink.hdfs.filePrefix = events-
>>     appserver.sinks.test_sink.hdfs.fileSuffix = .avro
>>     appserver.sinks.test_sink.hdfs.fileType = DataStream
>>     appserver.sinks.test_sink.hdfs.writeFormat = Text
>>     appserver.sinks.test_sink.hdfs.inUsePrefix = inuse-
>>     appserver.sinks.test_sink.hdfs.inUseSuffix = .avro
>>     appserver.sinks.test_sink.hdfs.rollCount = 100000
>>     appserver.sinks.test_sink.hdfs.rollInterval = 30
>>     appserver.sinks.test_sink.hdfs.rollSize = 10485760
>>
>>
>> These are the two error messages that the Flume Agent outputs constantly
>> after the restart:
>>
>>     2014-08-26 10:47:24,572
>> (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR -
>> org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated(AbstractHDFSWriter.java:96)]
>> Unexpected error while checking replication factor
>>     java.lang.reflect.InvocationTargetException
>>         at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>>         at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:606)
>>         at
>> org.apache.flume.sink.hdfs.AbstractHDFSWriter.getNumCurrentReplicas(AbstractHDFSWriter.java:162)
>>         at
>> org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated(AbstractHDFSWriter.java:82)
>>         at
>> org.apache.flume.sink.hdfs.BucketWriter.shouldRotate(BucketWriter.java:452)
>>         at
>> org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:387)
>>         at
>> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:392)
>>         at
>> org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>         at
>> org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>         at java.lang.Thread.run(Thread.java:744)
>>     Caused by: java.net.ConnectException: Connection refused
>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>         at
>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
>>         at
>> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
>>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:525)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1253)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:891)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:881)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:982)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:779)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)
>>
>> and
>>
>>     2014-08-26 10:47:29,592
>> (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN -
>> org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:418)]
>> HDFS IO error
>>     java.net.ConnectException: Connection refused
>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>         at
>> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
>>         at
>> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
>>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:525)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1253)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:891)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:881)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:982)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:779)
>>         at
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)
>>
>>
>> I can provide additional information if needed. Thank you very much for
>> any insight you are able to provide into this problem.
>>
>>
>> Best,
>>
>> Andrew
>>
>
>

