flume-user mailing list archives

From Andrew O'Neill <aone...@paytronix.com>
Subject Re: Flume 1.4 HDFS Sink Cannot Reconnect
Date Thu, 28 Aug 2014 15:23:20 GMT
Hello,

Did anyone have a chance to look at this issue?


Thanks,

Andrew O'Neill | Paytronix
74 Bridge Street, Suite 400
Newton, MA 02458
p. 617.649.3300 x256

From: Andrew O'Neill <aoneill@paytronix.com>
Reply-To: "user@flume.apache.org" <user@flume.apache.org>
Date: Tuesday, August 26, 2014 at 16:35
To: "user@flume.apache.org" <user@flume.apache.org>
Subject: Re: Flume 1.4 HDFS Sink Cannot Reconnect

Per Roshan’s request, I have filed a bug for this issue. For those interested, here is the
link to the issue:

https://issues.apache.org/jira/browse/FLUME-2451

Hopefully this will create some visibility on this problem.


Thanks,
Andrew

From: Roshan Naik <roshan@hortonworks.com>
Reply-To: "user@flume.apache.org" <user@flume.apache.org>
Date: Tuesday, August 26, 2014 at 16:11
To: "user@flume.apache.org" <user@flume.apache.org>
Subject: Re: Flume 1.4 HDFS Sink Cannot Reconnect

Please file a bug for this with the details provided in your email.


On Tue, Aug 26, 2014 at 9:44 AM, Gary Malouf <malouf.gary@gmail.com> wrote:
+1 I've seen this same issue.


On Tue, Aug 26, 2014 at 12:33 PM, Andrew O'Neill <aoneill@paytronix.com> wrote:
Hello all,

My setup:
    - Flume 1.4
    - CDH 4.2.2 (2.0.0-cdh4.2.2)


I am testing a simple Flume setup with a Sequence Generator Source, a File Channel, and an
HDFS Sink (see my flume.conf below). This configuration works as expected until I reboot the
cluster’s NameNode or restart the HDFS service on the cluster. At that point, the Flume
agent appears unable to reconnect to HDFS and must be restarted manually. Since such restarts
are not uncommon in our production cluster, it is important that Flume can reconnect
gracefully without any manual intervention.
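
When reproducing this, it can help to confirm that the HDFS endpoint is accepting connections
again while the agent keeps failing. The following is only a minimal sketch of such a check:
the function name can_connect is hypothetical, and the host and port in the usage note are
taken from the hdfs.path in the flume.conf below. A successful TCP connect shows only that the
port is open, not that HDFS is healthy.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

For example, can_connect("cluster01", 8020) could be polled after the NameNode restart; if it
returns True while the agent still logs errors, the problem is more likely on the agent side.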

So, how do we fix this HDFS reconnection issue?


Here is our flume.conf:

    appserver.sources = rawtext
    appserver.channels = testchannel
    appserver.sinks = test_sink

    appserver.sources.rawtext.type = seq
    appserver.sources.rawtext.channels = testchannel

    appserver.channels.testchannel.type = file
    appserver.channels.testchannel.capacity = 10000000
    appserver.channels.testchannel.minimumRequiredSpace = 214748364800
    appserver.channels.testchannel.checkpointDir = /Users/aoneill/Desktop/testchannel/checkpoint
    appserver.channels.testchannel.dataDirs = /Users/aoneill/Desktop/testchannel/data
    appserver.channels.testchannel.maxFileSize = 20000000

    appserver.sinks.test_sink.type = hdfs
    appserver.sinks.test_sink.channel = testchannel
    appserver.sinks.test_sink.hdfs.path = hdfs://cluster01:8020/user/aoneill/flumetest
    appserver.sinks.test_sink.hdfs.closeTries = 3
    appserver.sinks.test_sink.hdfs.filePrefix = events-
    appserver.sinks.test_sink.hdfs.fileSuffix = .avro
    appserver.sinks.test_sink.hdfs.fileType = DataStream
    appserver.sinks.test_sink.hdfs.writeFormat = Text
    appserver.sinks.test_sink.hdfs.inUsePrefix = inuse-
    appserver.sinks.test_sink.hdfs.inUseSuffix = .avro
    appserver.sinks.test_sink.hdfs.rollCount = 100000
    appserver.sinks.test_sink.hdfs.rollInterval = 30
    appserver.sinks.test_sink.hdfs.rollSize = 10485760


These are the two error messages that the Flume agent logs continuously after the restart:

    2014-08-26 10:47:24,572 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated(AbstractHDFSWriter.java:96)] Unexpected error while checking replication factor
    java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.flume.sink.hdfs.AbstractHDFSWriter.getNumCurrentReplicas(AbstractHDFSWriter.java:162)
        at org.apache.flume.sink.hdfs.AbstractHDFSWriter.isUnderReplicated(AbstractHDFSWriter.java:82)
        at org.apache.flume.sink.hdfs.BucketWriter.shouldRotate(BucketWriter.java:452)
        at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:387)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:392)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:744)
    Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:525)
        at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1253)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:891)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:881)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:982)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:779)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)

and

    2014-08-26 10:47:29,592 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:418)] HDFS IO error
    java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:735)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:525)
        at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1253)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:891)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:881)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:982)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:779)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:448)
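
To tell whether an agent is stuck in this state, one option is to count how often the HDFS
sink logs these two messages. The sketch below is illustrative only: it assumes the log line
layout shown in the traces above, and the names LOG_LINE and count_sink_errors are
hypothetical. It also tolerates the message text wrapping onto the line after the header.

```python
import re
from collections import Counter

# Header of a log entry as it appears in the traces above:
# "<date> <time>,<ms> (<thread>) [<LEVEL> - <location>] <message>"
LOG_LINE = re.compile(
    r"^\s*(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\((?P<thread>[^)]*)\) "
    r"\[(?P<level>[A-Z]+) - (?P<location>[^\]]*)\]\s*(?P<message>.*)$"
)


def count_sink_errors(lines):
    """Count WARN/ERROR entries from HDFS sink classes, keyed by (level, message)."""
    counts = Counter()
    pending = None  # level of a header whose message wrapped onto the next line
    for raw in lines:
        line = raw.rstrip("\n")
        match = LOG_LINE.match(line)
        if match:
            pending = None
            if match["level"] not in ("WARN", "ERROR"):
                continue
            if "org.apache.flume.sink.hdfs." not in match["location"]:
                continue
            message = match["message"].strip()
            if message:
                counts[(match["level"], message)] += 1
            else:
                pending = match["level"]
        elif pending is not None and line.strip():
            # First non-empty line after a header with no message text.
            counts[(pending, line.strip())] += 1
            pending = None
    return counts
```

Passing an open log file to count_sink_errors gives a Counter; a steadily growing count for
("WARN", "HDFS IO error") after HDFS is back up would match the behavior described here.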


I can provide additional information if needed. Thank you very much for any insight you are
able to provide into this problem.


Best,
Andrew



