hadoop-common-user mailing list archives

From Cazen Lee <cazen....@gmail.com>
Subject Re: [HDFS-inotify] "IOException: The client is stopped" after reading file
Date Sun, 01 May 2016 12:02:20 GMT
Hello Chris. This is Cazen.
This issue has been causing me a lot of pain, but it was resolved by your
kind advice.
It works well, and now I understand what happened behind the scenes.
Thank you very much. I saved a lot of time.
Have a good day :)

On Sat, Apr 30, 2016 at 1:18 PM, Chris Nauroth <cnauroth@hortonworks.com>
wrote:

> Hello Cazen,
>
> This looks to me like it is suffering from an unintended side effect of
> closing the FileSystem object.  Hadoop internally caches instances of the
> FileSystem class, and the same instance can be returned to multiple call
> sites.  Even after one call site closes it, it's possible that other call
> sites still hold a reference to that same FileSystem instance.  Closing the
> FileSystem instance makes it unusable.
>
> HdfsAdmin#getInotifyEventStream is likely using the same FileSystem
> instance that your own FileSystem.get call returns.  By closing it (using
> try-with-resources), that FileSystem instance is made invalid for the
> subsequent calls to retrieve inotify events.
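>
> To illustrate, here is a minimal, hypothetical sketch (the NameNode
> address and path are placeholders, not from your setup):
>
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
>
> Configuration conf = new Configuration();
> conf.set("fs.defaultFS", "hdfs://localhost:8020"); // placeholder address
>
> FileSystem fs1 = FileSystem.get(conf);
> FileSystem fs2 = FileSystem.get(conf);
> System.out.println(fs1 == fs2); // true: both call sites share one cached instance
>
> fs1.close();                            // closing it once closes it for everyone
> fs2.open(new Path("/tmp/example.txt")); // now fails: the shared instance is unusable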
>
> The FileSystem cache is a fairly common source of confusion.  However, its
> current behavior is considered to be by design.  For reasons of backward
> compatibility, we can't easily change its behavior to help with confusing
> situations like this.  (Sorry!)
>
> A few suggestions to try:
>
> 1. Just don't close the FileSystem.  Even if you don't close it
> explicitly, it will be closed at process teardown via a shutdown hook.
> This definitely looks wrong from a resource management perspective, but a
> lot of applications work this way.
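>
> For instance, the RENAME handler could close the reader but not the
> FileSystem (a sketch reusing the names from your code below; only the
> resource handling changes):
>
> FileSystem fs = FileSystem.get(conf); // shared cached instance; do not close it
> Path filePath = new Path(defaultFS + renameEvent.getDstPath());
> try (BufferedReader br = new BufferedReader(
>         new InputStreamReader(fs.open(filePath)))) {
>     String line;
>     while ((line = br.readLine()) != null) {
>         System.out.println(line);
>     }
> } // the reader and stream are closed here, but fs stays open for inotify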
>
> 2. Call FileSystem#newInstance instead of FileSystem#get.  The newInstance
> method is guaranteed to return an instance unique to that call site, not a
> shared instance potentially in use by other call sites.  If you use
> newInstance, then you must guarantee it gets closed to avoid a leak with a
> long-term impact.
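>
> A sketch of that variant, again reusing the names from your code below:
>
> // newInstance returns a private FileSystem, so try-with-resources is safe.
> try (FileSystem fs = FileSystem.newInstance(conf)) {
>     Path filePath = new Path(defaultFS + renameEvent.getDstPath());
>     try (BufferedReader br = new BufferedReader(
>             new InputStreamReader(fs.open(filePath)))) {
>         String line;
>         while ((line = br.readLine()) != null) {
>             System.out.println(line);
>         }
>     }
> } // closing this instance does not touch the cached one used by HdfsAdmin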
>
> 3. You can disable the FileSystem cache for specific file system types by
> editing core-site.xml and setting property fs.<file system
> type>.impl.disable.cache to true, e.g. fs.hdfs.impl.disable.cache.  In
> general, disabling the cache is not desirable, because the performance
> benefits of the cache are noticeable.  Sometimes this is a helpful
> workaround for specific applications though.
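>
> The same property can also be set programmatically, scoped to a single
> Configuration object rather than the cluster-wide config (a sketch):
>
> Configuration conf = new Configuration();
> // Equivalent to setting fs.hdfs.impl.disable.cache to true in core-site.xml:
> conf.setBoolean("fs.hdfs.impl.disable.cache", true);
> FileSystem fs = FileSystem.get(conf); // returns a fresh, uncached instance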
>
> --Chris Nauroth
>
> From: Cazen Lee <cazen.lee@gmail.com>
> Date: Thursday, April 28, 2016 at 5:53 PM
> To: "user@hadoop.apache.org" <user@hadoop.apache.org>
> Subject: [HDFS-inotify] "IOException: The client is stopped" after
> reading file
>
> Good day, this is Cazen.
> Could I kindly ask about a weird situation when reading a file in HDFS
> with inotify polling?
>
> - Env : MacOS, EMR, Linux (standalone) - same problem
> - Version : Hadoop 2.7.2
>
> 1. I would like to write code that reads a file under a particular
> location when it is created (using inotify), so I modified the sample
> code based on "hdfs-inotify-example" on GitHub:
>
> https://github.com/onefoursix/hdfs-inotify-example/blob/master/src/main/java/com/onefoursix/HdfsINotifyExample.java
>
> 2. I've changed the code to read the file and print its lines to the
> console when it is renamed:
>
> https://github.com/onefoursix/hdfs-inotify-example/commit/82485881c5da85a46dd1741c2d8420c7c4e81f93
>
> case RENAME:
>     Event.RenameEvent renameEvent = (Event.RenameEvent) event;
>     Configuration conf = new Configuration();
>     conf.set("fs.defaultFS", defaultFS);
>     System.out.println(renameEvent.getDstPath() + " " + inputPath.getPath());
>     if (renameEvent.getDstPath().startsWith(inputPath.getPath())) {
>         // Try to read the file.  Note: try-with-resources closes the
>         // FileSystem instance when this block exits.
>         try (FileSystem fs = FileSystem.get(conf)) {
>             Path filePath = new Path(defaultFS + renameEvent.getDstPath());
>             BufferedReader br = new BufferedReader(
>                     new InputStreamReader(fs.open(filePath)));
>             String line = br.readLine();
>             while (line != null) {
>                 System.out.println(line);
>                 line = br.readLine();
>             }
>             br.close();
>         }
>     }
>
> 3. It works, but I encountered an IOException in the next eventStream.take()
> after the file read.  It doesn't happen if I do not read a file on HDFS.
> -------------CODE-------------
> DFSInotifyEventInputStream eventStream = admin.getInotifyEventStream();
> EventBatch batch = eventStream.take();
>
> -------------LOG-------------
> Cazens-MacBook-Pro:hdfs-inotify-example Cazen$ java -jar target/hdfs-inotify-example-uber.jar hdfs://localhost:8032/cazen/
> lastReadTxid = 0
> log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
> TxId = 3134
> event type = CREATE
>   path = /cazen/test2.txt._COPYING_
>   owner = Cazen
>   ctime = 1461850245559
> TxId = 3138
> event type = CLOSE
> TxId = 3139
> event type = RENAME
> /cazen/test2.txt /cazen/
> --------------------File Start
> Input File Text Sample LOL
> --------------------File END
> Exception in thread "main" java.io.IOException: The client is stopped
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1507)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1451)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>     at com.sun.proxy.$Proxy9.getEditsFromTxid(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolTranslatorPB.java:1511)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>     at com.sun.proxy.$Proxy10.getEditsFromTxid(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSInotifyEventInputStream.poll(DFSInotifyEventInputStream.java:111)
>     at org.apache.hadoop.hdfs.DFSInotifyEventInputStream.take(DFSInotifyEventInputStream.java:224)
>     at com.onefoursix.HdfsINotifyExample.main(HdfsINotifyExample.java:40)
>
> There is a possibility that I may have written the wrong code.  If anyone
> already knows about this situation, could I ask the reason?
> Any advice would be appreciated.
> Thank you, and have a good day :)
>
> --
> Cazen.lee@gmail.com
> Cazen.lee@samsung.com
> http://www.cazen.co.kr
>
