From Chris Nauroth <cnaur...@hortonworks.com>
Subject Re: [HDFS-inotify] "IOException: The client is stopped" after reading file
Date Sat, 30 Apr 2016 04:18:39 GMT
Hello Cazen,

This looks to me like an unintended side effect of closing the FileSystem
object.  Hadoop internally caches instances of the FileSystem class, and the same instance
can be returned to multiple call sites.  Even after one call site closes it, it's possible
that other call sites still hold a reference to that same FileSystem instance.  Closing the
FileSystem instance makes it unusable.

HdfsAdmin#getInotifyEventStream is likely using the same FileSystem instance that your own
FileSystem.get call returns.  By closing it (using try-with-resources), that FileSystem instance
is made invalid for the subsequent calls to retrieve inotify events.
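
To make the mechanism concrete, here is a toy sketch in plain Java.  ToyFs and ToyCache are
hypothetical stand-ins invented for this illustration, not Hadoop's FileSystem or its real
cache, and the URI string is made up.  It only shows how one call site closing a shared
instance can break another holder of the same object.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical stand-ins, not Hadoop's FileSystem or its cache.
final class SharedInstanceSketch {

    /** Stand-in for a client object that stops working once it is closed. */
    static final class ToyFs implements AutoCloseable {
        private boolean closed = false;

        String read(String path) {
            if (closed) {
                throw new IllegalStateException("The client is stopped");
            }
            return "contents of " + path;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Stand-in for a cache that hands the same instance to every caller using the same key. */
    static final class ToyCache {
        private final Map<String, ToyFs> instances = new HashMap<>();

        ToyFs get(String uri) {
            return instances.computeIfAbsent(uri, key -> new ToyFs());
        }
    }

    /** Returns true if the long-lived holder fails after another call site closes the instance. */
    static boolean longLivedHolderBreaks() {
        ToyCache cache = new ToyCache();
        ToyFs eventSource = cache.get("hdfs://example"); // long-lived holder, like the event stream

        try (ToyFs fs = cache.get("hdfs://example")) {   // second call site receives the SAME object
            fs.read("/cazen/test2.txt");
        }                                                 // try-with-resources closes it here

        try {
            eventSource.read("/next-event");
            return false;
        } catch (IllegalStateException e) {
            return true;                                  // the shared instance is no longer usable
        }
    }

    public static void main(String[] args) {
        System.out.println("long-lived holder broken: " + longLivedHolderBreaks());
    }
}
```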

The FileSystem cache is a fairly common source of confusion.  However, its current behavior
is by design.  For reasons of backward compatibility, we can't easily change
its behavior to help with confusing situations like this.  (Sorry!)

A few suggestions to try:

1. Just don't close the FileSystem.  Even if you don't close it explicitly, it will be closed
at process teardown via a shutdown hook.  This definitely looks wrong from a resource management
perspective, but a lot of applications work this way.

2. Call FileSystem#newInstance instead of FileSystem#get.  The newInstance method is guaranteed
to return an instance unique to that call site, not a shared instance potentially in use by
other call sites.  If you use newInstance, then you must guarantee it gets closed to avoid
a leak with a long-term impact.

3. You can disable the FileSystem cache for specific file system types by editing core-site.xml
and setting property fs.<file system type>.impl.disable.cache to true, e.g. fs.hdfs.impl.disable.cache.
In general, disabling the cache is not desirable, because the performance benefits of the
cache are noticeable.  Sometimes this is a helpful workaround for specific applications though.
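
The difference between suggestions 2 and 3 can be sketched with a toy model in plain Java.
ToyFs and ToyFactory are hypothetical names invented for this illustration; their get and
newInstance methods only mimic the sharing semantics described above and are not the Hadoop
implementation.  The cacheDisabled flag stands in for the disable.cache property.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical stand-ins that mimic the get/newInstance split described above.
final class CacheOptionsSketch {

    static final class ToyFs implements AutoCloseable {
        private boolean closed = false;

        boolean isClosed() {
            return closed;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    static final class ToyFactory {
        private final Map<String, ToyFs> cache = new HashMap<>();
        private final boolean cacheDisabled;

        ToyFactory(boolean cacheDisabled) {
            this.cacheDisabled = cacheDisabled;
        }

        /** Returns a shared instance, unless the cache is disabled (suggestion 3). */
        ToyFs get(String uri) {
            if (cacheDisabled) {
                return new ToyFs();
            }
            return cache.computeIfAbsent(uri, key -> new ToyFs());
        }

        /** Always returns a private instance; the caller owns it and must close it (suggestion 2). */
        ToyFs newInstance(String uri) {
            return new ToyFs();
        }
    }

    /** With the cache enabled, two get() calls return the very same object. */
    static boolean sharedByDefault() {
        ToyFactory factory = new ToyFactory(false);
        return factory.get("hdfs://example") == factory.get("hdfs://example");
    }

    /** Suggestion 2: closing a private instance leaves the shared one usable. */
    static boolean sharedSurvivesPrivateClose() {
        ToyFactory factory = new ToyFactory(false);
        ToyFs shared = factory.get("hdfs://example");
        try (ToyFs mine = factory.newInstance("hdfs://example")) {
            if (mine == shared) {
                return false; // would mean newInstance handed out the shared object
            }
        } // only the private instance is closed here
        return !shared.isClosed();
    }

    /** Suggestion 3: with the cache disabled, every get() returns a distinct instance. */
    static boolean disabledCacheGivesDistinctInstances() {
        ToyFactory factory = new ToyFactory(true);
        return factory.get("hdfs://example") != factory.get("hdfs://example");
    }

    public static void main(String[] args) {
        System.out.println("shared by default: " + sharedByDefault());
        System.out.println("shared survives private close: " + sharedSurvivesPrivateClose());
        System.out.println("distinct when cache disabled: " + disabledCacheGivesDistinctInstances());
    }
}
```

In the toy model, as in suggestion 2, the private instance sits in try-with-resources so it
is always closed, while the shared instance is never closed by the reading code.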

--Chris Nauroth

From: Cazen Lee <cazen.lee@gmail.com>
Date: Thursday, April 28, 2016 at 5:53 PM
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: [HDFS-inotify] "IOException: The client is stopped" after reading file

Good day, this is Cazen.
Could I kindly ask about a weird situation when reading a file in HDFS with inotify?

- Env : MacOS, EMR, Linux(standalone) - same problem
- Version : Hadoop 2.7.2

1. I would like to write code that reads a file under a particular location when it is created
(using inotify).
    So I modified the sample code based on "hdfs-inotify-example" on GitHub.

2. I changed the code to read the file and print its lines to the console when it is renamed:

case RENAME:
    Event.RenameEvent renameEvent = (Event.RenameEvent) event;
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", defaultFS);
    System.out.println(renameEvent.getDstPath() + " " + inputPath.getPath());
    if (renameEvent.getDstPath().startsWith(inputPath.getPath())) {
        //Try to read file
        try (FileSystem fs = FileSystem.get(conf)) {
            Path filePath = new Path(defaultFS + renameEvent.getDstPath());
            BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(filePath)));
            String line;
            // Print each line of the renamed file to the console
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
    break;

3. It works, but I encounter an IOException in the next eventStream.take() call after reading
the file. It does not happen if I do not read the file on HDFS.
DFSInotifyEventInputStream eventStream = admin.getInotifyEventStream();
EventBatch batch = eventStream.take();

Cazens-MacBook-Pro:hdfs-inotify-example Cazen$ java -jar target/hdfs-inotify-example-uber.jar
lastReadTxid = 0
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
TxId = 3134
event type = CREATE
  path = /cazen/test2.txt._COPYING_
  owner = Cazen
  ctime = 1461850245559
TxId = 3138
event type = CLOSE
TxId = 3139
event type = RENAME
/cazen/test2.txt /cazen/
--------------------File Start
Input File Text Sample LOL
--------------------File END
Exception in thread "main" java.io.IOException: The client is stopped
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1507)
at org.apache.hadoop.ipc.Client.call(Client.java:1451)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy9.getEditsFromTxid(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getEditsFromTxid(ClientNamenodeProtocolTranslatorPB.java:1511)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.getEditsFromTxid(Unknown Source)
at org.apache.hadoop.hdfs.DFSInotifyEventInputStream.poll(DFSInotifyEventInputStream.java:111)
at org.apache.hadoop.hdfs.DFSInotifyEventInputStream.take(DFSInotifyEventInputStream.java:224)
at com.onefoursix.HdfsINotifyExample.main(HdfsINotifyExample.java:40)

It is possible that I have written the code incorrectly. If anyone already knows about
this situation, could I ask the reason?
Any advice would be appreciated.
Thank you. Have a good day :)


