hadoop-common-issues mailing list archives

From "Manjunath Anand (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13726) Enforce that FileSystem initializes only a single instance of the requested FileSystem.
Date Sat, 18 Mar 2017 19:01:41 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931329#comment-15931329 ]

Manjunath Anand commented on HADOOP-13726:

Thanks [~stevel@apache.org] for the praise in your comments; it made me happy. Yes, the array
in the catch block is a workaround for capturing the exception from inside the closure.
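
As a minimal sketch of that workaround: a lambda cannot assign to a captured local variable,
so a one-element array acts as a mutable holder for the checked exception. The class, the
load method and the String types below are hypothetical stand-ins, not code from the patch.

```java
import java.io.IOException;
import java.util.concurrent.ConcurrentHashMap;

final class CaptureInClosure {
    private static final ConcurrentHashMap<String, String> CACHE = new ConcurrentHashMap<>();

    /** Hypothetical loader that can fail with a checked exception. */
    static String load(String key) throws IOException {
        if (key.isEmpty()) {
            throw new IOException("empty key");
        }
        return "value-for-" + key;
    }

    static String getOrLoad(String key) throws IOException {
        // A lambda cannot assign to a captured local variable, so a
        // one-element array serves as a mutable holder for the exception.
        final IOException[] failure = new IOException[1];
        String value = CACHE.computeIfAbsent(key, k -> {
            try {
                return load(k);
            } catch (IOException e) {
                failure[0] = e;
                return null; // returning null records no mapping for the key
            }
        });
        if (failure[0] != null) {
            throw failure[0]; // rethrow outside the closure
        }
        return value;
    }
}
```

Because the mapping function returns null on failure, nothing is cached for that key and a
later call can retry.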

Thanks [~cnauroth] for your input. Yes, I agree that computeIfAbsent is subject to locking
between non-equal keys that fall into the same hash bucket. The chances of that happening are
usually lower when the ConcurrentHashMap is given a large initial size and when the hash
distribution is good, but it can still be a problem.
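
The bucket contention can be reproduced with a small sketch. It assumes an OpenJDK-style
ConcurrentHashMap (Java 8 or later), where the mapping function runs while the bucket is
locked; the CollidingKey type with a constant hashCode is a hypothetical device to force two
non-equal keys into the same bucket.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class BinContentionDemo {
    /** Hypothetical key type whose instances all share one hash code. */
    static final class CollidingKey {
        final String name;
        CollidingKey(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).name.equals(name);
        }
    }

    /** Returns true if the second key had to wait while the first key was still loading. */
    static boolean secondKeyWaitsForFirst() throws InterruptedException {
        ConcurrentHashMap<CollidingKey, String> map = new ConcurrentHashMap<>();
        CountDownLatch firstLoading = new CountDownLatch(1);
        CountDownLatch releaseFirst = new CountDownLatch(1);
        CountDownLatch secondDone = new CountDownLatch(1);

        // First thread: a slow "initialization" for key "a".
        Thread first = new Thread(() -> map.computeIfAbsent(new CollidingKey("a"), k -> {
            firstLoading.countDown();
            try {
                releaseFirst.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "A";
        }));
        // Second thread: a fast load for a different key in the same bucket.
        Thread second = new Thread(() -> {
            map.computeIfAbsent(new CollidingKey("b"), k -> "B");
            secondDone.countDown();
        });

        first.start();
        firstLoading.await();          // the first loader is now running
        second.start();
        boolean finishedEarly = secondDone.await(300, TimeUnit.MILLISECONDS);
        releaseFirst.countDown();      // let the first loader finish
        first.join();
        second.join();
        return !finishedEarly && map.size() == 2;
    }
}
```

With well-distributed hash codes the two keys would normally land in different buckets and
the second call would not wait.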

After researching a little more how to pass the uri and conf to the load method, I came
across [https://github.com/google/guava/wiki/CachesExplained#from-a-callable]
and found a way to pass the URI and Configuration to the Callable without adding them to the
Key class. The code is presented below:
    private com.google.common.cache.Cache<Key, FileSystem> map = CacheBuilder.newBuilder().build();

    private FileSystem getInternal(final URI uri, final Configuration conf, final Key key)
            throws IOException {
      /*
       * Calling getIfPresent to avoid unnecessary creation of Callable
       * object every time getInternal is called.
       */
      FileSystem fs = map.getIfPresent(key);
      if (fs != null) {
        return fs;
      }
      if (map.size() == 0
              && !ShutdownHookManager.get().isShutdownInProgress()) {
        ShutdownHookManager.get().addShutdownHook(clientFinalizer, SHUTDOWN_HOOK_PRIORITY);
      }
      // uri, conf and key are captured by the anonymous class, so they
      // do not need to be stored in the Key class.
      Callable<FileSystem> cb = new Callable<FileSystem>() {
          @Override
          public FileSystem call() throws Exception {
            FileSystem fs = createFileSystem(uri, conf);
            // The Callable given to Cache.get must not return null.
            if (fs == null) throw new IOException();
            fs.key = key;
            // if (conf.getBoolean( ... (the rest of this statement is missing from the archived message)
            return fs;
          }
      };

      try {
        fs = map.get(key, cb);
      } catch (ExecutionException e) {
        LOG.error("Exception while creating file system for key " + key, e);
        if (e.getCause() instanceof IOException) throw (IOException) e.getCause();
        // Any other cause is wrapped so that null is never returned.
        throw new IOException(e.getCause());
      }
      return fs;
    }
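
For comparison, the same "load once per key from a Callable" behaviour can be sketched
with only java.util.concurrent, by storing a FutureTask in a ConcurrentHashMap. This is not
the Guava approach above but a different, well-known memoizer pattern; SingleInitCache and
its get method are hypothetical names, and the sketch leaves out the shutdown hook and
auto-close handling.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

final class SingleInitCache<K, V> {
    private final ConcurrentHashMap<K, Future<V>> map = new ConcurrentHashMap<>();

    V get(K key, Callable<V> loader) throws IOException {
        Future<V> future = map.get(key);
        if (future == null) {
            FutureTask<V> task = new FutureTask<>(loader);
            future = map.putIfAbsent(key, task);
            if (future == null) {
                // This thread won the race, so it alone runs the loader.
                // The loader runs outside any map lock.
                future = task;
                task.run();
            }
        }
        try {
            // Threads that lost the race wait here for the winner's result.
            return future.get();
        } catch (ExecutionException e) {
            map.remove(key, future); // do not keep a failed load in the cache
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            throw new IOException(cause);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while waiting for " + key, e);
        }
    }
}
```

As with the Callable above, a caller would pass a loader that captures uri and conf, so
neither has to become part of the key.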

Note that using a Guava Cache instead of a java Map would mean additional code changes:
existing references in FileSystem to Map methods such as entrySet, isEmpty, remove and keySet
would have to be replaced with the corresponding methods of the Guava Cache.

Please let me know your thoughts about this approach and code. 

> Enforce that FileSystem initializes only a single instance of the requested FileSystem.
> ---------------------------------------------------------------------------------------
>                 Key: HADOOP-13726
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13726
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Chris Nauroth
>            Assignee: Manjunath Anand
> The {{FileSystem}} cache is intended to guarantee reuse of instances by multiple call
sites or multiple threads.  The current implementation does provide this guarantee, but there
is a brief race condition window during which multiple threads could perform redundant initialization.
 If the file system implementation has expensive initialization logic, then this is wasteful.
 This issue proposes to eliminate that race condition and guarantee initialization of only
a single instance.

