hadoop-common-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6762) exception while doing RPC I/O closes channel
Date Fri, 04 Jun 2010 01:53:55 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12875421#action_12875421 ]

Todd Lipcon commented on HADOOP-6762:
-------------------------------------

Looking at the code:
- Instead of using the CountDownLatch, can you change the Runnable to a Callable<Void>
and then get back a Future<Void>? Seems a little cleaner.
- The behavior is different now that we've added a timeout while waiting on sendParam. Do you
think this change is necessary? (In what case would we block forever waiting to write?)
- Regarding the issue I raised above with a Writable param that throws an NPE, can we move the
actual buffer construction back into the calling thread? Then if it throws a RuntimeException,
the user will see it (rather than it getting lost somewhere). There's still an issue that this
will leave the call on the connection queue, but that's probably worth a separate JIRA.
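
A minimal sketch of the first and third suggestions, not the actual Hadoop client code. The class SendParamSketch, the Param interface (a stand-in for a Writable), and the sendParam signature are all hypothetical. The parameter is serialized in the calling thread, so a RuntimeException from its write method reaches the caller directly, and only the channel write runs in the Callable<Void>. The sketch waits without a timeout and does nothing about the call being left on the connection queue.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

class SendParamSketch {
    /** Hypothetical stand-in for a Writable parameter. */
    interface Param {
        void write(DataOutputStream out) throws IOException;
    }

    /**
     * Serializes the param in the calling thread, then writes the bytes to
     * channelOut on a sender thread and waits for that write to finish.
     */
    static void sendParam(ExecutorService sender, Param param, OutputStream channelOut)
            throws IOException, InterruptedException {
        // Buffer construction happens here, in the caller's thread. If
        // param.write throws (for example an NPE), the caller sees it.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(buf);
        param.write(data);
        data.flush();
        final byte[] bytes = buf.toByteArray();

        // Only the channel I/O runs on the sender thread, so an interrupt
        // delivered to the calling thread cannot hit the channel write.
        Callable<Void> task = () -> {
            channelOut.write(bytes);
            channelOut.flush();
            return null;
        };
        Future<Void> result = sender.submit(task);

        try {
            result.get(); // no timeout in this sketch
        } catch (ExecutionException e) {
            // Surface the I/O failure to the caller instead of losing it.
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            throw new IOException("sending param failed", cause);
        }
    }
}
```

Compared with a Runnable plus a CountDownLatch, the Future carries both completion and any exception from the write, so no separate field is needed to pass the failure back.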

Regarding the resource usage issue: can we just use a static cached thread pool that's shared
across all of RPC for sending params? In the common case it would only have 0 or 1 threads,
but it could grow as necessary and then shrink back when idle.
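
One way such a shared pool might look, as a sketch only. The class name SharedSendPool, the field name SEND_PARAMS_EXECUTOR, and the thread name prefix are hypothetical. Executors.newCachedThreadPool creates threads on demand and lets idle ones exit after 60 seconds, which matches the grow-and-shrink behavior described above. Marking the threads as daemons is an added assumption so that the pool cannot keep the JVM alive.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class SharedSendPool {
    // Used only to give each pool thread a distinct, recognizable name.
    private static final AtomicInteger COUNTER = new AtomicInteger();

    /**
     * One pool shared by every RPC connection in the process. It starts
     * with zero threads, adds threads when tasks arrive, and drops threads
     * that stay idle.
     */
    static final ExecutorService SEND_PARAMS_EXECUTOR =
        Executors.newCachedThreadPool(runnable -> {
            Thread t = new Thread(runnable, "rpc-send-params-" + COUNTER.incrementAndGet());
            t.setDaemon(true); // assumption: never block JVM shutdown
            return t;
        });
}
```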

> exception while doing RPC I/O closes channel
> --------------------------------------------
>
>                 Key: HADOOP-6762
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6762
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>            Reporter: sam rash
>            Assignee: sam rash
>         Attachments: hadoop-6762-1.txt, hadoop-6762-2.txt, hadoop-6762-3.txt, hadoop-6762-4.txt,
>                      hadoop-6762-6.txt
>
>
> If a single process creates two unique FileSystem instances to the same NN using
> FileSystem.newInstance(), and one of them issues a close(), the leasechecker thread is
> interrupted. This interrupt races with the RPC namenode.renew() and can cause a
> ClosedByInterruptException. This closes the underlying channel, and the other FileSystem,
> which shares the connection, will get errors.
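
The underlying NIO behavior can be reproduced outside Hadoop. The following is a small sketch using a java.nio Pipe rather than an RPC socket. The class and method names are hypothetical, and the reader thread only plays the role of the interrupted leasechecker. Interrupting a thread that is in, or about to enter, a blocking read on an interruptible channel makes the read fail with ClosedByInterruptException and closes the channel for every other user of it.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.Pipe;
import java.util.concurrent.CountDownLatch;

class InterruptClosesChannel {
    /**
     * Returns true if interrupting the reader thread both raised
     * ClosedByInterruptException and left the shared channel closed.
     */
    static boolean interruptClosesSharedChannel() throws IOException, InterruptedException {
        Pipe pipe = Pipe.open();
        Pipe.SourceChannel shared = pipe.source(); // stands in for the shared connection
        CountDownLatch started = new CountDownLatch(1);
        boolean[] sawClosedByInterrupt = {false};

        Thread reader = new Thread(() -> {
            started.countDown();
            try {
                // Nothing is ever written to the pipe, so this read blocks
                // until the interrupt arrives (or fails at once if the
                // interrupt status is already set).
                shared.read(ByteBuffer.allocate(16));
            } catch (ClosedByInterruptException e) {
                sawClosedByInterrupt[0] = true;
            } catch (IOException e) {
                // Any other I/O failure is not the case being shown here.
            }
        });
        reader.start();
        started.await();
        reader.interrupt();
        reader.join();

        // The channel is now closed for all users, not just the reader.
        boolean channelClosed = !shared.isOpen();
        pipe.sink().close();
        return sawClosedByInterrupt[0] && channelClosed;
    }
}
```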

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

