hadoop-hdfs-issues mailing list archives

From "James Clampffer (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-12103) libhdfs++: Provide workaround to support cancel on filesystem connect until HDFS-11437 is resolved
Date Mon, 10 Jul 2017 18:29:00 GMT

     [ https://issues.apache.org/jira/browse/HDFS-12103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Clampffer updated HDFS-12103:
-----------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Thanks for reviewing [~xiaowei.zhu]!  Committed to HDFS-8707.

Manual testing consisted of verifying that the steps in the workaround procedure can cancel
a slow connection.  The fix also runs regularly, with and without valgrind, as part of another
project; that test is too closely coupled to the project to isolate.  My hope is to fix the
root issue and revert this within the next 3-4 weeks, once I finish up HDFS-11807 and
HDFS-12111.

> libhdfs++: Provide workaround to support cancel on filesystem connect until HDFS-11437 is resolved
> --------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-12103
>                 URL: https://issues.apache.org/jira/browse/HDFS-12103
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>            Assignee: James Clampffer
>         Attachments: HDFS-12103.HDFS-8707.000.patch
>
>
> HDFS-11437 is going to take a non-trivial amount of work to do right.  In the meantime
> it'd be nice to have a way to cancel pending connections (even when the FS claims they
> have finished).
> The proposed workaround is to relax the rules about when FileSystem::CancelPendingConnect
> can be called, since the FS isn't able to properly determine when it's connected anyway.
> To determine when the FS has connected you can make some simple RPC call, since that call
> will wait on failover.  If CancelPending can be called during that first RPC call then it
> will effectively be canceling FileSystem::Connect.
> Current cancel rules (asterisk marks steps where CancelPending is allowed):
>   1. FileSystem::Connect called
>   2. FileSystem communicates with first NN *
>   3. FileSystem::Connect returns, even if it hasn't communicated with the active NN
> Proposed relaxation:
>   1. FileSystem::Connect called
>   2. FileSystem communicates with first NN *
>   3. FileSystem::Connect returns *
>   4. FileSystem::GetFileInfo called * (any NameNode RPC call will do; ignore permission errors)
>   5. RPC engine blocks until it hits the active NN or runs out of retries *
>   6. FileSystem::GetFileInfo returns
> It'd be up to the user to add in the dummy NN RPC call.  Once HDFS-11437 is fixed this
> workaround can be removed.
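
For readers who want to see the shape of the relaxed procedure, here is a minimal sketch. It
does not use the real libhdfs++ headers: MockFileSystem, its constructor argument, and
ConnectOrCancel are hypothetical stand-ins written for illustration, and the real method
signatures and return types may differ. The mock only simulates an RPC that blocks during
failover; the point is the call order (Connect, then a dummy RPC, with cancel allowed while
that RPC is blocked).

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-in for a libhdfs++ FileSystem. NOT the real API.
class MockFileSystem {
 public:
  // failover_delay simulates how long the RPC engine needs to reach the active NN.
  explicit MockFileSystem(std::chrono::milliseconds failover_delay)
      : failover_delay_(failover_delay) {}

  // Returns once the first NN has been contacted, even if it is not the active one.
  bool Connect() { return true; }

  // Dummy NN RPC: blocks until the simulated failover completes or a cancel arrives.
  // Returns true if the active NN was reached, false if the call was canceled.
  bool GetFileInfo(const std::string& /*path*/) {
    std::unique_lock<std::mutex> lock(mu_);
    // wait_for returns the predicate value: true means canceled_ was set in time.
    bool canceled = cv_.wait_for(lock, failover_delay_, [this] { return canceled_; });
    return !canceled;
  }

  // Under the relaxed rules this may be called while the first RPC is still blocked.
  void CancelPendingConnect() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      canceled_ = true;
    }
    cv_.notify_all();
  }

 private:
  std::chrono::milliseconds failover_delay_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool canceled_ = false;
};

// Caller-side workaround: Connect, then issue a dummy RPC to learn when the FS is
// really connected. A second thread cancels if the caller gives up after cancel_after.
// Returns true if the active NN was reached before the cancel fired.
bool ConnectOrCancel(MockFileSystem& fs, std::chrono::milliseconds cancel_after) {
  if (!fs.Connect()) return false;
  std::thread canceller([&fs, cancel_after] {
    std::this_thread::sleep_for(cancel_after);
    fs.CancelPendingConnect();
  });
  bool reached_active = fs.GetFileInfo("/");
  canceller.join();
  return reached_active;
}
```

With a MockFileSystem whose simulated failover takes much longer than cancel_after,
ConnectOrCancel returns false because the dummy RPC is canceled; with a short failover it
returns true. In real use the cancel would come from the application (for example a user
timeout) rather than from a fixed sleep.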



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

