Date: Thu, 22 Dec 2016 18:55:58 +0000 (UTC)
From: "James Clampffer (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-11028) libhdfs++: FileHandleImpl::CancelOperations needs to be able to cancel pending connections

     [ https://issues.apache.org/jira/browse/HDFS-11028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Clampffer updated HDFS-11028:
-----------------------------------
    Description:

Cancel support is now reasonably robust except for the case where a FileHandle operation ends up causing the RpcEngine to try to create a new RpcConnection. In HA configs it's common to have something like 10-20 failovers and a 20 second failover delay (no exponential backoff just yet). This means that all of the functions with synchronous interfaces can still block for many minutes after an operation has been canceled, and often the cause is something trivial like a bad config file.

The current design makes this sort of thing tricky because FileHandles need to be individually cancelable via CancelOperations, but they share the RpcEngine that does the async magic.

Updated design:

The original design would end up forcing lots of reconnects. That's not a huge issue on an unauthenticated cluster, but on a kerberized cluster it's a recipe for Kerberos thinking we're attempting a replay attack.

User-visible cancellation and internal resource cleanup are separable issues. The former can be implemented by atomically swapping the callback of the operation to be canceled with a no-op callback. The original callback is then posted to the IoService with an OperationCanceled status and the user is no longer blocked. For RPC cancels this is sufficient: it's not expensive to keep a request around a little longer, and when it's eventually invoked or timed out it invokes the no-op callback and is ignored (other than a trace-level log notification). Connect cancels push a flag down into the RPC engine to kill the connection and make sure it doesn't attempt to reconnect.
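A minimal sketch of the callback-swap idea described above, assuming a hypothetical CancelableOperation wrapper (CancelableOperation, Status, and the asio::io_service member are illustrative names, not the actual libhdfs++ types):

{code:cpp}
// Sketch only: swap a pending operation's callback with a no-op so the
// caller unblocks immediately, while the in-flight request is left to
// complete (or time out) harmlessly in the background.
// All names here are hypothetical, not libhdfs++ API.
#include <asio.hpp>
#include <functional>
#include <mutex>
#include <utility>

struct Status {
  bool canceled;
  static Status Canceled() { return Status{true}; }
  static Status OK() { return Status{false}; }
};

class CancelableOperation {
 public:
  using Callback = std::function<void(const Status&)>;

  CancelableOperation(asio::io_service* io, Callback cb)
      : io_(io), callback_(std::move(cb)) {}

  // Called by the user: atomically replace the real callback with a no-op,
  // then post the original callback with a canceled status so the caller is
  // unblocked right away.  The request itself is not torn down.
  void Cancel() {
    Callback original;
    {
      std::lock_guard<std::mutex> lock(mtx_);
      original = std::move(callback_);
      callback_ = [](const Status&) { /* no-op; late result ignored */ };
    }
    if (original) {
      io_->post([original]() { original(Status::Canceled()); });
    }
  }

  // Called by the RPC engine when the request completes or times out.
  // After a cancel this just invokes the no-op.
  void Complete(const Status& status) {
    Callback cb;
    {
      std::lock_guard<std::mutex> lock(mtx_);
      cb = callback_;
    }
    if (cb) cb(status);
  }

 private:
  asio::io_service* io_;
  std::mutex mtx_;
  Callback callback_;
};
{code}

The swap is done under a lock (an std::atomic exchange of a pointer would also work) so that a Complete() racing with Cancel() sees exactly one of the two callbacks and the original is invoked at most once.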
  was:

Cancel support is now reasonably robust except for the case where a FileHandle operation ends up causing the RpcEngine to try to create a new RpcConnection. In HA configs it's common to have something like 10-20 failovers and a 20 second failover delay (no exponential backoff just yet). This means that all of the functions with synchronous interfaces can still block for many minutes after an operation has been canceled, and often the cause is something trivial like a bad config file.

The current design makes this sort of thing tricky because FileHandles need to be individually cancelable via CancelOperations, but they share the RpcEngine that does the async magic.

A non-exhaustive list of design assumptions:
1) Multiple users will be doing stuff on the same FS in the same process, and some users might be a lot more impatient than others. This means it's possible that progress is merely slow, not stalled: one user may want to give up while other users are still making progress. Side effects of a FileHandle::CancelOperations call should only be visible to the owner of that FileHandle.
2) In most use cases the library spends more time in the read path than in namenode metadata operations. At any given time it's unlikely that there is a huge number of pending RPC requests, though this certainly can happen (see [~anatoli.shein]'s awesome tools).

Some sparse design plans to help out reviewers (a sketch of the cancel-flag idea follows the quoted issue summary below):
1a) RPC Request objects get something analogous to the ReaderGroup to track all pending requests associated with a FileHandle. As long as there is a transitive dependency on the FileHandle from the request, a flag can be pushed down.
1b) FileSystem operations need the same support. Since they return their result directly, there isn't an object to call a cancel method on; one approach is to pass in an optional flag (a CancelHandle object).
2) Based on assumption 2, it's generally not unacceptably expensive to cancel and resend async RPC calls. Since the RpcConnection is shared by all pending requests, it needs to be wiped out. This causes all pending and in-flight requests to return an asio::operation_aborted status. If a Request object doesn't have its flag set to canceled, it gets placed back in line using the same mechanism as common RPC errors. This retry does not count against retry_count or failover_count since it's a side effect of the cancel, nor should it cause the RpcEngine to attempt to fail over.


> libhdfs++: FileHandleImpl::CancelOperations needs to be able to cancel pending connections
> ------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11028
>                 URL: https://issues.apache.org/jira/browse/HDFS-11028
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: James Clampffer
>            Assignee: James Clampffer
>
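A minimal sketch of the cancel-flag plumbing from points 1b) and 2) above, under stated assumptions: CancelHandle, Request, and RetryQueue are hypothetical illustration names, not real libhdfs++ types. The handle is a shared atomic flag passed into the operation; when the shared connection is wiped out and a request comes back with asio::operation_aborted, the engine re-queues it only if its handle was not canceled.

{code:cpp}
// Sketch only: a shared cancel flag (point 1b) and the re-queue-on-abort
// decision (point 2).  Not the actual libhdfs++ implementation.
#include <asio/error.hpp>
#include <atomic>
#include <deque>
#include <memory>
#include <system_error>

// Passed by the caller into a FileSystem operation; Cancel() may be called
// from any thread.
class CancelHandle {
 public:
  void Cancel() { canceled_.store(true, std::memory_order_release); }
  bool IsCanceled() const { return canceled_.load(std::memory_order_acquire); }
 private:
  std::atomic<bool> canceled_{false};
};

struct Request {
  std::shared_ptr<CancelHandle> cancel_handle;  // null => not cancelable
  int retry_count = 0;  // deliberately NOT bumped on cancel-induced retries
};

class RetryQueue {
 public:
  // Called for each pending/in-flight request after the shared RpcConnection
  // is torn down and everything returns asio::operation_aborted.
  void OnRequestAborted(std::shared_ptr<Request> req, std::error_code ec) {
    if (ec != asio::error::operation_aborted) return;  // normal error path
    if (req->cancel_handle && req->cancel_handle->IsCanceled()) {
      return;  // this user gave up: drop the request, don't resend
    }
    // Aborted as a side effect of someone else's cancel: re-queue without
    // charging retry_count or failover_count, and without failing over.
    pending_.push_back(std::move(req));
  }
 private:
  std::deque<std::shared_ptr<Request>> pending_;
};
{code}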
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org