hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11546) Federation Router RPC server
Date Tue, 02 May 2017 23:05:04 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15993970#comment-15993970

Chris Douglas commented on HDFS-11546:

Thanks for the iterations, I think this is close.

bq. How do we link to HADOOP-12077
bq. What kind of documentation would we add for the TODOs
I don't think it needs an explicit link, I was just trying to understand the semantics the
router implements. Users of router-based HDFS federation (R-BF) will want to understand what
guarantees it provides for these calls without reading the code. Somewhere in the branch there
should be a .md documenting how to configure and use R-BF, how the config knobs interact,
limitations (e.g., no snapshots), etc. Filing a documentation subtask on HDFS-10467 will provide
a place to track its impl.

Comments on 006:
* The {{ConnectionPool#isConnectionUsable}} method doesn't distinguish between a reference
to a busy, closed, or even null connection. The caller is expected to handle all these cases,
or are closed/busy the same case? Should the caller retry on null?
* It looks like the connection could be returned to multiple threads before it's marked busy,
since that only happens at {{ConnectionContext#getClient}}.
** The caller could receive a "non-busy" connection handle, only to have it closed before
it can use it.
** Since {{busy}} is a boolean, those multiple threads will mark the thread as non-busy at
the first call that releases it
** It's plausible that this will grow a pool of connections roughly proportional to load,
but its semantics are fuzzy.
* The synchronization in {{ConnectionPool}} is hard to follow. For example, {{connections}}
is protected by the instance ({{#close}}), by the read lock ({{#getConnection}}), by a lock
on the atomic int ({{#addingConnections}}). Even if this improves concurrency, it seems difficult
to maintain without introducing deadlocks (I couldn't find any cycles in the existing code).
** Would it make sense for the {{ConnectionManager}} to mange the size of its {{ConnectionPool}}
instances, rather than having each dynamically size itself? Imposing a cap on the total number
of concurrent connections, QoS/quotas, etc. at the router probably requires some central management,
and it could make the synchronization of the pool itself trivial (read lock for grabbing a
connection from the pool, a write lock to add a thread from the manager
** The pattern of creating a thread to create the connection out-of-band is kind of heavy.
If this were moved to the {{ConnectionManager}}, perhaps the decision could be inline and-
if it's expensive- the connection creation could be offloaded to a single thread (pool) in
the manager

The rest of the patch looks OK. I can't really go through {{ClientProtocol}} in detail, but
this looks fine as a base.

I looked at {{getListing}} and {{create}}. The semantics for the multiple possible create
locations, as discussed in HADOOP-12077 and elsewhere, are hard to reason about. If a remote
location is temporarily unavailable, when it comes back clients may have a very strange view
of the system, particularly retrying through separate routers which may have a partitioned
view. It's an interesting capability to include, and for read-only paths replicated at all
targets it's probably fine. If users have docs laying out how to use it (safely and dangerously),
it seems fine to include even if it won't work seamlessly.

> Federation Router RPC server
> ----------------------------
>                 Key: HDFS-11546
>                 URL: https://issues.apache.org/jira/browse/HDFS-11546
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs
>    Affects Versions: HDFS-10467
>            Reporter: Inigo Goiri
>            Assignee: Inigo Goiri
>         Attachments: HDFS-11546-HDFS-10467-000.patch, HDFS-11546-HDFS-10467-001.patch,
HDFS-11546-HDFS-10467-002.patch, HDFS-11546-HDFS-10467-003.patch, HDFS-11546-HDFS-10467-004.patch,
HDFS-11546-HDFS-10467-005.patch, HDFS-11546-HDFS-10467-006.patch
> RPC server side of the Federation Router implements ClientProtocol.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

View raw message