hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3405) Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages
Date Fri, 14 Feb 2014 23:21:19 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902105#comment-13902105
] 

Andrew Wang commented on HDFS-3405:
-----------------------------------

Thanks for the rev Vinay, we're pretty close once we get some more manual testing done. Some
more comments:

General:
* I'd like to not leave the timeout at 10min, and now is actually a good time to test the
timeout config further in depth. Looking at HttpURLConnection, it has setConnectTimeout and
setReadTimeout, but I don't know how that works with PUT. Namely, I'm worried that the new
unit test you added is hitting the connect or read timeout, rather than slowness/pause during
the actual transfer. I think this isn't that hard to test, since you can combine a low timeout
with a low transfer rate (unit testable), or use Linux's {{tc}} (traffic control) if you want
to get fancy with manual testing. At a high-level, what we want here is for the NN/SbNN/2NN
to not hang indefinitely if the other side goes down completely, and if this is in fact a
socket timeout, then 60s is sufficient. If it's actually a timeout for the whole transfer,
then let's stick with 10m.
* The 2GB+ file test is also pretty important, so let's wait on that as well before committing.
* Should we refactor out "/imagetransfer" into a static constant in ImageServlet? We seem
to use it a bunch.

ImageServlet:
* Class javadoc should mention the Standby NameNode in addition to the Secondary NameNode,
since that's actually the more common deployment with modern Hadoop.

TransferFsImage:
* "Uploades" is still spelled wrong
* Reference to "/putimage" in a comment needs to be updated
* We should be throttling on the receiver side, not the sender. This way both GETs from and
PUTs to a NN will use the same throttle setting.

TestTransferFsImage:
* Let's not create a temp file in a global directory, I just had to fix some problems like
this in HDFS-3128. Instead, you can do {{FileSystem.getLocal}} and pass that to {{FileSystemTestHelper#getTestRootPath}}
to make a safe unique temp file.
* Comment explicitly mentions default 10min timeout, let's just say "otherwise it will wait
for the default" instead since we might change the default

> Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-3405
>                 URL: https://issues.apache.org/jira/browse/HDFS-3405
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 1.0.0, 3.0.0, 2.0.5-alpha
>            Reporter: Aaron T. Myers
>            Assignee: Vinayakumar B
>         Attachments: HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch,
HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch,
HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch
>
>
> As Todd points out in [this comment|https://issues.apache.org/jira/browse/HDFS-3404?focusedCommentId=13272986&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13272986],
the current scheme for a checkpointing daemon to upload a merged fsimage file to an NN is
to issue an HTTP get request to tell the target NN to issue another GET request back to the
checkpointing daemon to retrieve the merged fsimage file. There's no fundamental reason the
checkpointing daemon can't just use an HTTP POST or PUT to send back the merged fsimage file,
rather than the double-GET scheme.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Mime
View raw message