Date: Tue, 4 Mar 2014 08:52:21 +0000 (UTC)
From: "Vinayakumar B (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-3405) Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages

[ https://issues.apache.org/jira/browse/HDFS-3405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinayakumar B updated HDFS-3405:
--------------------------------
    Attachment: HDFS-3405.patch

Hi All,
Sorry for the late response here.

Changes in the latest patch:

1. Verified uploading of big image files and confirmed that the timeout (*dfs.image.transfer.timeout*) set while uploading the file is only a socket timeout, not a timeout for the entire transfer. This was confirmed by uploading a 1GB image file with a timeout of only 5 seconds.
The upload succeeded even though the total transfer over a very slow network took ~5 min. So we can reduce the default value of *dfs.image.transfer.timeout* to 60 seconds, which is the default socket timeout in Hadoop.

2. I was hitting an OutOfMemoryError (OOME) while uploading 2GB files because of the internal buffering done by HTTP streaming. Even after increasing the heap size up to 15GB, the upload still took a long time. So I used {{connection.setChunkedStreamingMode(64 * 1024);}} with one extra parameter, {{File-Length}}, instead of {{Content-Length}} to indicate the file length for verification. Since {{Content-Length}} is an integer, it will not be correct for files larger than 2GB. After that change, I verified that 2GB+ files upload successfully.

3. Addressed all previous comments.

> Checkpointing should use HTTP POST or PUT instead of GET-GET to send merged fsimages
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-3405
>                 URL: https://issues.apache.org/jira/browse/HDFS-3405
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 1.0.0, 3.0.0, 2.0.5-alpha
>            Reporter: Aaron T. Myers
>            Assignee: Vinayakumar B
>         Attachments: HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch, HDFS-3405.patch
>
>
> As Todd points out in [this comment|https://issues.apache.org/jira/browse/HDFS-3404?focusedCommentId=13272986&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13272986], the current scheme for a checkpointing daemon to upload a merged fsimage file to an NN is to issue an HTTP GET request to tell the target NN to issue another GET request back to the checkpointing daemon to retrieve the merged fsimage file.
> There's no fundamental reason the checkpointing daemon can't just use an HTTP POST or PUT to send back the merged fsimage file, rather than the double-GET scheme.



--
This message was sent by Atlassian JIRA (v6.2#6252)
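
The chunked PUT upload described in item 2 of the comment above could look roughly like the sketch below. It is only an illustration under stated assumptions, not the code from HDFS-3405.patch: the class name {{ChunkedImageUpload}}, the {{/imagetransfer}} path, the local receiver, and the 1 MB stand-in file are all hypothetical. Only {{setChunkedStreamingMode(64 * 1024)}} and the {{File-Length}} header name come from the comment. The timeouts set here are socket-level ones, which matches the behaviour reported in item 1.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkedImageUpload {
    // Header name taken from the comment; it carries the file size as a long.
    static final String FILE_LENGTH_HEADER = "File-Length";

    /** Streams the file with HTTP PUT in 64 KB chunks and returns the HTTP status code. */
    static int upload(URL target, Path image, int timeoutMillis) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) target.openConnection();
        connection.setRequestMethod("PUT");
        connection.setDoOutput(true);
        // Socket-level timeouts: they bound each connect/read wait, not the whole transfer.
        connection.setConnectTimeout(timeoutMillis);
        connection.setReadTimeout(timeoutMillis);
        // Send the body chunk by chunk instead of buffering all of it in memory first.
        connection.setChunkedStreamingMode(64 * 1024);
        connection.setRequestProperty(FILE_LENGTH_HEADER, Long.toString(Files.size(image)));
        try (InputStream in = Files.newInputStream(image);
             OutputStream out = connection.getOutputStream()) {
            byte[] buffer = new byte[64 * 1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        try {
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }

    /** Hypothetical receiver: counts the body bytes and checks them against File-Length. */
    static HttpServer startReceiver() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/imagetransfer", exchange -> {
            long expected = Long.parseLong(exchange.getRequestHeaders().getFirst(FILE_LENGTH_HEADER));
            long received = 0;
            byte[] buffer = new byte[64 * 1024];
            try (InputStream body = exchange.getRequestBody()) {
                int n;
                while ((n = body.read(buffer)) != -1) {
                    received += n;
                }
            }
            // 200 if the byte count matches the advertised length, 400 otherwise; no response body.
            exchange.sendResponseHeaders(received == expected ? 200 : 400, -1);
            exchange.close();
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        Path image = Files.createTempFile("fsimage", ".tmp");
        Files.write(image, new byte[1024 * 1024]); // 1 MB stand-in for a real fsimage
        HttpServer server = startReceiver();
        try {
            URL target = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/imagetransfer").toURL();
            System.out.println("HTTP status: " + upload(target, image, 5000));
        } finally {
            server.stop(0);
            Files.delete(image);
        }
    }
}
```

Carrying the length in a separate header lets the receiver verify the size as a long even though a chunked request has no {{Content-Length}}.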