Message-ID: <14604064.1192050530958.JavaMail.jira@brutus>
In-Reply-To: <11154288.1186792782759.JavaMail.jira@brutus>
Date: Wed, 10 Oct 2007 14:08:50 -0700 (PDT)
From: "dhruba borthakur (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-1707) DFS client can allow user to write data to the next block while uploading previous block to HDFS

    [ https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533875 ]

dhruba borthakur commented on HADOOP-1707:
------------------------------------------

Thanks, Doug, for your comments.

1. My thinking is as follows: the client has a handful of small buffers, say two buffers of 16K each. When the first buffer is full, the client writes it to the first datanode in the pipeline and meanwhile continues filling the remaining buffer(s). The first datanode, on receipt of this buffer, sends it to the next datanode in the pipeline and also writes it to its local disk.

2. If a datanode fails to write a buffer to its disk, the failure is reported back to the client. The client removes that datanode from the pipeline and continues to write to the remaining two datanodes. The partial file on the bad datanode remains in its "tmp" directory. (A sketch of this scheme follows below.)

3. When the file is closed, any under-replicated blocks will be re-replicated by the namenode.
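A minimal, self-contained sketch of the scheme in points 1 and 2, in plain Java. The names (BufferPipeline, Replica, store) are illustrative, not actual DFSClient classes, and for brevity the client here pushes each full buffer to every replica directly, where the description above has the first datanode forward it down the chain; the failure handling is the same either way. The namenode-driven re-replication in point 3 happens server-side and is outside this sketch.

import java.util.ArrayList;
import java.util.List;

public class BufferPipeline {
    private static final int BUF_SIZE = 16 * 1024;     // 16K buffers, as in point 1

    /** Stand-in for one datanode; returns false if its disk write fails. */
    public interface Replica {
        boolean store(byte[] buf, int len);
    }

    private final List<Replica> replicas;              // typically three
    private final byte[] buffer = new byte[BUF_SIZE];
    private int pos = 0;

    public BufferPipeline(List<Replica> replicas) {
        this.replicas = new ArrayList<>(replicas);
    }

    /** Fill the current buffer; flush to the pipeline whenever it is full. */
    public void write(byte[] data, int off, int len) {
        while (len > 0) {
            int n = Math.min(len, BUF_SIZE - pos);
            System.arraycopy(data, off, buffer, pos, n);
            pos += n; off += n; len -= n;
            if (pos == BUF_SIZE) flush();
        }
    }

    /** Flush any partially filled buffer, e.g. when the file is closed. */
    public void close() {
        if (pos > 0) flush();
    }

    /** Push the buffer out; drop any replica whose write failed (point 2)
     *  and keep going with the survivors. */
    private void flush() {
        replicas.removeIf(r -> !r.store(buffer, pos));
        if (replicas.isEmpty()) {
            throw new IllegalStateException("every replica in the pipeline failed");
        }
        pos = 0;
    }
}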
> DFS client can allow user to write data to the next block while uploading previous block to HDFS
> -------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1707
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>
> The DFS client currently uses a staging file on local disk to cache all user writes to a file. When the staging file accumulates one block's worth of data, its contents are flushed to an HDFS datanode. These two operations occur sequentially, one after the other.
> A simple optimization is to let the user write to a second staging file while the contents of the first are simultaneously uploaded to HDFS; this will improve file-upload performance.
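In the same spirit as the sketch above, here is a hypothetical illustration in plain Java of the double-staging idea from the description: a background thread uploads the filled block while the caller keeps writing into a fresh one. StagedUploader and uploadBlock() are made-up names; a real client would stage to local disk files and ship blocks to a datanode pipeline rather than print a message.

import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class StagedUploader implements AutoCloseable {
    private static final int BLOCK_SIZE = 4 * 1024;        // tiny block for the demo
    private static final byte[] EOF = new byte[0];         // shutdown marker

    private final BlockingQueue<byte[]> pending = new ArrayBlockingQueue<>(1);
    private final Thread uploader;
    private byte[] staging = new byte[BLOCK_SIZE];
    private int pos = 0;

    public StagedUploader() {
        uploader = new Thread(() -> {
            try {
                byte[] block;
                while ((block = pending.take()) != EOF) {
                    uploadBlock(block);                    // slow network I/O happens here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "block-uploader");
        uploader.start();
    }

    /** Caller-side write; blocks only if the uploader is a full block behind. */
    public void write(byte[] data, int off, int len) throws InterruptedException {
        while (len > 0) {
            int n = Math.min(len, BLOCK_SIZE - pos);
            System.arraycopy(data, off, staging, pos, n);
            pos += n; off += n; len -= n;
            if (pos == BLOCK_SIZE) {
                pending.put(staging);                      // hand off the full block
                staging = new byte[BLOCK_SIZE];            // and keep writing immediately
                pos = 0;
            }
        }
    }

    @Override
    public void close() throws InterruptedException {
        if (pos > 0) pending.put(Arrays.copyOf(staging, pos)); // flush the partial block
        pending.put(EOF);
        uploader.join();
    }

    /** Stand-in for shipping one block to a datanode pipeline. */
    private void uploadBlock(byte[] block) {
        System.out.println("uploaded block of " + block.length + " bytes");
    }
}

With a single pending slot, the writer stays at most one block ahead of the uploader, so local staging space is bounded at roughly two blocks, matching the two-staging-file proposal.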