Message-ID: <5726421.1193765991105.JavaMail.jira@brutus>
Date: Tue, 30 Oct 2007 10:39:51 -0700 (PDT)
From: "Doug Cutting (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-1707) Remove the DFS Client disk-based cache
In-Reply-To: <11154288.1186792782759.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538867 ]

Doug Cutting commented on HADOOP-1707:
--------------------------------------

This still appears to have the cascading timeout issue, no? Each stage in the pipeline must have a smaller timeout than the prior stage, or else the whole pipeline will fail when any node fails. In particular, the client must use a much larger timeout, since it must permit the primary to replay the entire block downstream if needed.

Perhaps there could be multiple kinds of acks: some that just indicate the primary is still alive, and others that indicate replication is complete? (Acks might include the current level of replication.) That would help distinguish the case where the primary has actually gone down from the case where it is still doing productive work. Then one timeout could be used for communications, and a substantially longer one for awaiting replication.

I also wonder whether, instead of having so many threads, we might implement this with async I/O. Much of the processing seems simple enough that maintaining a state object for each file being written, and using a single thread that selects on sockets and then updates that state, might be more efficient. Perhaps it will be simpler to write these with threads first, then convert them to async?

We discussed offline last week a different approach from the one you've described here. In that approach, acks would only signal that the immediately downstream node had written the data, not all downstream nodes. Only at block end or on flush would the client check, with a different command, that sufficient replicas exist. Why have you abandoned that plan?
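A minimal Java sketch of the typed-ack / two-timeout idea above. All names and values here (PipelineAck, AckWaiter, HEARTBEAT, REPLICATED, the 60-second and 20-minute limits) are illustrative assumptions of mine, not the actual DFSClient or datanode protocol:

import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;
import java.net.SocketTimeoutException;

class PipelineAck {
  static final int HEARTBEAT = 0;   // primary is alive and still working
  static final int REPLICATED = 1;  // data has reached 'replicas' nodes

  final int kind;
  final int replicas;               // replication level reported by the primary

  PipelineAck(int kind, int replicas) {
    this.kind = kind;
    this.replicas = replicas;
  }

  static PipelineAck read(DataInputStream in) throws IOException {
    return new PipelineAck(in.readByte(), in.readInt());
  }
}

class AckWaiter {
  // Short timeout: how long we tolerate complete silence from the primary.
  static final int COMM_TIMEOUT_MS = 60 * 1000;
  // Long timeout: how long we allow the pipeline to finish replication,
  // e.g. to replay an entire block downstream after a node failure.
  static final long REPLICATION_TIMEOUT_MS = 20 * 60 * 1000L;

  /** Blocks until minReplicas copies are acknowledged, or throws. */
  static void waitForReplication(Socket s, int minReplicas) throws IOException {
    s.setSoTimeout(COMM_TIMEOUT_MS);
    DataInputStream in = new DataInputStream(s.getInputStream());
    long deadline = System.currentTimeMillis() + REPLICATION_TIMEOUT_MS;
    while (System.currentTimeMillis() < deadline) {
      PipelineAck ack;
      try {
        ack = PipelineAck.read(in);   // waits at most COMM_TIMEOUT_MS
      } catch (SocketTimeoutException e) {
        throw new IOException("primary appears dead: no ack for "
            + COMM_TIMEOUT_MS + " ms");
      }
      if (ack.kind == PipelineAck.REPLICATED && ack.replicas >= minReplicas) {
        return;                       // replication is complete
      }
      // HEARTBEAT or partial replication: keep waiting on the longer deadline.
    }
    throw new IOException("replication did not complete within "
        + REPLICATION_TIMEOUT_MS + " ms");
  }
}

The point of the two limits is that silence is treated as primary failure fairly quickly, while evidence of productive work (heartbeats or partial-replication acks) lets the client keep waiting up to the much longer replication deadline.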
An intermediate approach (sketched at the end of this message) might be to use buffer pools on each datanode in the pipeline. Each datanode would write the buffer locally and queue it to be written downstream, and the buffer would only be returned to the pool when both writes complete. A datanode could block when no buffers are available. That might improve throughput.

> Remove the DFS Client disk-based cache
> --------------------------------------
>
>                 Key: HADOOP-1707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1707
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.16.0
>
>
> The DFS client currently uses a staging file on local disk to cache all user writes to a file. When the staging file accumulates one block's worth of data, its contents are flushed to an HDFS datanode. These operations occur sequentially.
> A simple optimization, allowing the user to write to a second staging file while the contents of the first are uploaded to HDFS, would improve file-upload performance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
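A rough Java sketch of the buffer-pool idea from the comment above. The class names (PacketBufferPool, PacketRef) and the count-of-two release scheme are my own illustration of the proposal, not code from the datanode:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class PacketBufferPool {
  private final BlockingQueue<byte[]> free;

  PacketBufferPool(int nBuffers, int bufferSize) {
    free = new ArrayBlockingQueue<byte[]>(nBuffers);
    for (int i = 0; i < nBuffers; i++) {
      free.add(new byte[bufferSize]);
    }
  }

  /** Blocks until a buffer is available, applying back-pressure upstream. */
  byte[] acquire() throws InterruptedException {
    return free.take();
  }

  void release(byte[] buf) {
    free.add(buf);
  }
}

/** Tracks the two consumers (local disk, downstream socket) of one buffer. */
class PacketRef {
  private final byte[] buf;
  private final PacketBufferPool pool;
  private final AtomicInteger pending = new AtomicInteger(2);

  PacketRef(byte[] buf, PacketBufferPool pool) {
    this.buf = buf;
    this.pool = pool;
  }

  byte[] data() {
    return buf;
  }

  /** Called once by the local writer and once by the downstream forwarder. */
  void done() {
    if (pending.decrementAndGet() == 0) {
      pool.release(buf);   // both writes finished; the buffer may be reused
    }
  }
}

Because acquire() blocks when the pool is empty, a slow local disk or a slow downstream node would automatically throttle the upstream sender without any extra signalling.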