Date: Wed, 14 Feb 2007 16:30:05 -0800 (PST)
From: "Raghu Angadi (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Updated: (HADOOP-990) Datanode doesn't retry when write to one (full)drive fail

[ https://issues.apache.org/jira/browse/HADOOP-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-990:
--------------------------------

    Attachment: HADOOP-990-1.patch

patch attached.
Two fixes:

DFSClient: updates block size before sending to client.
DataNode: FSDataset: first looks for a volume with free space larger than 10 times the block size. (This is not strictly required, but it will handle almost-full volumes better.)

> Datanode doesn't retry when write to one (full)drive fail
> ---------------------------------------------------------
>
>                 Key: HADOOP-990
>                 URL: https://issues.apache.org/jira/browse/HADOOP-990
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Koji Noguchi
>         Assigned To: Raghu Angadi
>         Attachments: HADOOP-990-1.patch
>
>
> When one drive is 99.9% full and the datanode chooses that drive to write to, the write fails with
> 2007-02-07 18:16:56,574 WARN org.apache.hadoop.dfs.DataNode: DataXCeiver
> org.apache.hadoop.util.DiskChecker$DiskOutOfSpaceException: No space left on device
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.writeBlock(DataNode.java:801)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:563)
>         at java.lang.Thread.run(Thread.java:595)
> Combined with HADOOP-940, these failed blocks stay under-replicated.
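
For readers following the thread, the volume-selection idea in the update above (prefer a volume with free space larger than 10 times the block size) could be sketched roughly as follows. This is only an illustration and not the code in HADOOP-990-1.patch: the VolumeChooser and Volume names are hypothetical, and the second pass (falling back to any volume that can still hold one block) is an assumption about what happens when no volume has that much head room.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of the volume choice; not the actual FSDataset code. */
public class VolumeChooser {

    /** Minimal stand-in for a datanode storage volume (hypothetical type). */
    public static final class Volume {
        public final String name;
        public final long freeBytes;

        public Volume(String name, long freeBytes) {
            this.name = name;
            this.freeBytes = freeBytes;
        }
    }

    /** Preferred head room as a multiple of the block size ("10 times"). */
    static final long HEADROOM_FACTOR = 10;

    /**
     * First pass: the first volume with free space greater than
     * 10 * blockSize. Second pass (an assumption, see above): the first
     * volume that can hold at least one block. Returns null if no
     * volume has room for the block.
     */
    public static Volume choose(List<Volume> volumes, long blockSize) {
        for (Volume v : volumes) {
            if (v.freeBytes > HEADROOM_FACTOR * blockSize) {
                return v;
            }
        }
        for (Volume v : volumes) {
            if (v.freeBytes >= blockSize) {
                return v;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;
        List<Volume> volumes = Arrays.asList(
                new Volume("/disk1", 2 * blockSize),    // almost full
                new Volume("/disk2", 100 * blockSize)); // plenty of room
        Volume chosen = choose(volumes, blockSize);
        System.out.println(chosen == null ? "no space" : chosen.name);
    }
}
```

With the two volumes in main, the almost-full /disk1 is skipped in the first pass and /disk2 is chosen. A caller that gets null back would have to report the out-of-space condition rather than attempt the write.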