Message-ID: <4BBB86DF.1010902@gmx.de>
Date: Tue, 06 Apr 2010 21:09:19 +0200
From: Al Lias
To: hbase-dev@hadoop.apache.org
CC: todd@cloudera.com
Subject: Re: What means log "DIR* NameSystem.completeFile: failed to complete..." ?
References: <4BBB2ACA.2010905@gmx.de>

Hi,

I set the MAX_FILESIZE of my HBase table families to a relatively small value of 10 MB (to get many regions fast), which after some time triggers a "CompactSplitThread:IOException: Could not complete write to file..." together with a lost region (lost until that RS is restarted).

It does not happen on every compaction/split, though; I estimate in about 1 of 20 cases.

I am loading small records at a rate of 100 to 600 per second into a 20-node cluster (20 x 16 GB, 4 cores). LZO compression. HBase 0.20.3. dfs.datanode.socket.write.timeout=0, if that matters.

Does anybody have an idea why this underlying HDFS error occurs (as explained by Todd on the hadoop-common list)?

Thx,
Al

On 06.04.2010 17:43, Todd Lipcon wrote:
> Hi Al,
>
> Usually this indicates that the file was renamed or deleted while it was
> still being created by the client. Unfortunately it's not the most
> descriptive :)
>
> -Todd
>
> On Tue, Apr 6, 2010 at 5:36 AM, Al Lias wrote:
>
>> Hi all,
>>
>> this warning is written in FSFileSystem.java/completeFileInternal().
>> It makes the calling code in NameNode.java throwing an IOException.
>>
>> FSFileSystem.java
>> ...
>> if (fileBlocks == null ) {
>>   NameNode.stateChangeLog.warn(
>>     "DIR* NameSystem.completeFile: "
>>     + "failed to complete " + src
>>     + " because dir.getFileBlocks() is null " +
>>     " and pendingFile is " +
>>     ((pendingFile == null) ? "null" :
>>       ("from " + pendingFile.getClientMachine()))
>>   );
>> ...
>>
>> What is the meaning of this warning? Any Idea what could have gone wrong
>> in such a case?
>>
>> (This popped up through hbase, but as this code is in HDFS, I am asking
>> this list)
>>...
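
To make Todd's explanation concrete, here is a toy model of the situation. It is not HDFS code: the class ToyNamespace and its methods are hypothetical names invented for this sketch, and the only thing taken from the quoted snippet is the idea that completing a file fails when its block list has become null. The sketch assumes that a delete (or a rename away from the path) removes the entry while the writer still believes the file is open.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CompleteFileRaceSketch {

    /** Hypothetical stand-in for a namespace: path -> list of block ids. */
    static class ToyNamespace {
        private final Map<String, List<Long>> blocksByPath = new HashMap<>();

        /** Start writing a new file at the given path. */
        void create(String src) {
            blocksByPath.put(src, new ArrayList<>());
        }

        /** Append a block id to a file that is still known to the namespace. */
        void addBlock(String src, long blockId) {
            List<Long> blocks = blocksByPath.get(src);
            if (blocks == null) {
                throw new IllegalStateException("no such file: " + src);
            }
            blocks.add(blockId);
        }

        /** Remove the path; returns true if it existed. */
        boolean delete(String src) {
            return blocksByPath.remove(src) != null;
        }

        /**
         * Try to finish the file. Returns false when the path has no block
         * list any more, which is the condition the quoted warning reports.
         */
        boolean complete(String src) {
            List<Long> fileBlocks = blocksByPath.get(src);
            if (fileBlocks == null) {
                System.out.println("failed to complete " + src
                        + " because the block list is null");
                return false;
            }
            return true;
        }
    }

    public static void main(String[] args) {
        ToyNamespace ns = new ToyNamespace();

        // Normal case: create, write a block, complete.
        ns.create("/table/region/file1");
        ns.addBlock("/table/region/file1", 1L);
        System.out.println("file1 completed: " + ns.complete("/table/region/file1"));

        // Problem case: the path disappears before the writer completes it.
        ns.create("/table/region/file2");
        ns.addBlock("/table/region/file2", 2L);
        ns.delete("/table/region/file2");
        System.out.println("file2 completed: " + ns.complete("/table/region/file2"));
    }
}
```

In this model the second complete() call fails only because something else removed the path in between, which is why the interesting question for the HBase case is what deletes or renames the file while the compaction or split is still writing it.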