Reply-To: hbase-dev@hadoop.apache.org
Message-ID: <4BBB86DF.1010902@gmx.de>
Date: Tue, 06 Apr 2010 21:09:19 +0200
From: Al Lias <al.lias@gmx.de>
To: hbase-dev@hadoop.apache.org
CC: todd@cloudera.com
Subject: Re: What means log "DIR* NameSystem.completeFile: failed to complete..."?
References: <4BBB2ACA.2010905@gmx.de>
 <q2i45f85f71004060843z59826af9yffc78c613b7c073e@mail.gmail.com>
In-Reply-To: <q2i45f85f71004060843z59826af9yffc78c613b7c073e@mail.gmail.com>

Hi,

I set my HBase table's families to a relatively small MAX_FILESIZE value
of 10 MB (to get many regions quickly), which triggers a

"CompactSplitThread:IOException: Could not complete write to file..."

after some time, leaving a region lost until that region server (RS) is
restarted. It does not happen on every compaction/split, though; I estimate
it happens in about 1 of 20 cases.

I am loading small records at a rate of 100 to 600 per second into a 20-node
cluster (20 x 16 GB, 4 cores), with LZO compression, on HBase 0.20.3.
dfs.datanode.socket.write.timeout=0, if that matters.

Does anybody have an idea why this underlying HDFS error occurs (as
explained by Todd on the hadoop-common list)?
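
To make Todd's explanation (quoted below) concrete, here is a toy sketch of
how I understand it. It is not the real HDFS code: ToyNamespace, its methods,
and the paths are hypothetical names I made up for illustration. It only
models a namespace as a map from path to block list, and shows that
completing a file under its old path fails once someone else has renamed or
deleted it while it was still being written.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a namenode's namespace (hypothetical, NOT the real HDFS code). */
final class ToyNamespace {
    // path -> block list of a file that is still being written
    private final Map<String, List<Long>> files = new HashMap<>();

    /** A client starts writing a new file. */
    void create(String path) {
        files.put(path, new ArrayList<>());
    }

    /** The client adds a block to a file it is still writing. */
    boolean addBlock(String path, long blockId) {
        List<Long> blocks = files.get(path);
        if (blocks == null) {
            return false;
        }
        blocks.add(blockId);
        return true;
    }

    /** Someone else moves the file; the old path no longer resolves. */
    boolean rename(String src, String dst) {
        List<Long> blocks = files.remove(src);
        if (blocks == null) {
            return false;
        }
        files.put(dst, blocks);
        return true;
    }

    /** Someone else deletes the file. */
    boolean delete(String path) {
        return files.remove(path) != null;
    }

    /** Mirrors the quoted check: no blocks found under this path means failure. */
    boolean complete(String path) {
        List<Long> fileBlocks = files.get(path);
        if (fileBlocks == null) {
            System.err.println("failed to complete " + path
                + " because the file blocks are null");
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        ToyNamespace ns = new ToyNamespace();
        ns.create("/tmp/a");                          // writer opens the file
        ns.addBlock("/tmp/a", 1L);
        ns.rename("/tmp/a", "/final/a");              // moved while still open
        System.out.println(ns.complete("/tmp/a"));    // false: old path is gone
        System.out.println(ns.complete("/final/a"));  // true
    }
}
```

If this picture is right, the question becomes which part of the
compaction/split path could move or remove a file before its writer has
closed it.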

Thx,
  Al

On 06.04.2010 17:43, Todd Lipcon wrote:
> Hi Al,
> 
> Usually this indicates that the file was renamed or deleted while it was
> still being created by the client. Unfortunately it's not the most
> descriptive :)
> 
> -Todd
> 
> On Tue, Apr 6, 2010 at 5:36 AM, Al Lias <al.lias@gmx.de> wrote:
> 
>> Hi all,
>>
>> This warning is written in FSNamesystem.java, in completeFileInternal().
>> It makes the calling code in NameNode.java throw an IOException.
>>
>> FSNamesystem.java
>> ...
>> if (fileBlocks == null) {
>>   NameNode.stateChangeLog.warn(
>>       "DIR* NameSystem.completeFile: "
>>       + "failed to complete " + src
>>       + " because dir.getFileBlocks() is null "
>>       + " and pendingFile is "
>>       + ((pendingFile == null) ? "null"
>>           : ("from " + pendingFile.getClientMachine())));
>> ...
>>
>> What is the meaning of this warning? Any idea what could have gone wrong
>> in such a case?
>>
>> (This popped up through hbase, but as this code is in HDFS, I am asking
>> this list)
>>...