hbase-user mailing list archives

From Jack Levin <magn...@gmail.com>
Subject Re: file is already being created by NN_Recovery
Date Thu, 07 Apr 2011 16:41:54 GMT
If you have dfs.socket.timeout set to 0, consider removing that setting;
most of our issues like this went away once we did.  This problem occurs
when a datanode crashes and there is a conflict with the lease on the
file (the lease should expire after one hour; this is a hard timeout that
cannot be configured).  If you do end up in a situation like that, the
only way we could resolve it was this:

# stop the master
# hadoop fs -cp file new_file
# hadoop fs -rm file
# hadoop fs -cp new_file file
# start the master, and watch it replay the log.

This appears to break the lease, as the new .log file does not have this issue.
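
For anyone who wants to script the three copy/remove/copy steps above, here
is a minimal sketch, not a tested recovery tool. The function name
recopy_log, the ".recopy" suffix for the temporary copy, and the HADOOP
override variable are all made up for illustration; only the
"hadoop fs -cp" and "hadoop fs -rm" commands come from the steps above.
Stopping and starting the master is still done by hand.

```shell
#!/bin/sh
# Sketch of the copy/remove/copy workaround for a log file with a stuck lease.
# Assumptions (not from the original message): the function name, the
# ".recopy" temp suffix, and the HADOOP variable used to swap the command.

recopy_log() {
    if [ "$#" -ne 1 ]; then
        echo "usage: recopy_log <hdfs-path-to-log>" >&2
        return 2
    fi
    file="$1"
    tmp="${file}.recopy"   # hypothetical name for the temporary copy

    # HADOOP defaults to the real client; set HADOOP="echo hadoop" to
    # print the commands instead of running them (dry run).
    # Each step runs only if the previous one succeeded, so the original
    # file is never removed unless the copy exists.
    ${HADOOP:-hadoop} fs -cp "$file" "$tmp" &&
    ${HADOOP:-hadoop} fs -rm "$file" &&
    ${HADOOP:-hadoop} fs -cp "$tmp" "$file"
}
```

With the master stopped, a dry run would look like
HADOOP="echo hadoop" followed by recopy_log <path-to-log>, which only prints
the three commands. As in the manual steps, the temporary copy is left in
place; remove it yourself once the master has replayed the log.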

-Jack

On Thu, Apr 7, 2011 at 9:35 AM, Daniel Iancu <daniel.iancu@1and1.ro> wrote:
> Hello everybody
> We've run into this, now popular, error on our cluster
>
> 2011-04-07 16:28:00,654 WARN IPC Server handler 0 on 8020
> org.apache.hadoop.hdfs.StateChange - DIR* NameSystem.startFile: failed to
> create file
> /hbase/.logs/search-hadoop-eu001.v300.gmx.net,60020,1302075782687/search-hadoop-eu001.v300.gmx.net%3A60020.1302075783467
> for DFSClient_hb_m_search-namenode-eu002.v300.gmx.net:60000_1302186078300 on
> client 10.1.100.32, because this file is already being created by
> NN_Recovery on 10.1.100.61
>
> I've read a couple of threads about it, but it seems that nobody has
> pinpointed the cause. Is the only solution still to delete the log
> file and lose data?
>
> I've seen this error on almost every cluster we've installed so far. Deleting
> the logs was not a concern since all of them were test clusters. Now we've hit
> it on the production cluster and, strangely, this cluster was just installed:
> there is no table, no data, and no activity there. So what logs is the master
> trying to create?
>
> We are running the latest CDH3B4 from Cloudera.
>
> Thanks for any hints,
> Daniel
>
