hbase-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: file is already being created by NN_Recovery
Date Fri, 08 Apr 2011 17:04:11 GMT
On Fri, Apr 8, 2011 at 9:11 AM, Daniel Iancu <daniel.iancu@1and1.ro> wrote:

>  What we did was test NN recovery from the SNN on a freshly installed
> cluster. After copying the image from the SNN, the cluster was started again
> and seemed OK. After approx. 1 h it entered this infinite loop and kept at it
> for the entire night (we've checked the logs).
> Since we were not using the cluster, we only noticed this the next day. There
> was no data and no activity there, so any kind of normal recovery should have
> finished well within that time. Unfortunately I don't have more precise details
> than this.
>
> We'd be happy to upgrade to the final CDH3 if it is available next
> week. If not, we'll keep an eye on this issue and, if it becomes a pain, we'll
> ask for the patch.
>

Yep, it will be available 4/12.
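In the meantime, the idea behind the recoverLease call Stack described is to poll it until the NameNode reports the previous writer's lease released, instead of spinning on the failed create. A minimal self-contained sketch of that retry pattern follows; note that `LeaseRecoverable` is a hypothetical stand-in for `org.apache.hadoop.hdfs.DistributedFileSystem` (whose real `recoverLease(Path)` was added by HDFS-1554), and the path, timeout, and poll interval are illustrative, not what HBase 0.90.2 hard-codes:

```java
// Hedged sketch of the retry pattern around the new recoverLease() call.
// "LeaseRecoverable" is a hypothetical stand-in for DistributedFileSystem,
// which exposes boolean recoverLease(Path) on the hadoop-0.20-append branch.
interface LeaseRecoverable {
    // Returns true once the NameNode has closed the file and
    // released the previous writer's lease.
    boolean recoverLease(String path) throws Exception;
}

class LeaseRecoveryExample {
    // Poll recoverLease() until it reports the lease is released,
    // instead of looping forever on "already being created by NN_Recovery".
    static boolean waitForLeaseRecovery(LeaseRecoverable fs, String path,
                                        long timeoutMs, long pollMs)
            throws Exception {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (fs.recoverLease(path)) {
                return true;   // safe to re-open/split the log now
            }
            Thread.sleep(pollMs);  // NN is still recovering the old pipeline
        }
        return false;  // give up and surface the error instead of spinning
    }

    public static void main(String[] args) throws Exception {
        // Stub that succeeds on the third attempt, standing in for real HDFS.
        final int[] calls = {0};
        LeaseRecoverable stub = path -> ++calls[0] >= 3;
        boolean ok = waitForLeaseRecovery(stub, "/hbase/.logs/example", 5000L, 10L);
        System.out.println(ok + " after " + calls[0] + " calls");  // prints "true after 3 calls"
    }
}
```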

Thanks
-Todd


>
> On 04/07/2011 08:11 PM, Stack wrote:
>
>> The RegionServer is down for sure?  Otherwise it sounds like an issue
>> that was addressed by a new short-circuit API call added to HDFS on
>> the hadoop-0.20-append branch.  The patches that added this new call
>> went into the branch quite a while ago.  They are:
>>
>>  HDFS-1554. New semantics for recoverLease. (hairong)
>>
>>  HDFS-1555. Disallow pipeline recovery if a file is already being
>>     lease recovered. (hairong)
>>
>> These patches are not in CDH3b*.  They are in the CDH3 release which
>> is due any day now.
>>
>> HBase 0.90.2 makes use of the new API: See
>> https://issues.apache.org/jira/browse/HBASE-3285.  Attached to that
>> issue is a patch for CDH3b2, a patch we are running here at SU.  Shout
>> if you need a version of this patch for CDH3b3/4.
>>
>> St.Ack
>>
>>
>> On Thu, Apr 7, 2011 at 9:35 AM, Daniel Iancu<daniel.iancu@1and1.ro>
>>  wrote:
>>
>>> Hello everybody
>>> We've run into this now-popular error on our cluster:
>>>
>>> 2011-04-07 16:28:00,654 WARN IPC Server handler 0 on 8020
>>> org.apache.hadoop.hdfs.StateChange - DIR* NameSystem.startFile: failed to
>>> create file
>>> /hbase/.logs/search-hadoop-eu001.v300.gmx.net,60020,1302075782687/
>>> search-hadoop-eu001.v300.gmx.net%3A60020.1302075783467
>>> for DFSClient_hb_m_search-namenode-eu002.v300.gmx.net:60000_1302186078300
>>> on
>>> client 10.1.100.32, because this file is already being created by
>>> NN_Recovery on 10.1.100.61
>>>
>>> I've read a couple of threads about it, but it seems nobody has
>>> pinpointed its cause. Is the only remaining solution to delete the log
>>> file and lose data?
>>>
>>> I've seen this error on almost every cluster we've installed so far;
>>> deleting the logs was not a concern since they were all test clusters.
>>> Now we've hit it on the production cluster, and, strangely, this cluster
>>> was just installed: there are no tables, no data, and no activity there.
>>> So what logs is the master trying to create?
>>>
>>> We are running the latest CDH3B4 from Cloudera.
>>>
>>> Thanks for any hints,
>>> Daniel
>>>
>>>
> --
> Daniel Iancu
> Java Developer,Web Components Romania
> 1&1 Internet Development srl.
> 18 Mircea Eliade St
> Sect 1, Bucharest
> RO Bucharest, 012015
> www.1and1.ro
> Phone:+40-031-223-9081
> Email:daniel.iancu@1and1.ro
> IM:diancu@united.domain
>
>
>


-- 
Todd Lipcon
Software Engineer, Cloudera
