hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: problem restarting 0.18
Date Sun, 28 Sep 2008 15:39:28 GMT
Your HDFS looks ill.  It's complaining that a data file in the -ROOT- catalog
table is 'missing'.  What happens if you run '$HADOOP_HOME/bin/hadoop
fsck HBASE_HOMDIR'?  More context around the errors would help with
analysis.  Have you tried restarting your HDFS?

Thanks,
St.Ack
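
To gather the "more context around the errors" asked for above, one option is to pull every path that LeaseManager reports as missing out of the NameNode log. The following is only a minimal sketch: the function name `missing_lease_paths` is hypothetical, and the pattern assumes each log entry sits on a single line in the original log file (the entries quoted in this thread are wrapped by the mail client).

```python
import re

# Matches the LeaseManager error format quoted in this thread. Assumption:
# in the real NameNode log each entry is on one line, not wrapped as in mail.
LEASE_ERROR = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ERROR "
    r"org\.apache\.hadoop\.dfs\.LeaseManager: "
    r"(?P<path>\S+) not found in lease\.paths"
)


def missing_lease_paths(lines):
    """Return (timestamp, path) pairs for each LeaseManager 'not found' error."""
    found = []
    for line in lines:
        match = LEASE_ERROR.match(line)
        if match:
            found.append((match.group("ts"), match.group("path")))
    return found


# First entry is one of the errors quoted in this thread, joined onto one line;
# the second is a placeholder line that the pattern should ignore.
sample = [
    "2008-09-28 03:15:47,318 ERROR org.apache.hadoop.dfs.LeaseManager: "
    "/hbase/-ROOT-/70236052/log/hlog.dat.1222585931186 not found in "
    "lease.paths (=[/hbase/.META./1028785192/log/hlog.dat.1222585931303])",
    "an unrelated line that should be ignored",
]

for ts, path in missing_lease_paths(sample):
    print(ts, path)
```

To run it against a real log, pass an open file object instead of `sample`; the resulting paths are the ones worth checking with fsck.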


yoav.morag wrote:
> Unfortunately, this is not the case :-( . I have installed NTP on the
> cluster, but the problem remains in exactly the same way. It is now clear
> from the logs, however, that the problem occurs in the master first:
>
> 2008-09-28 11:04:48,412 ERROR org.apache.hadoop.dfs.LeaseManager:
> /hbase/-ROOT-/70236052/info/mapfiles/2686380382008424762/data not found in
> lease.paths (=[/hbase/-ROOT-/70236052/info/mapfiles
>
> and only then on the regionservers : 
>
> 2008-09-28 11:05:02,848 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: Unhandled exception.
> Aborting...
>
> Any more ideas would be greatly appreciated...
>
>
>
> Jean-Daniel Cryans-2 wrote:
>   
>> You may have just found your problem: the clocks are not synchronized.
>> Having synchronized clocks is a requirement when using HBase, see
>> http://hadoop.apache.org/hbase/docs/r0.18.0/api/index.html
>>
>> Thx for looking at it,
>>
>> J-D
>>
>> On Sun, Sep 28, 2008 at 3:47 AM, yoav.morag <yoav@corrigon.com> wrote:
>>
>>     
>>> Debug didn't seem to give much, as far as I could tell. I did, however,
>>> notice the following errors in the hadoop log on the name node.
>>> I am attaching the full logs from the name node and one region server
>>> (there are 4, all with identical errors):
>>>
>>> http://www.nabble.com/file/p19709529/hadoop-pm_app-namenode-cl-t072-330cl.privatedns.com.log
>>> http://www.nabble.com/file/p19709529/hbase-pm_app-regionserver-cl-t072-290cl.privatedns.com.log
>>>
>>> Note that the clocks are not synchronized across the cluster, so the
>>> times in the logs cannot be used to compare order between machines.
>>>
>>> Suspicious errors:
>>> 2008-09-28 03:15:47,316 ERROR org.apache.hadoop.dfs.LeaseManager:
>>> /hbase/-ROOT-/70236052/info/mapfiles/7031159331294621371/data not found
>>> in
>>> lease.paths
>>> (=[/hbase/-ROOT-/70236052/info/mapfiles/7031159331294621371/index,
>>> /hbase/-ROOT-/70236052/log/hlog.dat.1222585931186,
>>> /hbase/.META./1028785192/log/hlog.dat.1222585931303])
>>> 2008-09-28 03:15:47,317 ERROR org.apache.hadoop.dfs.LeaseManager:
>>> /hbase/-ROOT-/70236052/info/mapfiles/7031159331294621371/index not found
>>> in
>>> lease.paths (=[/hbase/-ROOT-/70236052/log/hlog.dat.1222585931186,
>>> /hbase/.META./1028785192/log/hlog.dat.1222585931303])
>>> 2008-09-28 03:15:47,318 ERROR org.apache.hadoop.dfs.LeaseManager:
>>> /hbase/-ROOT-/70236052/info/info/7031159331294621371 not found in
>>> lease.paths (=[/hbase/-ROOT-/70236052/log/hlog.dat.1222585931186,
>>> /hbase/.META./1028785192/log/hlog.dat.1222585931303])
>>> 2008-09-28 03:15:47,318 ERROR org.apache.hadoop.dfs.LeaseManager:
>>> /hbase/-ROOT-/70236052/log/hlog.dat.1222585931186 not found in
>>> lease.paths
>>> (=[/hbase/.META./1028785192/log/hlog.dat.1222585931303])
>>> 2008-09-28 03:15:47,324 ERROR org.apache.hadoop.dfs.LeaseManager:
>>> /hbase/-ROOT-/70236052/info/mapfiles/8544907469765511915/data not found
>>> in
>>> lease.paths
>>> (=[/hbase/-ROOT-/70236052/info/mapfiles/8544907469765511915/index,
>>> /hbase/log_10.249.0.10_1222585657683_60020/hlog.dat.1222585658340])
>>> 2008-09-28 03:15:47,325 ERROR org.apache.hadoop.dfs.LeaseManager:
>>> /hbase/-ROOT-/70236052/info/mapfiles/8544907469765511915/index not found
>>> in
>>> lease.paths
>>> (=[/hbase/log_10.249.0.10_1222585657683_60020/hlog.dat.1222585658340])
>>> 2008-09-28 03:15:47,326 ERROR org.apache.hadoop.dfs.LeaseManager:
>>> /hbase/-ROOT-/70236052/info/info/8544907469765511915 not found in
>>> lease.paths
>>> (=[/hbase/log_10.249.0.10_1222585657683_60020/hlog.dat.1222585658340])
>>> 2
>>>
>>>
>>>
>>>
>>> Jean-Daniel Cryans-2 wrote:
>>>       
>>>> Are there no other exceptions before that? Did you enable DEBUG? Can we
>>>> see a whole start/stop log of your region server?
>>>>
>>>> Thx,
>>>>
>>>> J-D
>>>>
>>>> On Thu, Sep 25, 2008 at 11:09 AM, yoav.morag <yoav@corrigon.com> wrote:
>>>>
>>>>         
>>>>> I am experiencing problems when restarting a cluster with hadoop/hbase
>>>>> 0.18.0. Hadoop restarts OK; however, the hbase regionservers all exit
>>>>> with the message:
>>>>> Exception in thread "regionserver/0:0:0:0:0:0:0:0:60020"
>>>>> java.lang.NullPointerException
>>>>>        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:448)
>>>>>        at java.lang.Thread.run(Thread.java:619)
>>>>> Strangely enough, that line appears to indicate the log is null; however,
>>>>> a log is created and messages are written into it...
>>>>> The restart scenario is very simple, and it happens even with a clean
>>>>> database on a newly formatted FS. I have also checked that no ghost
>>>>> processes exist before start.
>>>>>
>>>>> "$INSTALLDIR/$HBASE/bin/stop-hbase.sh;$INSTALLDIR/$HADOOP/bin/stop-dfs.sh;"
>>>>> "$INSTALLDIR/$HADOOP/bin/start-dfs.sh;$INSTALLDIR/$HBASE/bin/start-hbase.sh;"
>>>>>
>>>>> Any ideas?
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/problem-restarting-0.18-tp19671584p19671584.html
>>>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>>>
>>>>>
>>>>>           
>>>>         
>>> --
>>> View this message in context:
>>> http://www.nabble.com/problem-restarting-0.18-tp19671584p19709529.html
>>> Sent from the HBase User mailing list archive at Nabble.com.
>>>
>>>
>>>       
>>     
>
>   

