hbase-user mailing list archives

From Barry Haddow <bhad...@inf.ed.ac.uk>
Subject Re: Region servers shut down with UnknownScannerException
Date Mon, 29 Sep 2008 17:34:12 GMT
Thanks for the suggestions - responses inline.

On Monday 29 September 2008 18:15:53 you wrote:
> Barry:
>
> From the below, it looks like an issue in HDFS. If the regionserver is
> having trouble talking to HDFS, it shuts itself down.
>
> Tell us more.  Are there other, heavy-duty processes running on the same
> servers hosting datanodes and regionservers?

Yes, there are heavy-duty processes running on the same servers. This is
unavoidable, as we need the cluster for other tasks.

>
> Enable DEBUG on your cluster and make sure you've raised your ulimit on file
> descriptors above the default. See the FAQ in the wiki for how to do both.

Which FAQ are you referring to? I've set both hadoop and hbase to DEBUG and
restarted. The fd limit is 8192. What should I be looking for, and in which
logs?
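One way to confirm which file-descriptor limit a process actually gets is sketched below in Python. It assumes a Unix-like host; `proc_nofile_limits` reads `/proc/<pid>/limits` and so works on Linux only. The helper names are hypothetical. Because a daemon inherits its limits from the shell or init script that launched it, checking the running regionserver's own PID (found with e.g. `jps` or `ps`) is a more reliable test than checking your login shell.

```python
import os
import resource
import sys

def limit_str(value):
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)

def own_nofile_limits():
    """(soft, hard) open-file limits of this Python process (Unix only)."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def proc_nofile_limits(pid):
    """(soft, hard) open-file limits of any process, read from /proc/<pid>/limits.

    Linux only. pid may be a number or "self". Values come back as strings,
    e.g. ("8192", "8192") or ("unlimited", "unlimited").
    """
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith("Max open files"):
                # Layout: "Max open files  <soft>  <hard>  files"
                fields = line.split()
                return fields[3], fields[4]
    raise ValueError(f"no 'Max open files' entry for pid {pid}")

if __name__ == "__main__":
    soft, hard = own_nofile_limits()
    print(f"this process: soft={limit_str(soft)} hard={limit_str(hard)}")
    # Optionally pass the regionserver's PID on the command line.
    pid = sys.argv[1] if len(sys.argv) > 1 else None
    if pid and os.path.exists(f"/proc/{pid}/limits"):
        s, h = proc_nofile_limits(pid)
        print(f"pid {pid}: soft={s} hard={h}")
```

Usage (the PID is whatever `jps` reports for the regionserver): `python check_nofile.py <pid>`. If the soft limit shown for the regionserver is lower than the 8192 set in your shell, the daemon was started with different limits.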

Can I tune hbase so it is more tolerant of hdfs issues?

regards
Barry

>
> Thanks,
> St.Ack
>
> Barry Haddow wrote:
> > Hi
> >
> > I recently set up a small hbase cluster (v0.18) running on top of hadoop
> > v0.18.1. However, I'm observing that the region servers spontaneously
> > shut themselves down, usually with an UnknownScannerException. For
> > instance, this weekend I discovered that all four had shut down, with
> > messages like the following in the logs:
> >
> > 2008-09-29 05:50:17,203 INFO org.apache.hadoop.dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 129.215.197.39:50010
> > 2008-09-29 05:50:17,203 INFO org.apache.hadoop.dfs.DFSClient: Abandoning block blk_-5829206400135277905_3045
> > 2008-09-29 07:29:16,552 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: MSG_CALL_SERVER_STARTUP
> > 2008-09-29 07:46:35,796 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 60020, call next(-1347145425990165691) from 129.215.197.39:6999: error: org.apache.hadoop.hbase.UnknownScannerException: Name: -1347145425990165691
> >
> >
> > The underlying hdfs seems fine: fsck reports the hbase directory as
> > healthy. After a restart hbase seems fine too, but surely the
> > regionservers should stay up once they've started?
> >
> > Any suggestions?
> >
> > regards
> > Barry
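A rough way to answer "what should I be looking for" is to filter the logs for entries like those in the excerpt quoted above. The Python sketch below assumes the line layout shown in that excerpt (timestamp with milliseconds, level, logger name, message). The set of "interesting" levels and substrings is a hypothetical starting point, not anything HBase or Hadoop defines. Stack-trace continuation lines do not match the pattern and are skipped.

```python
import re
import sys
from datetime import datetime

# Assumed layout, as in the excerpt: "<date> <time,millis> <LEVEL> <logger>: <message>"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+) ([\w.$]+): (.*)$"
)

# Hypothetical selection of entries worth a closer look; adjust to taste.
INTERESTING_LEVELS = {"WARN", "ERROR", "FATAL"}
INTERESTING_TEXT = ("UnknownScannerException", "DFSClient", "IOException")

def parse_line(line):
    """Return (timestamp, level, logger, message), or None for non-matching lines."""
    m = LINE_RE.match(line.rstrip("\n"))
    if not m:
        return None  # e.g. a stack-trace continuation line
    ts, level, logger, message = m.groups()
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f"), level, logger, message

def interesting(entry):
    """True if the entry has a noteworthy level or mentions a watched substring."""
    _, level, logger, message = entry
    text = f"{logger}: {message}"
    return level in INTERESTING_LEVELS or any(s in text for s in INTERESTING_TEXT)

def scan(lines):
    """Yield parsed entries that look relevant to the shutdowns."""
    for line in lines:
        entry = parse_line(line)
        if entry and interesting(entry):
            yield entry

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for when, level, logger, message in scan(f):
            print(when.isoformat(sep=" "), level, logger, message[:120])
```

Usage (the log file name is whatever your install writes): `python scan_logs.py <regionserver log>`. Lining up the timestamps of DFSClient errors against the moment a regionserver went down is the kind of correlation this is meant to make easier.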



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

