hbase-user mailing list archives

From Wayne <wav...@gmail.com>
Subject Re: Node Shutdown Problems
Date Fri, 07 Jan 2011 16:45:11 GMT
Thanks for the reply. We are running Hadoop branch-0.20-append. We have
raised our xceivers to 4096 and restarted the Hadoop cluster. We had raised
the setting before but had not yet restarted. Hopefully this is again our
mistake in not setting things up correctly from the start. So far so good.
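
For anyone verifying the same two settings, here is a minimal Python sketch. It assumes the xceiver limit is set through the `dfs.datanode.max.xcievers` property in hdfs-site.xml (the spelling Hadoop 0.20 uses) and that the script runs as the same user as the datanode, since the open-file limit it reports is that of the current process only. The helper names and the sample XML are illustrative, not taken from our cluster.

```python
import resource
import xml.etree.ElementTree as ET

# Property name as spelled in Hadoop 0.20 (the misspelling is Hadoop's own).
XCIEVERS_KEY = "dfs.datanode.max.xcievers"


def read_hadoop_property(xml_text, key):
    """Return the value of `key` from Hadoop configuration XML, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == key:
            return prop.findtext("value", "").strip()
    return None


def open_file_limit():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    # Hypothetical fragment; on a real node, read the text of hdfs-site.xml.
    sample = """<configuration>
      <property>
        <name>dfs.datanode.max.xcievers</name>
        <value>4096</value>
      </property>
    </configuration>"""
    print(XCIEVERS_KEY, "=", read_hadoop_property(sample, XCIEVERS_KEY))
    print("nofile (soft, hard) =", open_file_limit())
```

A changed value in hdfs-site.xml only takes effect after the datanodes are restarted, which is the step we had missed.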

Thanks for your help.


On Fri, Jan 7, 2011 at 1:09 AM, Stack <stack@duboce.net> wrote:

> On Thu, Jan 6, 2011 at 5:46 AM, Wayne <wav100@gmail.com> wrote:
> > I had another node go down last night. No load at the time, it just seems it
> > had issues and shut itself down. Any help would be greatly appreciated. Why
> > would the file system go away?
>
> Incorrect ulimit or xceivers, or it was shut down from under HBase.
>
>
> > Here is a sampling of the logs. Please let me know what other information is
> > needed to help figure this out.
> >
> > This is the first sign of trouble.
> >
> > 07:47:51,079 WARN org.apache.hadoop.hdfs.DFSClient: DFSOutputStream
> > ResponseProcessor exception  for block
> > blk_-5666628798929448999_724103java.net.SocketTimeoutException: 69000
> > millis timeout while waiting for channel to be ready for read. ch :
> > java.nio.channels.SocketChannel
> >
> > 07:47:51,080 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery for
> > block blk_-5666628798929448999_724103 bad datanode[0]
> >
> > It fails 6 recovery attempts each on 3 nodes, including itself, for the
> > same block.
> >
>
> Check the datanode logs, in particular those of the nodes participating in
> this block's replication.  They may tell you something.  Grepping for the
> block in the namenode log can also be revealing.
>
>
>
>
> > 2011-01-06 07:48:42,075 ERROR
> > org.apache.hadoop.hbase.regionserver.HRegionServer:
> > java.io.IOException: Error Recovery for block
> > blk_-5666628798929448999_724103 failed  because recovery from primary
> > datanode x.x.x.x8:50010 failed 6 times.  Pipeline was x.x.x.x8:50010.
> > Aborting...
> >        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2741)
> >        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1500(DFSClient.java:2172)
> >        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2371)
> > 2011-01-06 07:48:42,077 FATAL
> > org.apache.hadoop.hbase.regionserver.wal.HLog: Could not append.
> > Requesting close of hlog
> >
>
>
> That's as though you were running with one replica only, or the other
> replicas had died on the DFSClient and had been marked bad.
>
> The regionserver will shut itself down if it can't write to the
> filesystem, to protect against data loss.
>
> >
> > Finally it gives up and we see the message below.
> >
> > 2011-01-06 07:48:42,233 FATAL
> > org.apache.hadoop.hbase.regionserver.HRegionServer: Aborting region
> > server serverName=x.x.x.x7,60020,1294257428578, load=(requests=76,
> > regions=320, usedHeap=3511, maxHeap=7987): File System not available
> > java.io.IOException: File system is not available
> >
>
> Yeah, as far as it's concerned, the FS is hosed.
>
>
> What Hadoop are you running on top of? (Sorry, you've probably already
> said in earlier emails.)
>
>
> > I am not sure if it is related but I see java.net.SocketTimeoutException
> > happening a lot on most of the nodes in the hadoop log (not at the same time
> > as the node going down). I see some recent activity related to this, but I
> > am not sure if it has anything to do with our problems. I also see errors
> > like the one below, may be totally unrelated but as I look closely at the
> > logs I see these things for the first time.
> >
> > 05:45:33,059 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode:
> > DatanodeRegistration(x.x.x.x8:50010,
> > storageID=DS-201495735-x.x.x.x8-50010-1293567644535, infoPort=50075,
> > ipcPort=50020):DataXceiver
> > java.io.IOException: Block blk_3051235367015871067_707839 is not valid.
> >
>
> If you grep for this block in the namenode log, does it tell you anything?
>
> Thanks Wayne,
> St.Ack
>
>
> >
> > Thanks in advance for any assistance anyone can provide.
> >
>
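
The advice quoted above, to grep the namenode and datanode logs for the failing block, can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, and the sample lines are the two log messages quoted in this thread rather than real log files.

```python
import re


def lines_mentioning_block(log_lines, block_id):
    """Yield (line_number, line) for every log line that mentions block_id."""
    # Reject matches where the ID continues with more digits (a different,
    # longer block ID) but allow a generation stamp suffix such as _724103.
    pattern = re.compile(re.escape(block_id) + r"(?!\d)")
    for number, line in enumerate(log_lines, start=1):
        if pattern.search(line):
            yield number, line.rstrip("\n")


if __name__ == "__main__":
    # Sample lines taken from this thread; for a real log, pass an open
    # file object instead of a list.
    sample = [
        "07:47:51,080 WARN org.apache.hadoop.hdfs.DFSClient: Error Recovery "
        "for block blk_-5666628798929448999_724103 bad datanode[0]\n",
        "java.io.IOException: Block blk_3051235367015871067_707839 is not valid.\n",
    ]
    for number, line in lines_mentioning_block(sample, "blk_-5666628798929448999"):
        print(number, line)
```

Running the same search against the namenode log and against each datanode in the block's pipeline gives a per-node timeline for that block.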
