hbase-user mailing list archives

From Jack Levin <magn...@gmail.com>
Subject Re: major hdfs issues
Date Sat, 12 Mar 2011 04:11:44 GMT
I am also noticing the following errors:

2011-03-11 17:52:00,376 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(
10.103.7.3:50010, storageID=DS-824332190-10.103.7.3-50010-1290043658438,
infoPort=50075, ipcPort=50020):DataXceiveServer: Exiting due
to:java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:597)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:132)
        at java.lang.Thread.run(Thread.java:619)


and this:

nf_conntrack: table full, dropping packet.
nf_conntrack: table full, dropping packet.
nf_conntrack: table full, dropping packet.
nf_conntrack: table full, dropping packet.
nf_conntrack: table full, dropping packet.
nf_conntrack: table full, dropping packet.
net_ratelimit: 10 callbacks suppressed
nf_conntrack: table full, dropping packet.
possible SYN flooding on port 9090. Sending cookies.

This seems like a network stack issue?
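
One way to check whether the connection-tracking table is really at its limit is to compare the kernel's current entry count with the configured maximum. The sketch below is a minimal example, assuming a Linux host with the nf_conntrack module loaded so that the two counters appear under /proc/sys/net/netfilter; the function name conntrack_usage is made up for illustration.

```python
from pathlib import Path

# Assumption: Linux with the nf_conntrack module loaded; without it
# these sysctl files do not exist.
CONNTRACK_DIR = Path("/proc/sys/net/netfilter")


def conntrack_usage(base_dir=CONNTRACK_DIR):
    """Return (count, maximum, fraction_used) for the conntrack table."""
    base = Path(base_dir)
    count = int((base / "nf_conntrack_count").read_text().strip())
    maximum = int((base / "nf_conntrack_max").read_text().strip())
    return count, maximum, count / maximum


if __name__ == "__main__":
    try:
        count, maximum, used = conntrack_usage()
    except (OSError, ValueError, ZeroDivisionError) as exc:
        print(f"could not read conntrack counters: {exc}")
    else:
        print(f"conntrack entries: {count}/{maximum} ({used:.0%} used)")
```

A count sitting at or near the maximum would be consistent with the "table full, dropping packet" messages above; it does not by itself say which connections are filling the table.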

So, does the datanode need a heap larger than 1GB?  Or is it possible we ran out of RAM
for other reasons?
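
For context, "unable to create new native thread" is generally reported when the operating system refuses to start another thread (for example because a per-user process/thread limit or native memory is exhausted), rather than when the Java heap itself is full. The following is a minimal sketch, assuming Linux, that reads a process's thread count from /proc/<pid>/status and prints the RLIMIT_NPROC limit of the user running the script. The helper names are made up, and the script inspects its own pid as a placeholder; the limit is only meaningful if the script runs as the same user as the DataNode.

```python
import os
import re
import resource


def thread_count(status_text):
    """Extract the 'Threads:' value from the text of /proc/<pid>/status."""
    match = re.search(r"^Threads:\s+(\d+)$", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Threads:' line found")
    return int(match.group(1))


def nproc_limit():
    """Soft and hard RLIMIT_NPROC for the user running this script.

    A value equal to resource.RLIM_INFINITY means unlimited.
    """
    return resource.getrlimit(resource.RLIMIT_NPROC)


if __name__ == "__main__":
    pid = os.getpid()  # placeholder: substitute the DataNode's pid
    try:
        with open(f"/proc/{pid}/status") as fh:
            print("threads:", thread_count(fh.read()))
    except (OSError, ValueError) as exc:
        print(f"could not read thread count: {exc}")
    soft, hard = nproc_limit()
    print("RLIMIT_NPROC soft/hard:", soft, hard)
```

If the thread count of the DataNode is close to the soft limit, raising the heap would not be expected to help, since the limit is enforced outside the JVM.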

-Jack

On Thu, Mar 10, 2011 at 1:29 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:

> Looks like a datanode went down.  InterruptedException is what Java
> uses to interrupt IO in threads; it's similar to the EINTR errno.  That
> means the actual source of the abort is higher up...
>
> So back to how InterruptedException works... at some point a thread in
> the JVM decides that the VM should abort.  So it calls
> thread.interrupt() on all the threads it knows/cares about to
> interrupt their IO.  That is what you are seeing in the logs. The root
> cause lies above I think.
>
> Look for the first "Exception" string or any FATAL or ERROR strings in
> the datanode logfiles.
>
> -ryan
>
> On Thu, Mar 10, 2011 at 1:03 PM, Jack Levin <magnito@gmail.com> wrote:
> > http://pastebin.com/ZmsyvcVc  Here is the regionserver log, they all have
> > similar stuff,
> >
> > On Thu, Mar 10, 2011 at 11:34 AM, Stack <stack@duboce.net> wrote:
> >
> >> What's in the regionserver logs?  Please put up regionserver and
> >> datanode excerpts.
> >> Thanks Jack,
> >> St.Ack
> >>
> >> On Thu, Mar 10, 2011 at 10:31 AM, Jack Levin <magnito@gmail.com> wrote:
> >> > All was well, until this happened:
> >> >
> >> > http://pastebin.com/iM1niwrS
> >> >
> >> > and all regionservers went down. Is this an xciever issue?
> >> >
> >> > <property>
> >> > <name>dfs.datanode.max.xcievers</name>
> >> > <value>12047</value>
> >> > </property>
> >> >
> >> > This is what I have; should I set it higher?
> >> >
> >> > -Jack
> >> >
> >>
> >
>
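
Ryan's suggestion in the quoted thread, to look for the first "Exception", FATAL or ERROR string in the datanode logs, can be sketched as a small scan that stops at the first matching line. This is a minimal example; the function name first_problem_line and the log file name in the usage note are hypothetical.

```python
import re

# The three markers named in the thread.
PATTERN = re.compile(r"Exception|FATAL|ERROR")


def first_problem_line(lines):
    """Return (line_number, text) of the first matching line, or None."""
    for number, line in enumerate(lines, start=1):
        if PATTERN.search(line):
            return number, line.rstrip("\n")
    return None
```

It accepts any iterable of lines, so an open file can be passed directly, for example `with open("datanode.log") as fh: print(first_problem_line(fh))`, where datanode.log stands in for the real log path. Because it returns at the first hit, it points at the earliest logged problem rather than the later errors that follow from it.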
