hbase-user mailing list archives

From Enis Soztutar <enis.soz.nu...@gmail.com>
Subject Re: HBase 0.90.0 region servers dying
Date Sat, 19 Feb 2011 08:58:06 GMT
Yes, indeed, but no luck.

Enis

On Fri, Feb 18, 2011 at 11:50 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:

> Just to make sure, you did check the .out file after a failure, right?
>
> J-D
>
> On Thu, Feb 17, 2011 at 10:14 PM, Enis Soztutar
> <enis.soz.nutch@gmail.com> wrote:
> > Hi,
> >
> > Thanks everyone for the answers.
> > I had already increased the file descriptor limit to 32768. The region servers
> > and the ZooKeeper processes are dying, but the datanodes and tasktrackers keep
> > running (they are configured with a max heap of 1 GB). The logs do not
> > contain any indication that something is going wrong. The last entries in
> > the logs are typical INFO-level messages. I have also checked the kernel
> > logs, but the kernel does not report that it is killing the processes
> > either. While testing, two of the servers restarted at different times,
> > which was the original reason that I had suspected a memory error. After we
> > replaced the power supplies the nodes stopped restarting, but the processes
> > kept dying.
> >
> > For the load, the YCSB test for 10M records runs for a while at 4K
> > inserts per second, but cannot complete because the region servers die one
> > by one. iostat also shows light CPU and I/O utilization, around 20%. Any
> > more suggestions for debugging would be more than welcome.
> >
> > Thanks,
> > Enis
> >
> > On Wed, Feb 16, 2011 at 5:13 AM, Eric <eric.xkcd@gmail.com> wrote:
> >
> >> Did you increase the max open files on your system (in
> >> /etc/security/limits.conf) ?
> >>
> >
> >> 2011/2/16 Enis Soztutar <enis.soz.nutch@gmail.com>
> >>
> >> > Hi,
> >> >
> >> > We have newly set up a cluster of 5 nodes, each with 16 GB of RAM. We
> >> > use HBase 0.90.0 on top of Hadoop from CDH3. When testing HBase under
> >> > heavy load generated by YCSB, we consistently see region servers dying
> >> > silently, without any logs or exceptions (not even in the system logs).
> >> > We couldn't track down the problem, so we tested the same setup on a
> >> > Rackspace cluster with 7 nodes but similar hardware, and we didn't have
> >> > any problems.
> >> >
> >> > We suspect a problem with the RAM or the motherboards, but all memory
> >> > tests run successfully. I was wondering if anyone has had similar
> >> > problems before, and whether there is anything you would suggest to nail
> >> > down the issue.
> >> >
> >> > Thanks,
> >> > Enis
> >> >
> >>
> >
>
