hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: HBase 0.90.0 region servers dying
Date Tue, 22 Feb 2011 20:34:25 GMT
Ted asked about the JVM version, but I don't think you answered that.
In any case, try with u17.

J-D

On Sat, Feb 19, 2011 at 3:58 AM, Enis Soztutar <enis.soz.nutch@gmail.com> wrote:
> Yes indeed but no luck.
>
> Enis
>
> On Fri, Feb 18, 2011 at 11:50 AM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>
>> Just to make sure, you did check in the .out file after a failure right?
>>
>> J-D
>>
>> On Thu, Feb 17, 2011 at 10:14 PM, Enis Soztutar <enis.soz.nutch@gmail.com> wrote:
>> > Hi,
>> >
>> > Thanks, everyone, for the answers.
>> > I had already increased the file descriptor limit to 32768. The region
>> > servers and the ZooKeeper processes are dying, but the datanodes and
>> > tasktrackers keep running (they are configured with a max heap of 1 GB).
>> > The logs do not contain any indication that something is going wrong; the
>> > last entries in the logs are typical INFO-level messages. I have also
>> > checked the kernel logs, but the kernel does not report that it is killing
>> > the processes either. While testing, two of the servers restarted at
>> > different times, which was the original reason I suspected a memory error.
>> > But after we replaced the power supplies, the nodes did not restart, yet
>> > the processes kept dying.
>> >
>> > For the load, the YCSB test for 10M records runs for a while at 4K inserts
>> > per second, but cannot complete because the region servers die one by one.
>> > iostat also shows light CPU and I/O utilization, around 20%. Any more
>> > suggestions for debugging would be more than welcome.
>> >
>> > Thanks,
>> > Enis
>> >
>> > On Wed, Feb 16, 2011 at 5:13 AM, Eric <eric.xkcd@gmail.com> wrote:
>> >
>> >> Did you increase the max open files on your system (in
>> >> /etc/security/limits.conf) ?
>> >>
>> >
>> >> 2011/2/16 Enis Soztutar <enis.soz.nutch@gmail.com>
>> >>
>> >> > Hi,
>> >> >
>> >> > We have newly set up a cluster of 5 nodes, each with 16 GB of RAM. We use
>> >> > HBase 0.90.0 on top of Hadoop from CDH3. When testing HBase under heavy
>> >> > load generated by YCSB, we consistently see region servers dying silently,
>> >> > without any logs or exceptions (not even in the system logs). We couldn't
>> >> > track down the problem, so we tested the same setup on a Rackspace cluster
>> >> > with 7 nodes and similar hardware, and we didn't have any problems.
>> >> >
>> >> > We suspect a problem with the RAM or the motherboards, but all memory
>> >> > tests run successfully. I was wondering whether anyone has had similar
>> >> > problems before, and whether there is anything you would suggest to nail
>> >> > down the issue.
>> >> >
>> >> > Thanks,
>> >> > Enis
>> >> >
>> >>
>> >
>
>
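Eric's question about the open-file limit is best checked against the running daemon rather than a login shell. Below is a minimal sketch in Python, assuming a Linux host with /proc mounted. The limits in /etc/security/limits.conf are applied by PAM at session start, so a daemon launched some other way may not inherit the new value; /proc/<pid>/limits shows the limit the process actually got. The helper names are hypothetical, and the region server PID has to be found separately (for example with `jps`).

```python
import os
import resource


def _to_limit(value):
    """Convert one limits field to an int, or None for 'unlimited'."""
    return None if value == "unlimited" else int(value)


def parse_max_open_files(limits_text):
    """Extract (soft, hard) 'Max open files' from /proc/<pid>/limits text.

    Returns None if the 'Max open files' line is missing.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Layout: "Max open files <soft> <hard> files"
            fields = line.split()
            return _to_limit(fields[3]), _to_limit(fields[4])
    return None


def process_open_file_limit(pid):
    """Read the open-file limit of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())


if __name__ == "__main__":
    # Limit of this Python process, as reported by getrlimit.
    print("getrlimit:", resource.getrlimit(resource.RLIMIT_NOFILE))
    # The same information via /proc, using our own PID as a stand-in
    # for the region server PID.
    if os.path.exists(f"/proc/{os.getpid()}/limits"):
        print("via /proc:", process_open_file_limit(os.getpid()))
```

Passing the region server's PID to `process_open_file_limit` and seeing 1024 instead of 32768 would suggest the limits.conf change never reached the daemon.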
