hbase-user mailing list archives

From Vladimir Rodionov <vrodio...@carrieriq.com>
Subject RE: Is this a long GC pause, or something else?
Date Tue, 10 Jun 2014 18:49:01 GMT
1. Do you have GC logging enabled on your cluster? It does not look like a GC pause to me, but
for future troubleshooting it is better to enable GC logging.

2. How large is your cluster? Did you check NN and DN logs as well? Are all your nodes (RS
and DN) up and running? No dead nodes?
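
For a release without the JvmPauseMonitor mentioned in the quoted reply below, a small watchdog thread can give a rough signal alongside GC logging. The following is only a minimal sketch of the general sleep-and-measure idea, not the code of that class: PauseDetector, overrunMillis, and both thresholds are hypothetical names and values chosen for illustration. The thread sleeps for a fixed interval and reports when the sleep took much longer than requested, which happens when the whole JVM (or the VM hosting it) was stalled.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration; not part of HBase or Hadoop. */
final class PauseDetector implements Runnable {
    private final long sleepMs;          // how long each probe sleeps
    private final long warnThresholdMs;  // overrun that triggers a warning

    PauseDetector(long sleepMs, long warnThresholdMs) {
        this.sleepMs = sleepMs;
        this.warnThresholdMs = warnThresholdMs;
    }

    /** Extra time the sleep took beyond what was requested, never negative. */
    public static long overrunMillis(long expectedSleepMs, long actualElapsedMs) {
        return Math.max(0L, actualElapsedMs - expectedSleepMs);
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            long start = System.nanoTime();
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop probing.
                Thread.currentThread().interrupt();
                return;
            }
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            long overrun = overrunMillis(sleepMs, elapsedMs);
            if (overrun > warnThresholdMs) {
                // A large overrun means this thread could not run on time:
                // a stop-the-world GC, a stalled VM, or heavy swapping.
                System.err.println("Detected pause of approximately "
                        + overrun + " ms in JVM or host");
            }
        }
    }
}
```

It could be started as a daemon thread, for example new Thread(new PauseDetector(500, 1000), "pause-detector") with setDaemon(true) called before start(). A reported overrun does not say what caused the stall; the GC log is still needed to confirm or rule out a collection. On Oracle Java 7 the usual HotSpot flags for that are -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<path>.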

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com

________________________________________
From: Tom Brown [tombrown52@gmail.com]
Sent: Tuesday, June 10, 2014 11:13 AM
To: user@hbase.apache.org
Subject: Re: Is this a long GC pause, or something else?

We are still using 0.94.10. We are looking at upgrading soon, but have not
done so yet.

--Tom


On Tue, Jun 10, 2014 at 12:10 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> Which release are you using?
>
> In 0.98+, there is JvmPauseMonitor.
>
> Cheers
>
>
> On Tue, Jun 10, 2014 at 11:05 AM, Tom Brown <tombrown52@gmail.com> wrote:
>
> > Last night a regionserver in my cluster stopped responding in a timely
> > manner for about 20 minutes. I know that stop-the-world GC can cause this
> > type of behavior, but 20 minutes seems excessive.
> >
> > The server is a 2-core VM with 16GB of RAM (HBase max heap is 12GB). We
> > are using the latest Java 7 from Oracle. HDFS is provided by an Isilon
> > cluster.
> >
> > The server workload is read/write: the writing process reads all rows it is
> > about to write, updates them if they exist, and then writes all the rows
> > (replacing ones that were updated).
> >
> > The last messages before the pause were regarding an HLog roll:
> >
> > DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: HLog roll requested
> > INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support
> > getDefaultReplication
> > INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support
> > getDefaultBlockSize
> >
> > During the next 20 minutes there were a handful of sporadic LruBlockCache
> > stats messages but nothing else. After 20 minutes, normal operation
> > resumed.
> >
> > Is 20 minutes for a GC pause expected given the operational load and
> > machine specs? Could a GC pause include periodic log messages? If it wasn't
> > a GC pause, what else could it be?
> >
> > --Tom
> >
>

