Date: Wed, 11 Jun 2014 00:10:37 -0400
Subject: Re: Is this a long GC pause, or something else?
From: Otis Gospodnetic
To: user@hbase.apache.org
References: <00B12C12-4ABE-49BA-BA82-B248D5A5A13E@gmail.com>

Hi Tom,

Aha. Our pauses keep happening. :(

We use SPM - see http://sematext.com/spm/ - it has support for HBase and
Hadoop metrics, among other things. As a matter of fact, for
troubleshooting an issue like this one you may also want to ship your
logs into Logsene. Doing that will let you correlate your pause with
messages in the logs, which could help you figure out what's going on
next time something like this happens.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

On Tue, Jun 10, 2014 at 7:52 PM, Tom Brown wrote:

> Otis,
>
> I'm not sure our issue is the same (although they could turn out to be
> related). As far as I have been able to determine, we have only had a
> single long pause.
>
> However, we don't have much experience micromanaging our JVMs. How did
> you generate those graphs?
>
> --Tom
>
>
> On Tue, Jun 10, 2014 at 4:52 PM, Otis Gospodnetic
> <otis.gospodnetic@gmail.com> wrote:
>
> > No, I don't think so. We had it until this morning and didn't see
> > this problem. We'll probably switch to it tomorrow morning before we
> > change EC2 instances and see if that removes the problem.
> >
> > Tom - do your pauses look like the ones in our SPM graphs?
> >
> > Otis
> > --
> > Performance Monitoring * Log Analytics * Search Analytics
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
> > On Tue, Jun 10, 2014 at 6:38 PM, Vladimir Rodionov
> > <vrodionov@carrieriq.com> wrote:
> >
> > > Unbelievable. Do you see the same with the latest OpenJDK?
> > >
> > > Best regards,
> > > Vladimir Rodionov
> > > Principal Platform Engineer
> > > Carrier IQ, www.carrieriq.com
> > > e-mail: vrodionov@carrieriq.com
> > >
> > > ________________________________________
> > > From: Otis Gospodnetic [otis.gospodnetic@gmail.com]
> > > Sent: Tuesday, June 10, 2014 2:43 PM
> > > To: user@hbase.apache.org
> > > Subject: Re: Is this a long GC pause, or something else?
> > >
> > > Does it repeat?
> > > We are seeing this with u60 oracle JVM too! SPM shows the whole JVM
> > > blocking for about 16 minutes every M minutes.
> > >
> > > Otis
> > >
> > >
> > > > On Jun 10, 2014, at 2:05 PM, Tom Brown wrote:
> > > >
> > > > Last night a regionserver in my cluster stopped responding in a
> > > > timely manner for about 20 minutes. I know that stop-the-world GC
> > > > can cause this type of behavior, but 20 minutes seems excessive.
> > > >
> > > > The server is a 2 core VM with 16GB of RAM, (hbase max heap is
> > > > 12GB). We are using the latest java 7 from oracle. HDFS is
> > > > provided by an Isilon cluster.
> > > >
> > > > The server workload is read/write: the writing process reads all
> > > > rows it is about to write, updates them if they exist, and then
> > > > writes all the rows (replacing ones that were updated).
> > > >
> > > > The last messages before the pause were regarding an HLog roll:
> > > >
> > > > DEBUG org.apache.hadoop.hbase.regionserver.LogRoller: HLog roll requested
> > > > INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support getDefaultReplication
> > > > INFO org.apache.hadoop.hbase.util.FSUtils: FileSystem doesn't support getDefaultBlockSize
> > > >
> > > > During the next 20 minutes there were a handful of sporadic
> > > > LruBlockCache stats messages but nothing else. After 20 minutes,
> > > > normal operation resumed.
> > > >
> > > > Is 20 minutes for a GC pause expected given the operational load
> > > > and machine specs? Could a GC pause include periodic log messages?
> > > > If it wasn't a GC pause, what else could it be?
> > > >
> > > > --Tom