Date: Thu, 6 Nov 2014 14:09:51 -0800
Subject: Re: Hbase Unusable after auto split to 1024 regions
From: Nick Dimiduk
To: "user@hbase.apache.org"

I've been doing some testing with ITMTTR recently in ec2 with m1.l and m1.xl
instances. Debug level logging seems to produce 20-30 messages/sec on the
RS. I have noticed pauses in the log entries that last anywhere from 30-120
seconds. I have no explanation for the pauses other than the environment.
(Oddly, the JvmPauseMonitor thread is not detecting these events.) You may
experience similar behavior. If you're planning production for ec2 with
HBase, I do recommend the newer instance types. Basically: whatever they
run Redshift on will be much more reliable than the stuff they offer to the
commoners.

-n

On Thursday, November 6, 2014, Pere Kyle wrote:

> Bryan,
>
> Thanks again for the incredibly useful reply.
>
> I have confirmed that the callQueueLen is in fact 0, with a max value of 2
> in the last week (in ganglia)
>
> hbase.hstore.compaction.max was set to 15 on the nodes, from a previous 7.
>
> Freezes (laggy responses) on the cluster are frequent and affect both
> reads and writes. I noticed iowait on the nodes that spikes.
>
> The cluster goes between a state of working 100% to nothing
> serving/timeouts for no discernible reason.
>
> Looking through the logs I have tons of responseTooSlow, this is the only
> regular occurrence in the logs:
> hbase-hadoop-regionserver-ip-10-230-130-121.us-west-2.compute.internal.log:2014-11-06 03:54:31,640 WARN org.apache.hadoop.ipc.HBaseServer (IPC Server handler 39 on 60020): (responseTooSlow): {"processingtimems":14573,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@c67b2ac), rpc version=1, client version=29, methodsFingerPrint=-540141542","client":"10.231.139.198:57223","starttimems":1415246057066,"queuetimems":20640,"class":"HRegionServer","responsesize":0,"method":"multi"}
> hbase-hadoop-regionserver-ip-10-230-130-121.us-west-2.compute.internal.log:2014-11-06 03:54:31,640 WARN org.apache.hadoop.ipc.HBaseServer (IPC Server handler 42 on 60020): (responseTooSlow): {"processingtimems":45660,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@6c034090), rpc version=1, client version=29, methodsFingerPrint=-540141542","client":"10.231.21.106:41126","starttimems":1415246025979,"queuetimems":202,"class":"HRegionServer","responsesize":0,"method":"multi"}
> hbase-hadoop-regionserver-ip-10-230-130-121.us-west-2.compute.internal.log:2014-11-06 03:54:31,642 WARN org.apache.hadoop.ipc.HBaseServer (IPC Server handler 46 on 60020): (responseTooSlow): {"processingtimems":14620,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@4fc3bb1f), rpc version=1, client version=29, methodsFingerPrint=-540141542","client":"10.230.130.102:54068","starttimems":1415246057021,"queuetimems":27565,"class":"HRegionServer","responsesize":0,"method":"multi"}
> hbase-hadoop-regionserver-ip-10-230-130-121.us-west-2.compute.internal.log:2014-11-06 03:54:31,642 WARN org.apache.hadoop.ipc.HBaseServer (IPC Server handler 35 on 60020): (responseTooSlow): {"processingtimems":13431,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@3b321922), rpc version=1, client version=29, methodsFingerPrint=-540141542","client":"10.227.42.252:60493","starttimems":1415246058210,"queuetimems":1134,"class":"HRegionServer","responsesize":0,"method":"multi"}
>
> On Nov 6, 2014, at 12:38 PM, Bryan Beaudreault wrote:
>
> > blockingStoreFiles
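
The responseTooSlow entries quoted above each carry a JSON payload with processingtimems and queuetimems, so they can be summarized with a short script. What follows is a minimal Python sketch and not something from the thread: the names parse_slow_call and summarize are hypothetical, and it assumes each log entry sits on a single unwrapped line, as it would in the region server log file rather than in a wrapped email quote.

```python
import json
import re
import sys
from statistics import median

# Captures the JSON payload that follows the "(responseTooSlow):" tag.
SLOW_RE = re.compile(r"\(responseTooSlow\):\s*(\{.*\})\s*$")


def parse_slow_call(line):
    """Return the decoded payload of a responseTooSlow line, or None."""
    match = SLOW_RE.search(line)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        # Truncated or wrapped entries are skipped rather than guessed at.
        return None


def summarize(lines):
    """Aggregate processing and queue times (ms) over all slow calls."""
    calls = [c for c in map(parse_slow_call, lines) if c is not None]
    if not calls:
        return None
    processing = [c["processingtimems"] for c in calls]
    queued = [c["queuetimems"] for c in calls]
    return {
        "count": len(calls),
        "max_processing_ms": max(processing),
        "median_processing_ms": median(processing),
        "max_queue_ms": max(queued),
        "median_queue_ms": median(queued),
    }


if __name__ == "__main__":
    # Assumed usage: pipe a region server log into the script on stdin.
    print(summarize(sys.stdin))
```

Fed the four entries quoted above, summarize reports a maximum processingtimems of 45660 and a maximum queuetimems of 27565. Comparing the two columns is one way to see whether calls are slow while running or slow while waiting in the queue.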