From: "Pamecha, Abhishek"
To: user@hbase.apache.org
CC: Ivan Brondino, Ricardo Vilaça
Subject: RE: High IPC Latency
Date: Thu, 18 Oct 2012 18:38:44 +0000

Is it sustained for the same client hitting the same region server, or does it get better for the same client-RS combination when run for a longer duration? Trying to eliminate ZooKeeper from this.

Thanks,
Abhishek

From: Yousuf Ahmad [mailto:myahmad0@gmail.com]
Sent: Thursday, October 18, 2012 11:26 AM
To: user@hbase.apache.org
Cc: Ivan Brondino; Ricardo Vilaça
Subject: High IPC Latency

Hello,

We are seeing slow times for read operations in our experiments. We are hoping that you guys can help us figure out what's going wrong. Here are some details:

* We are running a read-only benchmark on our HBase cluster.
* There are 10 regionservers, each co-located with a datanode. HDFS replication is 3x.
* All the data read by the experiment is already in the block cache, and the hit ratio is 99%.
* We have 10 clients, each with around 400 threads making a mix of read-only requests involving multi-gets and scans.
* We settled on the default client pool type/size (roundrobin/1) and a regionserver handler count of 100 after testing various combinations to see which settings worked best.
* Our scans are short, fetching around 10 rows on average. Scanner caching is set to 50.
* An average row in a scan has either around 10 columns (small row) or around 200 columns (big row).
* Our multi-gets fetch around 200 rows on average (a rough sketch of such a request follows this list).
* An average row in a multi-get has around 10 columns.
* Each column holds an integer (encoded into bytes).
* None of the machines involved reaches CPU, memory, or I/O saturation; in fact, resource utilization stays quite low.
* Our statistics show that the average time for a scan, measured from the first scanner.next() call to the last one (which returns null), is around 2-3 seconds.
* Since we use scanner caching, the major portion of this time (around 2 seconds) is spent on the first call to next(), while the remaining calls take a negligible amount of time.
* Similarly, we see that a multi-get takes around 2 seconds on average.
* A single get takes around 1 second on average.
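For concreteness, here is a rough sketch of one of the multi-gets described above, assuming the HTable client API and made-up table, column family, and row-key names (the actual benchmark code is not part of this thread):

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MultiGetSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "bench_table");   // hypothetical table name

        // Build a batch of ~200 Gets, each asking for ~10 integer-valued columns.
        byte[] family = Bytes.toBytes("cf");               // hypothetical column family
        List<Get> gets = new ArrayList<Get>();
        for (int i = 0; i < 200; i++) {
          Get get = new Get(Bytes.toBytes(String.format("row-%06d", i)));  // hypothetical keys
          for (int c = 0; c < 10; c++) {
            get.addColumn(family, Bytes.toBytes("c" + c)); // hypothetical qualifiers
          }
          gets.add(get);
        }

        // The client groups the batch by regionserver and sends it as one or more
        // multi calls; the results come back in the same order as the Gets.
        Result[] results = table.get(gets);
        System.out.println("fetched " + results.length + " rows");
        table.close();
      }
    }

In the trace sample further down, a batch like this shows up as an HRegionInterface#multi call with contents=200 Gets.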
We are not sure what the bottleneck is or where it lies. We thought we should look deeper into what is going on at the regionservers, so we monitored the IPC calls during one of the experiments.
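For reference, the trace lines quoted below come from the log category org.apache.hadoop.ipc.HBaseServer.trace. Assuming a standard log4j.properties setup (the thread does not say how logging was configured), an entry along these lines on each regionserver produces that output:

    # regionserver log4j.properties: per-call RPC trace with queueTime/processingTime
    log4j.logger.org.apache.hadoop.ipc.HBaseServer.trace=DEBUG

Note that queueTime and processingTime are measured inside the server, so they exclude anything that happens on the client or on the network.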
Here is a sample of one regionser= ver log: 2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call= #115483; Served: HRegionInterface#get queueTime=3D0 processingTime=3D1 con= tents=3D1 Get, 75 bytes 2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call= #115487; Served: HRegionInterface#get queueTime=3D0 processingTime=3D0 con= tents=3D1 Get, 75 bytes 2012-10-18 17:00:09,969 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call= #115489; Served: HRegionInterface#get queueTime=3D0 processingTime=3D0 con= tents=3D1 Get, 75 bytes 2012-10-18 17:00:09,982 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call= #111421; Served: HRegionInterface#get queueTime=3D0 processingTime=3D0 con= tents=3D1 Get, 75 bytes 2012-10-18 17:00:09,982 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call= #115497; Served: HRegionInterface#multi queueTime=3D0 processingTime=3D9 c= ontents=3D200 Gets 2012-10-18 17:00:09,984 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call= #115499; Served: HRegionInterface#openScanner queueTime=3D0 processingTime= =3D0 contents=3D1 Scan, 63 bytes 2012-10-18 17:00:09,990 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call= #115503; Served: HRegionInterface#get queueTime=3D0 processingTime=3D0 con= tents=3D1 Get, 75 bytes 2012-10-18 17:00:09,992 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call= #103230; Served: HRegionInterface#next queueTime=3D0 processingTime=3D0 co= ntents=3D1 Long, 1 Integer 2012-10-18 17:00:09,994 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call= #103234; Served: HRegionInterface#close queueTime=3D0 processingTime=3D0 c= ontents=3D1 Long 2012-10-18 17:00:09,994 DEBUG org.apache.hadoop.ipc.HBaseServer.trace: Call= #103232; Served: HRegionInterface#next queueTime=3D0 processingTime=3D0 co= ntents=3D1 Long, 1 Integer I have attached a larger chunk of the logs we collected for this experiment= in case that helps. >From the logs, we saw that the next() operation at the regionserver takes 1= millisecond or less; and a multi-get takes 10 ms on average. Yet the corresponding times we see at the client are orders of magnitude hi= gher. Ping times between the machines are at most 1ms and we are not saturating t= he network. We would really appreciate some insights from you guys on this. Where do you suggest we focus our efforts in order to hunt down this bottle= neck/contention? Thanks! Yousuf --_000_2E362ACC9493D747B488241C66B3B66520E519RHVEXRDAS11corpeb_--