Subject: GET performance degrades over time
From: Viral Bajaria <viral.bajaria@gmail.com>
To: user@hbase.apache.org
Date: Thu, 16 May 2013 02:16:56 -0700

Hi,

My setup is as follows:

24 regionservers (7GB RAM, 8-core CPU, 5GB heap space)
hbase 0.94.4
5-7 regions per regionserver

I am doing an average of 4K-5K random GETs per regionserver per second, and the performance is acceptable in the beginning. I have also done ~10K GETs against a single regionserver and got the results back in 600-800ms.

After a while the performance of the GETs starts degrading: the same ~10K random GETs start taking upwards of 9s-10s.

With regards to the HBase settings I have modified: I have disabled major compaction, increased the region size to 100G and bumped the handler count up to 100.
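In case it helps, the reads are issued roughly like the sketch below; the table name, rowkey scheme and batch size are just placeholders, not my real schema. (The settings above are the usual properties, i.e. hbase.hregion.majorcompaction=0, hbase.hregion.max.filesize and hbase.regionserver.handler.count in hbase-site.xml.)

==== rough sketch of how the random gets are issued (placeholder table/keys) ====
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomGetLoad {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    // "my_table" and the key scheme below are placeholders, not my actual schema.
    HTable table = new HTable(conf, "my_table");
    Random rnd = new Random();
    // Batch ~10K random gets, similar to the single-regionserver test mentioned above.
    List<Get> gets = new ArrayList<Get>();
    for (int i = 0; i < 10000; i++) {
      byte[] rowKey = Bytes.toBytes(String.format("row-%010d", rnd.nextInt(100000000)));
      gets.add(new Get(rowKey));
    }
    long start = System.currentTimeMillis();
    Result[] results = table.get(gets);  // multi-get, fans out to the regionservers
    System.out.println(results.length + " gets took "
        + (System.currentTimeMillis() - start) + " ms");
    table.close();
  }
}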
I monitored ganglia for metrics that vary when the performance shifts from good to bad, and found that fsPreadLatency_avg_time is almost 25x higher on the badly performing regionserver. fsReadLatency_avg_time is also slightly higher, but not by as much (around 2x).

I took a thread dump of the regionserver process and also did CPU utilization monitoring. The CPU cycles were being spent in org.apache.hadoop.hdfs.BlockReaderLocal.read; the stack trace for the threads running that function is at the bottom of this email.

Any pointers on why positional reads degrade over time? Or is this just a disk I/O issue that I should start looking into? (By positional reads I mean the HDFS pread path; there is a small sketch of what I mean after the stack trace.)

Thanks,
Viral

==== stack trace for one of the handlers doing the block read ====
"IPC Server handler 98 on 60020" - Thread t@147
   java.lang.Thread.State: RUNNABLE
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:220)
        at org.apache.hadoop.hdfs.BlockReaderLocal.read(BlockReaderLocal.java:324)
        - locked <3215ed96> (a org.apache.hadoop.hdfs.BlockReaderLocal)
        at org.apache.hadoop.fs.FSInputChecker.readFully(FSInputChecker.java:384)
        at org.apache.hadoop.hdfs.DFSClient$BlockReader.readAll(DFSClient.java:1763)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.fetchBlockByteRange(DFSClient.java:2333)
        at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2400)
        at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:46)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1363)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockDataInternal(HFileBlock.java:1799)
        at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderV2.readBlockData(HFileBlock.java:1643)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:338)
        at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:254)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:480)
        at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:501)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:226)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:145)
        at org.apache.hadoop.hbase.regionserver.StoreFileScanner.enforceSeek(StoreFileScanner.java:351)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:354)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.requestSeek(KeyValueHeap.java:277)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:543)
        - locked <3da12c8a> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:411)
        - locked <3da12c8a> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
        at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:143)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3643)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3578)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3561)
        - locked <74d81ea7> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3599)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4407)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4380)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2039)
        at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
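To make sure we are talking about the same thing: by a positional read I mean the pread path that fsPreadLatency_avg_time tracks, i.e. FSDataInputStream.read(position, buf, offset, length), which as far as I understand is the path HBase takes for these block reads (fetchBlockByteRange in the trace above). A tiny sketch; the path and read offset here are only examples, not an actual HFile of mine:

==== what I mean by a positional read (pread) against HDFS ====
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PreadExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Placeholder path; in HBase's case this would be an HFile under the /hbase directory.
    Path path = new Path("/tmp/some-hdfs-file");
    FSDataInputStream in = fs.open(path);
    byte[] buf = new byte[64 * 1024];
    // Positional read: fetch a chunk at a given offset without moving the stream
    // position. This is the read that the pread latency metric is measuring.
    int n = in.read(128L * 1024, buf, 0, buf.length);
    System.out.println("pread returned " + n + " bytes");
    in.close();
    fs.close();
  }
}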