Date: Fri, 31 Jan 2014 22:31:43 -0800 (PST)
From: Jan Schellenberger
To: user@hbase.apache.org
Subject: RE: Slow Get Performance (or how many disk I/O does it take for one non-cached read?)

A lot of useful information here...

I disabled bloom filters.
I changed to gz compression (which compressed the files significantly).

I'm now seeing about *80 gets/sec/server*, which is a pretty good improvement. Since I estimate that the server is capable of about 300-350 hard disk operations/second, that's about 4 hard disk operations/get.

I will experiment with the BLOCKSIZE next. Unfortunately, upgrading our system to a newer HBase/Hadoop is tricky for various IT/regulation reasons, but I'll ask to upgrade. From what I see, even Cloudera 4.5.0 still comes with HBase 0.94.6.

I also restarted the regionservers and am now getting blockCacheHitCachingRatio=51% and blockCacheHitRatio=51%. So conceivably, each get could be hitting:

root index (cache hit)
block index (cache hit)
on average 2 data blocks loaded (most likely cache misses, as my total heap space is 1/7 the size of the compressed dataset)

That would be about 52% cache hit overall, and if each data block access requires 2 hard drive reads (data + checksum), that comes to 2 x 2 = 4 disk reads per get, which would explain my throughput. It still seems high but is probably within the realm of reason.

Does HBase always read a full block (the 64k HFile block, not the HDFS block) at a time, or can it jump to a particular location within the block?
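
As a rough sanity check of the arithmetic above, here is a minimal Python sketch. The function names are made up for illustration, and the inputs are just the estimates quoted in this message (300-350 disk operations/second, 80 gets/sec/server, two cached index lookups, two uncached data blocks, two disk reads per block). It is a back-of-the-envelope model, not a description of what HBase actually does internally.

```python
def disk_ops_per_get(disk_ops_per_sec, gets_per_sec):
    """Observed disk operations spent per get, from measured rates."""
    return disk_ops_per_sec / gets_per_sec


def modeled_get(cached_lookups=2, data_blocks=2, reads_per_block=2):
    """Hypothetical per-get model from the message.

    cached_lookups:  index lookups assumed to hit the block cache
                     (root index + block index)
    data_blocks:     data blocks assumed to miss the cache
    reads_per_block: disk reads assumed per missed block (data + checksum)
    Returns (cache hit ratio, disk reads per get).
    """
    total_lookups = cached_lookups + data_blocks
    hit_ratio = cached_lookups / total_lookups
    disk_reads = data_blocks * reads_per_block
    return hit_ratio, disk_reads


if __name__ == "__main__":
    low = disk_ops_per_get(300, 80)
    high = disk_ops_per_get(350, 80)
    hit_ratio, disk_reads = modeled_get()
    print(f"observed disk ops/get: {low:.2f} to {high:.2f}")
    print(f"modeled hit ratio: {hit_ratio:.0%}, modeled disk reads/get: {disk_reads}")
    # Throughput the model would predict for the estimated disk capacity.
    print(f"predicted gets/sec: {300 / disk_reads:.1f} to {350 / disk_reads:.1f}")
```

Under these assumptions the model gives a hit ratio of roughly one half and 4 disk reads per get, which lines up with the reported blockCacheHitRatio of 51% and brackets the observed 80 gets/sec/server.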