From: Vladimir Rodionov
To: "dev@hbase.apache.org"
Date: Wed, 26 Oct 2011 14:50:37 -0700
Subject: RE: Random I/O performance

>> Are you hitting cache at all?
>
> It's totally random, due to the proposed key design, which favored fast inserts. Keys are randomized
> values, which is why there is no data locality in row lookups.
> Effect of the cache (LruBlockCache?) is negligible
> in this case.
>
>>So a different schema would get cache into the mix?

You can't change the schema while the system is in production.

>>It's going to keep growing without bound?

No, we keep data for XX days, then purge stale data from the table.

My question was: what else, besides the obvious (run everything in parallel), can help to improve random I/O?

1. Will a Bloom filter help to optimize the HBase read path?
2. We use compression already.
3. Block size: does it really matter much?
4. Off-heap block cache? Is it in the 0.92 trunk? Has anybody performed real performance tests on the off-heap cache?

We could easily allocate 10-15 GB per node, thus effectively caching hot data in other tables (not in the fact table) in the off-heap cache. What is the max size of off-heap cache we could try?

My major concerns are:

- Memory allocators are pretty hard to debug and to get working right.
- Memory fragmentation?
- It still relies on on-heap Java data structures to perform eviction, which can degrade performance in the case of large caches.
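
On the cache point quoted at the top: a rough way to see why an LRU block cache contributes little under randomized keys is a toy simulation. The sketch below is an assumption-laden model, not a measurement of HBase. It uses a plain LinkedHashMap in access order as a stand-in for LruBlockCache, and the class name, block counts, read count and seed are all hypothetical. With uniformly random block reads the hit rate should settle near the ratio of cached blocks to total blocks.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Toy model only: an LRU map of block ids, not HBase's LruBlockCache. */
class LruHitRateSketch {

    /** Minimal LRU structure built on LinkedHashMap in access order. */
    static final class LruBlocks extends LinkedHashMap<Long, Boolean> {
        private final int capacity;

        LruBlocks(int capacity) {
            super(16, 0.75f, true); // true = access order, which gives LRU eviction
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<Long, Boolean> eldest) {
            // Evict the least recently used block once the cache is over capacity.
            return size() > capacity;
        }
    }

    /** Fraction of reads served from cache when block ids are uniformly random. */
    static double hitRate(long totalBlocks, int cachedBlocks, int reads, long seed) {
        Random rnd = new Random(seed);
        LruBlocks cache = new LruBlocks(cachedBlocks);
        long hits = 0;
        for (int i = 0; i < reads; i++) {
            // Randomized row keys are modelled as a uniform pick over all blocks.
            long block = (long) (rnd.nextDouble() * totalBlocks);
            if (cache.get(block) != null) {
                hits++;                          // get() also refreshes recency
            } else {
                cache.put(block, Boolean.TRUE);  // miss: load the block, maybe evict
            }
        }
        return (double) hits / reads;
    }

    public static void main(String[] args) {
        // Hypothetical sizes: 1,000,000 blocks on disk, room for 10,000 in cache.
        double rate = hitRate(1_000_000L, 10_000, 500_000, 42L);
        System.out.printf("hit rate = %.4f (cache/data ratio = %.4f)%n",
                rate, 10_000 / 1_000_000.0);
    }
}
```

If this model is roughly right for the fact table, a bigger cache (on heap or off heap) only helps in proportion to the fraction of the table it can hold, which is why the off-heap cache looks more useful for the smaller, hotter tables.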