From: Vladimir Rodionov <vrodionov@carrieriq.com>
To: "dev@hbase.apache.org" <dev@hbase.apache.org>
Date: Wed, 26 Oct 2011 14:50:37 -0700
Subject: RE: Random I/O performance
Message-ID: 
 <DC5EBE7F3610EB4CA5C7E92D76873E15170BE3E35F@exchange2007.carrieriq.com>
References: 
 <DC5EBE7F3610EB4CA5C7E92D76873E15170BE3E35B@exchange2007.carrieriq.com>
	<CADcMMgHCcNALQKfB85YnVBPaCTd1zNXq3rKQkbVL1=Qx5QOWHg@mail.gmail.com>
	<DC5EBE7F3610EB4CA5C7E92D76873E15170BE3E35C@exchange2007.carrieriq.com>,<CADcMMgErsVnTgbarSa49rAAKM+2-3DuQDkM-PokkhaZvGzUh5A@mail.gmail.com>
In-Reply-To: 
 <CADcMMgErsVnTgbarSa49rAAKM+2-3DuQDkM-PokkhaZvGzUh5A@mail.gmail.com>


>> Are you hitting cache at all?
>
> It's totally random, due to the proposed key design, which favored fast
> inserts. Keys are randomized values, which is why there is no data locality
> in row lookups. The effect of the cache (LruBlockCache?) is negligible
> in this case.
>

>>So a different schema would get cache into the mix?

You can't change the schema while the system is in production.


>>Its going to keep growing without bound?


No, we keep data for XX days, then purge stale data from the table.


My question was: what else, besides the obvious (run everything in parallel),
can help to improve random I/O?

1. Will a Bloom filter help to optimize the HBase read path?
2. We use compression already.
3. Block size - does it really matter much?
4. Off-heap block cache? Is it in the 0.92 trunk? Has anybody run real
performance tests on the off-heap cache?
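
Regarding question 1, here is a minimal sketch of the idea behind a row-level Bloom filter, to make the question concrete. It is not HBase's implementation: the class name RowBloomSketch, the hash function, and the sizing are all hypothetical. The point is that one small filter per file can answer "definitely not here" for a row key, so a random point read could skip files without touching disk.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical per-file Bloom filter over row keys (illustration only). */
final class RowBloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    RowBloomSketch(int numBits, int numHashes) {
        if (numBits <= 0 || numHashes <= 0) {
            throw new IllegalArgumentException("numBits and numHashes must be positive");
        }
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.bits = new BitSet(numBits);
    }

    /** Simple FNV-1a style hash; an assumption, not what HBase uses. */
    private static int hash(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }

    /** Double hashing: derive the i-th bit position from two base hashes. */
    private int bitIndex(byte[] key, int i) {
        int h1 = hash(key, 0x811c9dc5);
        int h2 = hash(key, 0x01000193);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Record a row key as present in this file. */
    void add(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            bits.set(bitIndex(key, i));
        }
    }

    /** False means the key is definitely absent; true means it may be present. */
    boolean mightContain(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(bitIndex(key, i))) {
                return false;
            }
        }
        return true;
    }
}
```

A read for a given key would consult the filter of each file and only open the files where mightContain returns true. False positives are possible, false negatives are not.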

We could easily allocate 10-15 GB per node, thus effectively caching hot data
in other tables (not in the fact table).

Off-heap cache: what is the maximum size of off-heap cache we could try?
My major concerns are:

- Memory allocators are pretty hard to debug and get working right.
- Memory fragmentation?
- It still relies on on-heap Java data structures to perform eviction, which
can degrade performance in the case of large caches.
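
To illustrate the last two concerns, here is a rough sketch of the kind of design I have in mind. It is an assumption, not the actual off-heap cache code: the class OffHeapSlabCacheSketch and its layout are hypothetical. Block bytes live in one direct buffer cut into fixed-size slots (which sidesteps fragmentation at the cost of wasted space per slot), while the key index and the LRU ordering are still ordinary on-heap objects. It is single-threaded for brevity.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical slab cache: payload off-heap, index and LRU order on-heap. */
final class OffHeapSlabCacheSketch {
    private final ByteBuffer slab;   // off-heap storage, fixed-size slots
    private final int slotSize;
    private final ArrayDeque<Integer> freeSlots = new ArrayDeque<>();
    // On-heap index in access order: key -> {slot number, payload length}.
    private final LinkedHashMap<String, int[]> index =
            new LinkedHashMap<>(16, 0.75f, true);

    OffHeapSlabCacheSketch(int slotSize, int slotCount) {
        if (slotSize <= 0 || slotCount <= 0) {
            throw new IllegalArgumentException("slotSize and slotCount must be positive");
        }
        this.slotSize = slotSize;
        this.slab = ByteBuffer.allocateDirect(Math.multiplyExact(slotSize, slotCount));
        for (int i = 0; i < slotCount; i++) {
            freeSlots.add(i);
        }
    }

    void put(String key, byte[] block) {
        if (block.length > slotSize) {
            throw new IllegalArgumentException("block larger than slot");
        }
        int[] existing = index.remove(key);
        if (existing != null) {
            freeSlots.add(existing[0]);          // reuse the slot of the old value
        }
        if (freeSlots.isEmpty()) {
            // Eviction walks the on-heap map: the eldest entry is the LRU one.
            Iterator<Map.Entry<String, int[]>> it = index.entrySet().iterator();
            Map.Entry<String, int[]> eldest = it.next();
            freeSlots.add(eldest.getValue()[0]);
            it.remove();
        }
        int slot = freeSlots.poll();
        ByteBuffer view = slab.duplicate();      // independent position, shared bytes
        view.position(slot * slotSize);
        view.put(block);
        index.put(key, new int[] {slot, block.length});
    }

    /** Returns a copy of the cached block, or null on a miss. */
    byte[] get(String key) {
        int[] loc = index.get(key);              // also marks the entry as recently used
        if (loc == null) {
            return null;
        }
        byte[] out = new byte[loc[1]];
        ByteBuffer view = slab.duplicate();
        view.position(loc[0] * slotSize);
        view.get(out);
        return out;
    }

    int size() {
        return index.size();
    }
}
```

Even in this toy version, every cached block costs one map entry, one key, and one small array on the Java heap, so the on-heap bookkeeping grows with the number of cached blocks, not with the off-heap size alone.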