From: Jonathan Gray <jgray@facebook.com>
To: "hbase-user@hadoop.apache.org" <hbase-user@hadoop.apache.org>
Date: Tue, 18 May 2010 15:12:07 -0700
Subject: RE: Optimal block size for large columns
Message-ID: 
 <8D66B74984F9564BBB25C3C67D630F2D68457752@SC-MBXC1.TheFacebook.com>
References: <7647D1CF-0973-4D15-A140-E5E59D39C749@cumuluscode.com>
In-Reply-To: <7647D1CF-0973-4D15-A140-E5E59D39C749@cumuluscode.com>

It would depend on your read patterns.

Is everything going to be single row gets, or will you also scan?

Single row lookups will be faster with smaller block sizes, at the expense
of a larger index size (and potentially slower scans, as you have to deal
with more block fetches).
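
To put rough numbers on the index-size side of that trade-off, here is a back-of-the-envelope sketch. It is not HBase code; the function name `estimate_blocks` and its model are my own assumptions: a cell is never split across blocks, so every block holds at least one cell and is closed once it reaches the configured block size.

```python
import math

KIB = 1024
GIB = 1024 ** 3


def estimate_blocks(total_bytes: int, avg_cell_bytes: int, block_size: int) -> int:
    """Rough count of data blocks (one index entry each) in a store file.

    Assumed model, not taken from HBase itself: cells are written whole,
    and a block is closed once it has reached block_size.
    """
    cells = math.ceil(total_bytes / avg_cell_bytes)
    # Cells needed to fill a block up to block_size; never fewer than one.
    cells_per_block = max(1, math.ceil(block_size / avg_cell_bytes))
    return math.ceil(cells / cells_per_block)


if __name__ == "__main__":
    # 1 GiB of data made of ~300 KiB cells, at three candidate block sizes.
    for block_size in (64 * KIB, 256 * KIB, 1024 * KIB):
        blocks = estimate_blocks(GIB, 300 * KIB, block_size)
        print(f"{block_size // KIB:>5} KiB blocks: ~{blocks} index entries per GiB")
```

Under this model, any block size below the average cell size gives the same result, one block per cell, so with ~300 KiB values the index only starts to shrink once the block size is larger than a typical value.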

> -----Original Message-----
> From: Jason Strutz [mailto:jason@cumuluscode.com]
> Sent: Tuesday, May 18, 2010 9:33 AM
> To: hbase-user@hadoop.apache.org
> Subject: Optimal block size for large columns
>
> I am working with a small cluster, trying to nail down appropriate
> settings for block size.  We will have a single table with a single
> column of data averaging 300k in size, sometimes upwards of 2mb, never
> more than 10mb.
>
> Is there any rule-of-thumb or other sage advice for block sizes for
> large columns?
>
> Thanks!