From: Jonathan Gray
To: "hbase-user@hadoop.apache.org"
Date: Tue, 18 May 2010 15:12:07 -0700
Subject: RE: Optimal block size for large columns

It would depend on your read patterns. Is everything going to be single row gets, or will you also scan?

Single row lookups will be faster with smaller block sizes, at the expense of a larger index size (and potentially slower scans as you have to deal with more block fetches).

> -----Original Message-----
> From: Jason Strutz [mailto:jason@cumuluscode.com]
> Sent: Tuesday, May 18, 2010 9:33 AM
> To: hbase-user@hadoop.apache.org
> Subject: Optimal block size for large columns
>
> I am working with a small cluster, trying to nail down appropriate
> settings for block size. We will have a single table with a single
> column of data averaging 300k in size, sometimes upwards of 2mb, never
> more than 10mb.
>
> Is there any rule-of-thumb or other sage advice for block sizes for
> large columns?
>
> Thanks!
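
To make the trade-off in the reply above concrete, here is a rough back-of-the-envelope sketch in Python. It is not HBase code. It assumes a simplified model in which a block is closed at a cell boundary once it has reached the configured block size, so a cell larger than the block size ends up in a block of its own. It also assumes one index entry per block and ignores key overhead and compression. The function names, the cell count of one million, and the candidate block sizes are all made up for illustration; 300 KB is the average cell size from the original question.

```python
def cells_per_block(avg_cell_bytes: int, block_size_bytes: int) -> int:
    """Cells that fit in one block under the simplified model.

    Assumption: cells are appended until the block has reached
    block_size_bytes, and a block is only closed at a cell boundary.
    """
    if avg_cell_bytes <= 0 or block_size_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Ceiling division; always at least 1 for positive inputs.
    return -(-block_size_bytes // avg_cell_bytes)


def index_entries(total_cells: int, avg_cell_bytes: int, block_size_bytes: int) -> int:
    """Estimated number of block index entries (one per block)."""
    per_block = cells_per_block(avg_cell_bytes, block_size_bytes)
    return -(-total_cells // per_block)


if __name__ == "__main__":
    KB = 1024
    avg_cell = 300 * KB      # average cell size from the question
    total_cells = 1_000_000  # hypothetical table size

    for block_kb in (64, 256, 1024, 4096):
        per_block = cells_per_block(avg_cell, block_kb * KB)
        entries = index_entries(total_cells, avg_cell, block_kb * KB)
        print(f"block={block_kb:>5} KB  cells/block={per_block:>3}  index entries={entries}")
```

Under this model, any block size at or below the average cell size gives one cell per block, so the index grows to roughly one entry per row. Larger blocks shrink the index, but a single-row get then has to read a block containing several neighbouring cells.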