Subject: Re: column count guidelines
From: Michael Ellery
Date: Thu, 7 Feb 2013 20:34:26 -0800
To: user@hbase.apache.org

Thanks for reminding me of the HBase version in CDH4 - that's something we'll
definitely take into consideration.

-Mike

On Feb 7, 2013, at 5:09 PM, Ted Yu wrote:

> Thanks Michael for this information.
>
> FYI CDH4 (as of now) is based on HBase 0.92.x which doesn't have the two
> features I cited below.
>
> On Thu, Feb 7, 2013 at 5:02 PM, Michael Ellery wrote:
>
>> There is only one CF in this schema.
>>
>> Yes, we are looking at upgrading to CDH4, but it is not trivial since we
>> cannot have cluster downtime. Our current upgrade plan involves additional
>> hardware with side-by-side clusters until everything is exported/imported.
>>
>> Thanks,
>> Mike
>>
>> On Feb 7, 2013, at 4:34 PM, Ted Yu wrote:
>>
>>> How many column families are involved?
>>>
>>> Have you considered upgrading to 0.94.4, where you would be able to benefit
>>> from lazy seek, Data Block Encoding, etc.?
>>>
>>> Thanks
>>>
>>> On Thu, Feb 7, 2013 at 3:47 PM, Michael Ellery wrote:
>>>
>>>> I'm looking for some advice about per-row CQ (column qualifier) count
>>>> guidelines. Our current schema design means we have a HIGHLY variable CQ
>>>> count per row -- some rows have one or two CQs and some rows have upwards
>>>> of 1 million. Each CQ is on the order of 100 bytes (for round numbers) and
>>>> the cell values are null. We see highly variable and too often
>>>> unacceptable read performance using this schema. I don't know for a fact
>>>> that the CQ count variability is the source of our problems, but I am
>>>> suspicious.
>>>>
>>>> I'm curious about others' experience with CQ counts per row -- are there
>>>> some best practices/guidelines about how to optimally size the number of
>>>> CQs per row? The other obvious solution will involve breaking this data
>>>> into finer-grained rows, which means shifting from GETs to SCANs - are
>>>> there performance trade-offs in such a change?
>>>>
>>>> We are currently using CDH3u4, if that is relevant. All of our loading is
>>>> done via HFile loading (bulk), so we have not had to tune write performance
>>>> beyond using bulk loads. Any advice appreciated, including what metrics we
>>>> should be looking at to further diagnose our read performance challenges.
>>>>
>>>> Thanks,
>>>> Mike Ellery