From: innowireless TaeYun Kim
Subject: RE: Question on the number of column families
Date: Wed, 6 Aug 2014 13:48:21 +0900
Message-ID: <000301cfb131$a626d3b0$f2747b10$@innowireless.co.kr>

Thank you.

The 'dummy' column will always hold the value '1' (or even an empty string), which only signifies that the row exists. (The real value is in the other, 'big' column family.)

The value is irrelevant because, with the current schema, filtering will be done by rowkey components alone; no column value is needed. (I will begin reading the filtering section shortly - it is only 6 pages ahead. Sorry for my premature thoughts.)

-----Original Message-----
From: Ted Yu [mailto:yuzhihong@gmail.com]
Sent: Wednesday, August 06, 2014 1:38 PM
To: user@hbase.apache.org
Subject: Re: Question on the number of column families

bq. add a 'dummy' column family and apply HBASE-5416 technique

Adding a dummy column family is not the way to utilize essential column family support - what would this dummy column family hold?

bq. since I have not read the filtering section of the book I'm reading yet

Once you finish reading, you can look at the unit test (TestJoinedScanners) from HBASE-5416. You would understand this feature better.

Cheers

On Tue, Aug 5, 2014 at 9:21 PM, innowireless TaeYun Kim <taeyun.kim@innowireless.co.kr> wrote:

> Thank you all.
>
> Facts learned:
>
> - Having 130 column families is too many. Don't do that.
> - While scanning, an entire row will be read for filtering, unless the
>   HBASE-5416 technique is applied, which makes only the relevant column
>   family get loaded. (But it seems that one still can't load just the
>   one column needed while scanning.)
> - A big row size may not be good.
>
> Currently it seems appropriate to follow the one-column solution that
> Alok Singh suggested, in part because there is currently no reasonable
> grouping of the fields.
>
> Here is my current thinking:
>
> - One column family, one column. The field name will be included in the
>   rowkey.
> - Eliminate filtering altogether (in most cases) by properly ordering
>   rowkey components.
> - If filtering is absolutely needed, add a 'dummy' column family and
>   apply the HBASE-5416 technique to minimize disk reads, since the field
>   value can be large (~5MB). (This dummy column idea may not be right,
>   I'm not sure, since I have not yet read the filtering section of the
>   book I'm reading.)
>
> Hope that I am not missing or misunderstanding something...
> (I'm a total newbie. I started reading an HBase book just last
> week...)
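
P.S. To make the "field name in the rowkey, order the components so a range scan replaces filtering" idea from the quoted list concrete, here is a minimal sketch in plain Java (no HBase classes). The layout entityId + 0x00 + fieldName, the class RowKeySketch, and the sample IDs are all hypothetical, made up for illustration. It only relies on the fact that HBase sorts rowkeys as unsigned bytes in lexicographic order, and it assumes the components never contain a 0x00 byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: composite rowkey = entityId + 0x00 + fieldName.
class RowKeySketch {
    // Separator byte; assumes no component ever contains 0x00.
    private static final byte SEP = 0x00;

    // Builds the rowkey for one field of one entity.
    static byte[] rowKey(String entityId, String fieldName) {
        byte[] e = entityId.getBytes(StandardCharsets.UTF_8);
        byte[] f = fieldName.getBytes(StandardCharsets.UTF_8);
        byte[] key = new byte[e.length + 1 + f.length];
        System.arraycopy(e, 0, key, 0, e.length);
        key[e.length] = SEP;
        System.arraycopy(f, 0, key, e.length + 1, f.length);
        return key;
    }

    // Inclusive start row covering every field of one entity: entityId + SEP.
    static byte[] startRow(String entityId) {
        byte[] e = entityId.getBytes(StandardCharsets.UTF_8);
        byte[] start = Arrays.copyOf(e, e.length + 1);
        start[e.length] = SEP;
        return start;
    }

    // Exclusive stop row: same prefix with the trailing separator bumped by one.
    static byte[] stopRow(String entityId) {
        byte[] stop = startRow(entityId);
        stop[stop.length - 1] = (byte) (SEP + 1);
        return stop;
    }

    // Unsigned lexicographic byte comparison (the order rowkeys are sorted in).
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // True if key falls in [start, stop), i.e. a range scan would return it.
    static boolean inRange(byte[] key, byte[] start, byte[] stop) {
        return compare(key, start) >= 0 && compare(key, stop) < 0;
    }

    public static void main(String[] args) {
        byte[] start = startRow("dev1");
        byte[] stop = stopRow("dev1");
        System.out.println(inRange(rowKey("dev1", "rssi"), start, stop));
        System.out.println(inRange(rowKey("dev10", "rssi"), start, stop));
    }
}
```

With this layout, the range for "dev1" picks up every field row of that entity and nothing from "dev10", so no filter (and no dummy column family) is involved. Whether the entity or the field name should come first depends on which access pattern dominates; that choice is not settled in this thread.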