Date: Thu, 7 Aug 2014 10:43:06 +0800
Subject: Re: Question on the number of column families
From: Qiang Tian
To: "user@hbase.apache.org"

Hi,

The description of HBASE-5416 explains why it was introduced. If you only have one CF, a dummy CF does not help; the feature is useful in the multi-CF case, e.g. "putting them in one column family. And "Non frequently" ones in another."

bq. "Field name will be included in rowkey."

Please read chapter 9, "Advanced Usage", in the book "HBase: The Definitive Guide" to learn how HBase stores data on disk and how to design a rowkey for a specific scenario. (The rowkey is the only index you can use, so take care.)

bq. "The table is read-only. It is bulk-loaded once. When a new data is ready, A new table is created and the old table is deleted."

That scenario is quite different, as HBase is designed for random reads and writes. The limitation described at http://hbase.apache.org/book/number.of.cfs.html is about the write path (flush and compaction). Perhaps you could try 140 CFs, as long as you can presplit your regions well?
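To illustrate the rowkey-as-index and presplit points, here is a stdlib-only Java sketch. It is not HBase client code: a `TreeMap` stands in for HBase's sorted row order, and the `<entityId>\0<fieldName>` key layout, the class `RowKeySketch`, and its methods are hypothetical. It shows that putting the field name in the rowkey turns "all fields of one entity" into a contiguous key range (no filter needed), plus one possible heuristic for picking split keys from a sample of expected rowkeys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.SortedMap;

/**
 * Sketch only (not the HBase API): a NavigableMap models the sorted rowkey
 * order. Java compares Strings by UTF-16 unit while HBase compares unsigned
 * bytes; for ASCII keys the two orders agree.
 */
class RowKeySketch {
    // Hypothetical separator: '\0' sorts before any printable character,
    // so all keys of entity "e1" sort before any key of entity "e10".
    static final char SEP = '\0';

    /** Composite rowkey: entity id first (the scan prefix), field name second. */
    static String rowKey(String entityId, String fieldName) {
        return entityId + SEP + fieldName;
    }

    /** All fields of one entity as a start/stop key range instead of a filter. */
    static SortedMap<String, String> prefixScan(NavigableMap<String, String> table, String entityId) {
        String startRow = entityId + SEP;              // inclusive start
        String stopRow = entityId + (char) (SEP + 1);  // exclusive stop, just past the prefix
        return table.subMap(startRow, true, stopRow, false);
    }

    /**
     * Hypothetical presplit heuristic: numRegions - 1 split points taken at
     * even positions in a sorted sample, so each region starts with roughly
     * the same number of sample keys.
     */
    static List<String> splitKeys(List<String> sample, int numRegions) {
        if (numRegions < 1 || numRegions > sample.size()) {
            throw new IllegalArgumentException("need 1 <= numRegions <= sample size");
        }
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        List<String> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            splits.add(sorted.get(i * sorted.size() / numRegions));
        }
        return splits;
    }
}
```

With a sample of `a` through `h` and four regions, `splitKeys` returns `[c, e, g]`. Evenly spaced sample keys are only one heuristic; skewed data may call for different boundaries, and for a real table the split keys would be supplied as `byte[][]` when the table is created.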
After that, since there are no writes, there will be no flushes or compactions... Anyway, any idea had better be tested with your real data.

On Wed, Aug 6, 2014 at 7:00 PM, innowireless TaeYun Kim <
taeyun.kim@innowireless.co.kr> wrote:

> Hi Ted,
>
> I have now finished reading the filtering section and the source code of
> TestJoinedScanners (0.94).
>
> Facts learned:
>
> - While scanning, an entire row will be read even for rowkey filtering.
> (Since a rowkey is not a physically separate entity but is stored in the
> KeyValue objects, that's natural. Am I right?)
> - The key API for the essential column family support is
> setLoadColumnFamiliesOnDemand().
>
> So now I have some questions:
>
> On rowkey filtering, which column family's KeyValue object is read?
> If HBase just reads a KeyValue from a randomly selected (or just the
> first) column family, how is setLoadColumnFamiliesOnDemand() affected? Can
> HBase select a smaller column family intelligently?
>
> If setLoadColumnFamiliesOnDemand() can be applied to rowkey filtering, a
> 'dummy' column family can be used to minimize the scan cost.
>
> Thank you.
>
>
> -----Original Message-----
> From: innowireless TaeYun Kim [mailto:taeyun.kim@innowireless.co.kr]
> Sent: Wednesday, August 06, 2014 1:48 PM
> To: user@hbase.apache.org
> Subject: RE: Question on the number of column families
>
> Thank you.
>
> The 'dummy' column will always hold the value '1' (or even an empty
> string), which only signifies that the row exists. (The real value is
> in the other, 'big' column family.) The value is irrelevant, since with
> the current schema the filtering will be done by rowkey components alone;
> no column value is needed. (I will begin reading the filtering section
> shortly - it is only 6 pages ahead. Sorry for my premature thoughts.)
>
>
> -----Original Message-----
> From: Ted Yu [mailto:yuzhihong@gmail.com]
> Sent: Wednesday, August 06, 2014 1:38 PM
> To: user@hbase.apache.org
> Subject: Re: Question on the number of column families
>
> bq. add a 'dummy' column family and apply HBASE-5416 technique
>
> Adding a dummy column family is not the way to utilize essential column
> family support. What would this dummy column family hold?
>
> bq. since I have not read the filtering section of the book I'm reading yet
>
> Once you finish reading, you can look at the unit test
> (TestJoinedScanners) from HBASE-5416. You will understand this feature
> better.
>
> Cheers
>
>
> On Tue, Aug 5, 2014 at 9:21 PM, innowireless TaeYun Kim <
> taeyun.kim@innowireless.co.kr> wrote:
>
> > Thank you all.
> >
> > Facts learned:
> >
> > - Having 130 column families is too much. Don't do that.
> > - While scanning, an entire row will be read for filtering, unless the
> > HBASE-5416 technique is applied, which makes only the relevant column
> > family get loaded. (But it seems that one still can't load just the
> > column needed while scanning.)
> > - A big row size may not be good.
> >
> > Currently it seems appropriate to follow the one-column solution that
> > Alok Singh suggested, in part because there is currently no reasonable
> > grouping of the fields.
> >
> > Here is my current thinking:
> >
> > - One column family, one column. Field name will be included in rowkey.
> > - Eliminate filtering altogether (in most cases) by properly ordering
> > rowkey components.
> > - If filtering is absolutely needed, add a 'dummy' column family and
> > apply HBASE-5416 technique to minimize disk reads, since the field
> > value can be large (~5MB). (This dummy column idea may not be right;
> > I'm not sure, since I have not read the filtering section of the book
> > I'm reading yet.)
> >
> > I hope I am not missing or misunderstanding something...
> > (I'm a total newbie. I started reading an HBase book last week...)
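The essential-column-family idea discussed in this thread (HBASE-5416, enabled on the client with `setLoadColumnFamiliesOnDemand()`) can be illustrated with a small stdlib-only Java simulation. This is not HBase's implementation: the class `OnDemandScanSketch`, the "small" and "big" families, and the byte counting are hypothetical. It only shows why the technique pays off when the filter looks at a small family and the large values live in another: the big family is read only for rows that pass. In HBase itself, the filter decides which families count as essential (the `isFamilyEssential()` hook added with HBASE-5416); this sketch does not model that decision, so it does not answer the rowkey-filter question above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Simulation only: each row has a small "essential" family that the filter
 * inspects and a big family holding the (possibly ~5 MB) value.
 */
class OnDemandScanSketch {
    static final class Row {
        final String key;
        final byte[] small; // hypothetical essential family value, e.g. a flag
        final byte[] big;   // hypothetical large family value
        Row(String key, byte[] small, byte[] big) {
            this.key = key;
            this.small = small;
            this.big = big;
        }
    }

    /** Result of a simulated scan: matching keys plus bytes "read from disk". */
    static final class ScanResult {
        final List<String> keys = new ArrayList<>();
        long bytesRead;
    }

    /**
     * Eager mode reads both families of every row before filtering;
     * on-demand mode reads the big family only for rows whose small
     * family satisfies the filter. Both modes return the same rows.
     */
    static ScanResult scan(List<Row> rows, Predicate<byte[]> filterOnSmall, boolean onDemand) {
        ScanResult result = new ScanResult();
        for (Row row : rows) {
            result.bytesRead += row.small.length; // the filter always needs this family
            if (!onDemand) {
                result.bytesRead += row.big.length; // loaded whether or not the row matches
            }
            if (filterOnSmall.test(row.small)) {
                if (onDemand) {
                    result.bytesRead += row.big.length; // joined in only for matching rows
                }
                result.keys.add(row.key);
            }
        }
        return result;
    }
}
```

For example, with ten rows whose big values are 1000 bytes each and a filter that matches two rows, the eager scan counts 10010 bytes and the on-demand scan 2010 bytes, for the same two result rows. The saving shrinks as the filter's selectivity drops, which is one reason any such schema had better be tested with real data.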