Subject: Re: larger HFile block size for very wide row?
From: Ted Yu
To: user@hbase.apache.org
Date: Wed, 29 Jan 2014 13:36:27 -0800

bq. table:family2 holds only row keys (no data) from table:family1.

Wei:
You can designate family2 as an essential column family so that family1 is
brought into the heap only when needed.

On Wed, Jan 29, 2014 at 1:33 PM, Vladimir Rodionov wrote:

> Yes, your row will be split at KV boundaries - there is no need to increase
> the default block size, except possibly for performance. You will need to
> try different sizes to find what performs best for your use case.
>
> I would not use a combination of scan & get on the same table:family with
> very large rows. Either some kind of secondary indexing is needed, or do
> the scan on a different family (which has the same row keys):
>
> table:family1 holds the original data
> table:family2 holds only row keys (no data) from table:family1.
>
> Your scan will be MUCH faster in this case.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
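To make the two-family suggestion concrete: the range scan touches only the
small key-only family, and the wide family is fetched per matching row with
point gets. A rough sketch against the client API of that era - the table name
"t", the row key range, and the column names are placeholders, not anything
from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class WideRowScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "t");                 // placeholder table name
    try {
      // Range scan over the key-only family; family1 blocks are never read here.
      Scan scan = new Scan(Bytes.toBytes("startKey"), Bytes.toBytes("stopKey"));
      scan.addFamily(Bytes.toBytes("family2"));
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result keyOnly : scanner) {
          // Fetch the full, wide row only for keys that matched the scan.
          Get get = new Get(keyOnly.getRow());
          get.addFamily(Bytes.toBytes("family1"));
          Result fullRow = table.get(get);
          // ... process the wide row here ...
        }
      } finally {
        scanner.close();
      }
    } finally {
      table.close();
    }
  }
}

If I recall correctly, Ted's essential-column-family tip achieves something
similar server-side: with Scan.setLoadColumnFamiliesOnDemand(true) and a
filter that treats only family2 as essential, family1 blocks are read only for
rows the filter actually accepts (available from around HBase 0.94.5 on).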
> ________________________________________
> From: Wei Tan [wtan@us.ibm.com]
> Sent: Wednesday, January 29, 2014 12:52 PM
> To: user@hbase.apache.org
> Subject: Re: larger HFile block size for very wide row?
>
> Sorry, 1000 columns, each 2K, so each row is 2M. I guess HBase will keep a
> single KV (i.e., a column rather than a row) in a block, so a row will span
> multiple blocks?
>
> My scan pattern is: I do a range scan, find the matching row keys, and
> fetch the whole row for each row that matches my criteria.
>
> Best regards,
> Wei
>
> ---------------------------------
> Wei Tan, PhD
> Research Staff Member
> IBM T. J. Watson Research Center
> http://researcher.ibm.com/person/us-wtan
>
>
> From: lars hofhansl
> To: "user@hbase.apache.org"
> Date: 01/29/2014 03:49 PM
> Subject: Re: larger HFile block size for very wide row?
>
> You mean 1000 columns? Not 1000k = 1M columns, I assume.
> So you'll have 2MB KVs. That's a bit on the large side.
>
> HBase will "grow" the block to fit the KV into it, which means you have
> basically one block per KV.
>
> I guess you address these rows via point gets (Get) and do not typically
> scan through them, right?
>
> Do you see any performance issues?
>
> -- Lars
>
>
> ________________________________
> From: Wei Tan
> To: user@hbase.apache.org
> Sent: Wednesday, January 29, 2014 12:35 PM
> Subject: larger HFile block size for very wide row?
>
> Hi, I have an HBase table where each row has ~1000k columns, ~2K each. My
> table scan pattern is to use a row key filter, but I need to fetch the
> whole row (~1000k columns) back.
>
> Shall I set the HFile block size to be larger than the default 64K?
>
> Thanks,
> Wei
>
> ---------------------------------
> Wei Tan, PhD
> Research Staff Member
> IBM T. J. Watson Research Center
> http://researcher.ibm.com/person/us-wtan
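Coming back to the original question: the HFile block size is a per-column-
family setting, so if experimenting with larger blocks does help, it can be
raised for family1 alone. A minimal sketch using the HBaseAdmin /
HColumnDescriptor API of that time frame, with "t" and "family1" as
placeholder names; note the new size only applies to HFiles written after the
change (i.e., after flushes and compactions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class RaiseBlockSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // Start from the existing family definition so other settings are kept.
      HTableDescriptor htd = admin.getTableDescriptor(Bytes.toBytes("t"));  // placeholder table
      HColumnDescriptor hcd = htd.getFamily(Bytes.toBytes("family1"));      // placeholder family
      hcd.setBlocksize(256 * 1024);   // up from the 64K default
      admin.disableTable("t");        // safest path; online alter depends on version/config
      admin.modifyColumn("t", hcd);
      admin.enableTable("t");
    } finally {
      admin.close();
    }
  }
}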