Subject: Re: larger HFile block size for very wide row?
From: Ted Yu
To: user@hbase.apache.org
Date: Wed, 29 Jan 2014 13:36:27 -0800

bq. table:family2 holds only row keys (no data) from table:family1.

Wei:
You can designate family2 as an essential column family so that family1 is
brought into the heap only when needed.

On Wed, Jan 29, 2014 at 1:33 PM, Vladimir Rodionov wrote:

> Yes, your row will be split at KV boundaries - there is no need to increase
> the default block size, except possibly for performance. You will need to
> try different sizes to find what performs best for your use case.
>
> I would not use a combination of scan & get on the same table:family with
> very large rows. Either some kind of secondary indexing is needed, or do
> the scan on a different family (which has the same row keys):
>
> table:family1 holds the original data
> table:family2 holds only row keys (no data) from table:family1.
>
> Your scan will be MUCH faster in this case.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
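To make the two-family suggestion concrete: the range scan touches only the
small key-only family, and the wide family is fetched per matching row with
point gets. A rough sketch against the client API of that era - the table name
"t", the row key range, and the column names are placeholders, not anything
from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class WideRowScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "t");                 // placeholder table name
    try {
      // Range scan over the key-only family; family1 blocks are never read here.
      Scan scan = new Scan(Bytes.toBytes("startKey"), Bytes.toBytes("stopKey"));
      scan.addFamily(Bytes.toBytes("family2"));
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result keyOnly : scanner) {
          // Fetch the full, wide row only for keys that matched the scan.
          Get get = new Get(keyOnly.getRow());
          get.addFamily(Bytes.toBytes("family1"));
          Result fullRow = table.get(get);
          // ... process the wide row here ...
        }
      } finally {
        scanner.close();
      }
    } finally {
      table.close();
    }
  }
}

If I recall correctly, Ted's essential-column-family tip achieves something
similar server-side: with Scan.setLoadColumnFamiliesOnDemand(true) and a
filter that treats only family2 as essential, family1 blocks are read only for
rows the filter actually accepts (available from around HBase 0.94.5 on).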
> ________________________________________
> From: Wei Tan [wtan@us.ibm.com]
> Sent: Wednesday, January 29, 2014 12:52 PM
> To: user@hbase.apache.org
> Subject: Re: larger HFile block size for very wide row?
>
> Sorry, 1000 columns, each 2K, so each row is 2M. I guess HBase will keep a
> single KV (i.e., a column rather than a row) in a block, so a row will span
> multiple blocks?
>
> My scan pattern is: I do a range scan, find the matching row keys, and
> fetch the whole row for each row that matches my criteria.
>
> Best regards,
> Wei
>
> ---------------------------------
> Wei Tan, PhD
> Research Staff Member
> IBM T. J. Watson Research Center
> http://researcher.ibm.com/person/us-wtan
>
>
> From: lars hofhansl
> To: "user@hbase.apache.org"
> Date: 01/29/2014 03:49 PM
> Subject: Re: larger HFile block size for very wide row?
>
> You mean 1000 columns? Not 1000k = 1M columns, I assume.
> So you'll have 2MB KVs. That's a bit on the large side.
>
> HBase will "grow" the block to fit the KV into it, which means you have
> basically one block per KV.
>
> I guess you address these rows via point gets (Get) and do not typically
> scan through them, right?
>
> Do you see any performance issues?
>
> -- Lars
>
>
> ________________________________
> From: Wei Tan
> To: user@hbase.apache.org
> Sent: Wednesday, January 29, 2014 12:35 PM
> Subject: larger HFile block size for very wide row?
>
> Hi, I have an HBase table where each row has ~1000k columns, ~2K each. My
> table scan pattern is to use a row key filter, but I need to fetch the
> whole row (~1000k columns) back.
>
> Shall I set the HFile block size to be larger than the default 64K?
>
> Thanks,
> Wei
>
> ---------------------------------
> Wei Tan, PhD
> Research Staff Member
> IBM T. J. Watson Research Center
> http://researcher.ibm.com/person/us-wtan
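Coming back to the original question: the HFile block size is a per-column-
family setting, so if experimenting with larger blocks does help, it can be
raised for family1 alone. A minimal sketch using the HBaseAdmin /
HColumnDescriptor API of that time frame, with "t" and "family1" as
placeholder names; note the new size only applies to HFiles written after the
change (i.e., after flushes and compactions):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class RaiseBlockSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      // Start from the existing family definition so other settings are kept.
      HTableDescriptor htd = admin.getTableDescriptor(Bytes.toBytes("t"));  // placeholder table
      HColumnDescriptor hcd = htd.getFamily(Bytes.toBytes("family1"));      // placeholder family
      hcd.setBlocksize(256 * 1024);   // up from the 64K default
      admin.disableTable("t");        // safest path; online alter depends on version/config
      admin.modifyColumn("t", hcd);
      admin.enableTable("t");
    } finally {
      admin.close();
    }
  }
}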