hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: newbie question: what is better? one with a lot of keys OR a lot of tables with fewer keys?
Date Thu, 12 Nov 2009 09:29:19 GMT
See http://issues.apache.org/jira/browse/HBASE-1537 . It is already committed on trunk. You can set
a parameter on a Scan object that will cause HBase to chunk the response. The semantics of
the scanner's next() method change in this case: more than one call to next() may be required
to move to the next row, depending on the number of values in the row result. However, the values
returned by each call to next() WILL NOT span rows.

This only works for scanners. Making this work for gets would be involved. 
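
To make the changed next() semantics concrete, here is a rough sketch in plain Java. It is a simulation only and does not use the HBase client: the class BatchedScanSketch, the Chunk type, and the scan() helper are all made up for illustration, and the in-memory map stands in for a table. In the real client the parameter in question is, as far as I know, set with Scan.setBatch(int); check the API docs for your version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of intra-row chunking; not HBase client code.
class BatchedScanSketch {

    // One simulated next() result: a row key plus at most `batch` of that row's values.
    static final class Chunk {
        final String row;
        final List<String> values;

        Chunk(String row, List<String> values) {
            this.row = row;
            this.values = values;
        }
    }

    // Returns the sequence of chunks that successive next() calls would yield.
    // A wide row is split across several chunks, but no chunk spans two rows.
    static List<Chunk> scan(Map<String, List<String>> table, int batch) {
        if (batch < 1) {
            throw new IllegalArgumentException("batch must be at least 1");
        }
        List<Chunk> chunks = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : table.entrySet()) {
            List<String> values = entry.getValue();
            for (int start = 0; start < values.size(); start += batch) {
                int end = Math.min(start + batch, values.size());
                // Copy the slice so each chunk is independent of the table.
                chunks.add(new Chunk(entry.getKey(),
                        new ArrayList<>(values.subList(start, end))));
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = new LinkedHashMap<>();
        table.put("row1", Arrays.asList("a", "b", "c", "d", "e"));
        table.put("row2", Arrays.asList("f", "g"));

        // With a batch of 2, row1 takes three calls before row2 is reached.
        for (Chunk c : scan(table, 2)) {
            System.out.println(c.row + " -> " + c.values);
        }
    }
}
```

The point for callers is that one result no longer means one row: code that assumed so has to compare row keys across consecutive results to detect a row boundary.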

    - Andy




________________________________
From: Ryan Rawson <ryanobjc@gmail.com>
To: hbase-user@hadoop.apache.org
Sent: Thu, November 12, 2009 2:33:41 PM
Subject: Re: newbie question: what is better? one with a lot of keys OR a lot of tables with fewer keys?

Yes, you are absolutely correct.  HBase must materialize the row for
the data you retrieve, whether that is one column family, one column,
a list of columns, or the entire row.  It just has to fit into memory.
Fixing that requires an API change; I'm not sure if that is making it into 0.21.
But if you split up by column family as you indicated, HBase only
retrieves the data necessary.

-ryan

On Wed, Nov 11, 2009 at 10:25 PM, Greg Cottman <greg.cottman@quest.com> wrote:
> Hi Ryan,
>
> If you only query columns from one column family though, won't HBase use data locality
> to fetch only enough data to populate that column family?
>
> That way you can have rows with more columns in them, and still write efficient queries
> that don't fetch all the irrelevant columns in a fat row.
>
> Cheers,
> Greg.
>
> -----Original Message-----
> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> Sent: Thursday, 12 November 2009 5:18 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: newbie question: what is better? one with a lot of keys OR a lot of tables with fewer keys?
>
> Either is fine. When you read an entire row from HBase, it must
> materialize the entire row in RAM. Thus your table width is limited if
> you wish to read the entire row at a time.
>
> On Wed, Nov 11, 2009 at 9:45 PM, Jeff Zhang <zjffdu@gmail.com> wrote:
>> Continuing this question:
>>
>> which is better for HBase, more rows with fewer columns or fewer rows with
>> more columns?
>>
>>
>> Jeff Zhang
>>
>>
>> On Thu, Nov 12, 2009 at 5:17 AM, TuxRacer69 <tuxracer69@gmail.com> wrote:
>>
>>> Thank you Jean-Daniel
>>>
>>>
>>> Jean-Daniel Cryans wrote:
>>>
>>>> Alex,
>>>>
>>>> In HBase it really makes more sense to put all the data you can in a
>>>> single table as it will be automatically partitioned and distributed
>>>> across the region servers (providing you have more than 256MB of
>>>> data).
>>>>
>>>> J-D
>>>>
>>>>
>>>
>>>
>>
>



      