hbase-user mailing list archives

From Chris Tarnas <...@email.com>
Subject Re: Clarification regarding HBase reads
Date Mon, 21 Feb 2011 06:37:24 GMT
We build our indexes using an index family in the same table we are indexing, and we make
sure that no index rowkey could possibly be a valid data rowkey. This does not guarantee that
the data and index writes will be in the same transaction, but it does let you batch the puts
for both together.

-chris
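
A rough sketch of the same-table index layout described above, for illustration only. It uses a plain Python dict as a stand-in for the table and does not call the HBase client API. The prefix bytes, the family names "d" and "idx", and the helper names build_puts and apply_batch are all hypothetical. The only assumption carried over from the message is that index rowkeys live in a keyspace no data rowkey can occupy, so data and index puts can go into one batch.

```python
from typing import Dict, List, Tuple

# Assumption: data rowkeys never start with this prefix, so an index
# rowkey can never be mistaken for a valid data rowkey.
INDEX_PREFIX = b"\x00idx|"

# A put is modelled as (rowkey, "family:qualifier", value).
Put = Tuple[bytes, str, bytes]


def build_puts(rowkey: bytes, fields: Dict[str, bytes], indexed: List[str]) -> List[Put]:
    """Return one batch holding the data puts and the matching index puts."""
    if rowkey.startswith(INDEX_PREFIX):
        raise ValueError("data rowkey collides with the index keyspace")
    # Data cells go into the hypothetical data family "d".
    puts: List[Put] = [(rowkey, f"d:{name}", value) for name, value in fields.items()]
    # Each indexed field gets an index row whose key embeds the field name,
    # the field value, and the data rowkey; it lives in the "idx" family.
    for name in indexed:
        index_key = INDEX_PREFIX + name.encode() + b"|" + fields[name] + b"|" + rowkey
        puts.append((index_key, "idx:ref", rowkey))
    return puts


def apply_batch(table: Dict[bytes, Dict[str, bytes]], puts: List[Put]) -> None:
    """Apply a batch to the in-memory stand-in table (no atomicity implied)."""
    for rowkey, column, value in puts:
        table.setdefault(rowkey, {})[column] = value


table: Dict[bytes, Dict[str, bytes]] = {}
batch = build_puts(b"user42", {"email": b"a@example.com", "city": b"Oslo"}, ["email"])
apply_batch(table, batch)
```

With a real client the batch would be submitted as one multi-put call instead of apply_batch. As noted in the message, batching saves round trips but does not make the data and index writes transactional.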
  
On Feb 20, 2011, at 9:49 PM, Hari Sreekumar wrote:

> All right, I understand the integrity cost is there because we don't get
> transactions in HBase over multiple tables. Thanks a lot for your time and
> help :)
> 
> Hari
> 
> On Mon, Feb 21, 2011 at 11:09 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> 
>> One particular region is for only one table. So the answer to your first
>> question is no.
>> 
>> For your use case, you need to consider the cost of maintaining 4 index
>> tables (in terms of data integrity).
>> You should try to minimize the number of index tables.
>> 
>> On Sun, Feb 20, 2011 at 7:50 PM, Hari Sreekumar <hsreekumar@clickable.com> wrote:
>> 
>>> So if I have 10 tables each with 2 families, I'd open up 20 stores whenever
>>> I open a region for reading? Is it a problem to have too many tables, e.g.,
>>> if I have 1 big table and 4 indexing tables for the big table? Are there any
>>> potential issues with this?
>>> 
>>> Thanks,
>>> Hari
>>> 
>>> On Sun, Feb 20, 2011 at 8:47 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>> 
>>>>>> Does this mean that a store instance is opened for all tables present in
>>>>>> HBase irrespective of which table we are querying and for all
>>>>>> columnfamilies?
>>>> No. The blog says a Store instance is set up for each column family.
>>>>
>>>> You should generally avoid multiple column families. But we can help you
>>>> analyze your use case.
>>>> If you read through https://issues.apache.org/jira/browse/HBASE-3149, you
>>>> would better understand the current implementation.
>>>> 
>>>> On Sun, Feb 20, 2011 at 6:38 AM, Hari Sreekumar <hsreekumar@clickable.com> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I was going through the HBase architecture blog by Lars George
>>>>> (http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html) and
>>>>> I just wanted a clarification regarding how HBase reads data. The blog
>>>>> mentions that:
>>>>> 
>>>>> Next the HRegionServer opens the region and creates a corresponding
>>>>> HRegion object. When the HRegion is "opened" it sets up a Store instance
>>>>> for each HColumnFamily for every table as defined by the user beforehand.
>>>>> Each of the Store instances can in turn have one or more StoreFile
>>>>> instances, which are lightweight wrappers around the actual storage file
>>>>> called HFile. A HRegion also has a MemStore and a HLog instance. We will
>>>>> now have a look at how they work together but also where there are
>>>>> exceptions to the rule.
>>>>> 
>>>>> Does this mean that a store instance is opened for all tables present in
>>>>> HBase, irrespective of which table we are querying, and for all column
>>>>> families? Is this why I generally see people avoiding a large number of
>>>>> tables or a large number of column families? If not, what is the reason
>>>>> for that? Is it true at all that we should avoid too many tables/CFs?
>>>>> 
>>>>> Thanks,
>>>>> Hari
>>>>> 
>>>> 
>>> 
>> 

