hbase-user mailing list archives

From Angus He <angu...@gmail.com>
Subject Re: Column-oriented data modal
Date Fri, 31 Jul 2009 08:05:26 GMT
OK,OK,OK.

If data is stored row-by-row in HBase, how do you explain the text
under the "Physical Storage View" section of this page?
http://wiki.apache.org/hadoop/Hbase/HbaseArchitecture
Is the page stale, or is something else wrong?

On Fri, Jul 31, 2009 at 3:50 PM, Ryan Rawson<ryanobjc@gmail.com> wrote:
> Data is stored row-by-row in the HBase store files (aka HFiles).
> HBase is not a column-oriented store as described in the Wikipedia
> article: http://en.wikipedia.org/wiki/Column-oriented_DBMS
>
> Have a look at the Bigtable paper and do some searches; there is lots
> of material out there describing the benefits of a flexible store like
> Bigtable/HBase.
>
> -ryan
>
>
>
> On Fri, Jul 31, 2009 at 12:42 AM, Angus He<angushe@gmail.com> wrote:
>> Hi Ryan,
>>
>> You cannot equate the "column" in that Wikipedia article with the
>> "column" in HBase.
>>
>> We should assume that the word "column" in "column-oriented" is
>> predefined; otherwise it is meaningless.
>>
>> So we should treat the "column" in Wikipedia as the "column family" in
>> HBase.  Read that way, the article can answer 宏明's question.
>>
>>
>> On Fri, Jul 31, 2009 at 3:18 PM, Ryan Rawson<ryanobjc@gmail.com> wrote:
>>> Hey,
>>>
>>> The Bigtable paper talks more about column families, but in HBase each
>>> column family is stored in its own file.  That means there is disk
>>> locality for different column families.  The canonical use is to put
>>> web crawl data in one family and metadata (such as derived metadata)
>>> in another.  That way, scanning just the metadata is not as expensive
>>> as scanning the web page crawl dump.
>>>
>>> Column families are pre-defined - the "schema" for what it's worth -
>>> but the 'qualifier' within a family is dynamically determined by the
>>> client.
>>>
>>> In the terminology of the article, HBase would be more 'row oriented',
>>> but with the column family snag, it isn't that simple.  Since a row's
>>> data in different families is stored in different files, read
>>> efficiency depends on which column families a query reads.
>>>
>>> -ryan
>>>
>>> On Fri, Jul 31, 2009 at 12:02 AM, Angus He<angushe@gmail.com> wrote:
>>>> Hi Ryan,
>>>>
>>>> 1. If that is not the case, what is the purpose of introducing
>>>> "column family"?
>>>> Are the contents of different column families stored in different
>>>> files in HBase?
>>>>
>>>> BTW, in the bigtable paper, we can find the following text:
>>>> "Access control and both disk and memory accounting are performed at
>>>> the column-family level."
>>>>
>>>> 2. I was wondering whether HBase shares the benefits described in the
>>>> "Benefits" section of the Wikipedia article. If not, what does
>>>> "column store" mean in HBase?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Jul 31, 2009 at 2:30 PM, Ryan Rawson<ryanobjc@gmail.com> wrote:
>>>>> HBase and Bigtable are referred to as column stores, but we aren't a
>>>>> 'column oriented dbms' as described in Wikipedia.
>>>>>
>>>>> At the storage level, HBase stores key-values, where the key is a
>>>>> triple of row / column / timestamp.  Files are ordered lists of these
>>>>> key/values, sorted in that order: rows are stored together, then
>>>>> sorted by column, then in reverse by timestamp (newest on top).
>>>>>
>>>>> Thus HBase is not a 'column store' in the sense listed in the
>>>>> Wikipedia entry.
>>>>>
>>>>> On Thu, Jul 30, 2009 at 11:23 PM, Angus He<angushe@gmail.com> wrote:
>>>>>> Why don't you try to google it first?
>>>>>> After googling the keyword "Column-oriented", the first result is
>>>>>> exactly what you want.
>>>>>> http://en.wikipedia.org/wiki/Column-oriented_DBMS
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2009/7/31  <y_823910@tsmc.com>:
>>>>>>> Hi,
>>>>>>> Can anyone tell me the benefits of the column-oriented data model?
>>>>>>> Thank you
>>>>>>>
>>>>>>> Fleming
>>>>>>> 宏明
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards
>>>>>> Angus
>



-- 
Regards
Angus
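
For readers following the thread, here is a minimal sketch of the layout
Ryan describes in the quoted messages: one file per column family, and
within each file key/values sorted by row, then column, then timestamp
with the newest first. This is an illustrative model only, not HBase
code; the names KeyValue, sort_key and build_store_files, and the sample
cells, are assumptions made up for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class KeyValue(NamedTuple):
    # Hypothetical cell model for illustration; not the HBase KeyValue class.
    row: str
    family: str
    qualifier: str
    timestamp: int
    value: str


def sort_key(kv):
    # Row ascending, then column qualifier ascending, then timestamp
    # descending, so the newest version of a cell comes first.
    return (kv.row, kv.qualifier, -kv.timestamp)


def build_store_files(cells):
    """Group cells by column family, then sort each group like a store file."""
    files = defaultdict(list)
    for kv in cells:
        files[kv.family].append(kv)
    return {family: sorted(kvs, key=sort_key) for family, kvs in files.items()}


# Sample data in arrival order: crawl content in one family, metadata in another.
cells = [
    KeyValue("row2", "meta", "lang", 1, "en"),
    KeyValue("row1", "content", "html", 1, "<html>old</html>"),
    KeyValue("row1", "meta", "lang", 1, "en"),
    KeyValue("row1", "content", "html", 2, "<html>new</html>"),
    KeyValue("row1", "meta", "title", 1, "Home"),
]

store_files = build_store_files(cells)
for family, kvs in store_files.items():
    print(family)
    for kv in kvs:
        print("  ", kv.row, kv.qualifier, kv.timestamp, kv.value)
```

In this model, scanning only the "meta" family never touches the
"content" list, which mirrors the point about scan cost per column
family, while each list on its own is still laid out row by row.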
