hbase-user mailing list archives

From Bernd Fondermann <bernd.fonderm...@googlemail.com>
Subject Re: HBase - Column family
Date Tue, 26 Apr 2011 10:04:14 GMT
2011/4/23 Panayotis Antonopoulos <antonopoulospan@hotmail.com>:
> I am also a beginner, so I would like to ask you something about the method you proposed.
> HBase is column-oriented. This means (as far as I know from databases) that it stores
> its data column by column and not row by row.

Fortunately, this is an oversimplification. HBase keeps data efficiently
accessible by row. Strictly speaking, it is not even a column-oriented
database. It's a column-family-oriented database. From the docs:
"Physically they are stored on a per-column family basis."

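To make the "per-column family" point concrete, here is a toy sketch in
Python. It is not HBase code and not how store files are really
implemented; ToyTable, put and family_layout are made-up names. The only
thing it assumes is that cells inside one column family are kept sorted by
row key first and by column qualifier second, so all cells of one row sit
next to each other.

```python
# Toy model only: one store per column family, keyed by (row, qualifier).
# Real HBase cells also carry timestamps and more; this shows the ordering idea.
class ToyTable:
    def __init__(self, families):
        # each column family gets its own, separate store
        self.stores = {family: {} for family in families}

    def put(self, row, family, qualifier, value):
        self.stores[family][(row, qualifier)] = value

    def family_layout(self, family):
        # cells ordered by row key first, then by qualifier
        return sorted(self.stores[family].items())


table = ToyTable(["doc_id"])
table.put("WORLD", "doc_id", "7", 5)
table.put("HELLO", "doc_id", "3", 45)
table.put("HELLO", "doc_id", "2", 12)

layout = table.family_layout("doc_id")
for (row, qualifier), value in layout:
    print(row, "doc_id:" + qualifier, value)
```

Whatever order the cells are written in, the two HELLO cells end up
adjacent in the layout of the doc_id family.
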
> If we use the schema you suggested, then when we want some of the documents for a single
> word we will have to access many columns, and I think this will cost us a lot.

No, it is very efficient, even more so if you access columns from a
single column family only.
AFAIK, there is no way to access HBase by-column only, without being
in the context of a dedicated row.
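
As a rough illustration of what "access by row" means for the schema I
proposed, here is a small in-memory stand-in written in Python. It is not
the HBase client API; index, put and get_row are hypothetical names used
only for this sketch. The table is modelled as row key -> column family ->
qualifier -> value, and all documents for a word come back from a single
row lookup.

```python
# Toy stand-in for the "one row per word" schema; not the HBase client API.
# Layout: row key (word) -> column family -> qualifier (doc id) -> weight
index = {}


def put(word, doc_id, weight, family="doc_id"):
    """Store one cell: the weight of doc_id for the given word."""
    index.setdefault(word, {}).setdefault(family, {})[doc_id] = weight


def get_row(word, family="doc_id"):
    """Return all columns of one family for one row, ordered by qualifier."""
    columns = index.get(word, {}).get(family, {})
    return dict(sorted(columns.items()))


# values taken from the example in this thread
for doc_id, weight in [("2", 12), ("3", 45), ("4", 36)]:
    put("HELLO", doc_id, weight)

hello_docs = get_row("HELLO")
print(hello_docs)
```

The lookup starts from the row key and only then touches the columns of
one family, which is the access pattern described above.
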

> I think that the locality of the data is lost using this schema.

No, I don't think so.

> I repeat that I am a beginner so please correct me if I am wrong.

This presentation might help:


> Regards,
> Panagiotis.
>> Date: Sat, 23 Apr 2011 11:25:47 +0200
>> Subject: Re: HBase - Column family
>> From: bernd.fondermann@googlemail.com
>> To: user@hbase.apache.org
>> That's how I would do it:
>> What's nice in HBase is that you can store all the data for one of
>> your keywords in a single row.
>> Create a column family "doc_id".
>> Now, for each word, you create one row.
>> In this row, for each matching document you create one column (that's
>> the gotcha compared to an RDB design).
>> The name of the column is the doc id. The column's cell content is the weight.
>> So, following your example you'd get:
>> row id | column-family:column ...
>> HELLO  | doc_id:2 | doc_id:3 | doc_id:4
>> and column values:
>> doc_id:2 | doc_id:3 | doc_id:4
>> 12       | 45       | 36
>> HTH,
>>   Bernd
>> On Sat, Apr 23, 2011 at 09:56, JohnJohnGa <JohnJohnGa@gmail.com> wrote:
>> > Hi, I'm a beginner in HBase. I need to design my table. I want to play with
>> > the following information:
>> >
>> > At the date XX-XX-XXXX, the word 'HELLO' is in documents 2,3,4 and the weight
>> > of each doc is 12,45,36 - My raw data: doc:D title:'i like potatoes',weight:W,date:D
>> >
>> > I created a table with row: word, column: date, value: doc. But I can't store
>> > multiple rows with the same date for the same word.
>> >
>> > Can we create multiple column families for a table? What is the best way
>> > to design the schema?
>> >
>> > Thanks a lot
>> >
>> >
