hbase-user mailing list archives

From Taeyun Kim <taeyun....@innowireless.com>
Subject RE: general question about datamodel => empty columns
Date Mon, 19 Jan 2015 00:48:39 GMT

(Warning: I'm kind of a newbie...) 
I would make the tables as follows:

<table1: A single column (named 'c'), to save space by avoiding the overhead of repeating the key in
multiple cells>
row1: type1 + foo
row2: type2 + bar
row3: type1 + baz
row4: type2 + whatever
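The space argument for a single-column table1 can be made concrete with a rough estimate. The sketch below assumes the classic uncompressed HBase KeyValue layout (key length, value length, row length, row, family length, family, qualifier, 8-byte timestamp, 1-byte key type, value) and ignores tags, data block encoding and compression, all of which change the real numbers. The family name `data` comes from the question; `keyvalue_size` and `pack_value` are hypothetical helpers, and the 1-byte length prefix is an assumed way to split the combined type + data value again.

```python
def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate on-disk size of one uncompressed KeyValue (no tags)."""
    # 4 (key length) + 4 (value length) + 2 (row length) + 1 (family length)
    # + 8 (timestamp) + 1 (key type)
    fixed = 4 + 4 + 2 + 1 + 8 + 1
    return fixed + len(row) + len(family) + len(qualifier) + len(value)


def pack_value(typ: bytes, data: bytes) -> bytes:
    """Hypothetical encoding: 1-byte type length, then the type, then the data."""
    if len(typ) > 255:
        raise ValueError("type name too long for a 1-byte length prefix")
    return bytes([len(typ)]) + typ + data


row = b"row1"

# Layout from the question: two cells, data:type and data:data
two_cells = (keyvalue_size(row, b"data", b"type", b"type1")
             + keyvalue_size(row, b"data", b"data", b"foo"))

# Single-column layout: one cell data:c holding type + data
one_cell = keyvalue_size(row, b"data", b"c", pack_value(b"type1", b"foo"))

print(two_cells, one_cell)  # 72 38
```

The saving is roughly one full key per row (here 34 of 72 bytes), so it grows with longer row keys and matters less as values get larger.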

<index1: Again a single column, with the data value duplicated from table1. This way you
can simply Scan through index1 to get the values, avoiding Gets to table1.>
type1 + row1: foo
type1 + row3: baz
type2 + row2: bar
type2 + row4: whatever
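index1 acts as a covering index: because the value is copied into the index row, reading every entry of one type is a single range scan. A minimal sketch, using a sorted Python list as a hypothetical stand-in for an HBase table (HBase keeps row keys in unsigned byte order, which is also how Python compares `bytes`). `SortedTable`, `scan_prefix` and the `\x00` separator between type and row key are assumptions for illustration, not HBase client API.

```python
import bisect


class SortedTable:
    """Hypothetical in-memory stand-in for an HBase table (not the client API).

    Row keys are kept sorted by byte order, as HBase stores them.
    """

    def __init__(self):
        self._keys = []    # sorted row keys
        self._values = {}  # row key -> single cell value

    def put(self, row: bytes, value: bytes) -> None:
        if row not in self._values:
            bisect.insort(self._keys, row)
        self._values[row] = value

    def scan_prefix(self, prefix: bytes):
        """Yield (row, value) for every row key starting with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._values[self._keys[i]]
            i += 1


SEP = b"\x00"  # assumed separator between type and original row key

# table1 from the example: row key -> (type, data)
table1 = {b"row1": (b"type1", b"foo"), b"row2": (b"type2", b"bar"),
          b"row3": (b"type1", b"baz"), b"row4": (b"type2", b"whatever")}

index1 = SortedTable()
for row, (typ, value) in table1.items():
    # Index row key = type + SEP + original row key; the data value is duplicated
    index1.put(typ + SEP + row, value)

# One scan over index1 yields the type1 values; no Get against table1 is needed
type1_values = [value for _, value in index1.scan_prefix(b"type1" + SEP)]
print(type1_values)  # [b'foo', b'baz']
```

The price of the duplication is that every update to a table1 value must also rewrite the matching index1 row.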

<index2: Not needed>


-----Original Message-----
From: Wilm Schumacher [mailto:wilm.schumacher@gmail.com] 
Sent: Saturday, January 17, 2015 6:23 AM
To: user@hbase.apache.org
Subject: general question about datamodel => empty columns


I have run into a problem that I have encountered several times by now, and perhaps you can help me.

What should I store in tables where only the qualifier is needed? E.g.
in indexing, you have to reference the indexed row either by columns or by rows in the
index table. But either way, there is no data to put into the cells.

An example for clarification:
Suppose you want to make an index for another table which indexes something like "type of

row1 data:type => type1 , data:data => foo
row2 data:type => type2 , data:data => bar
row3 data:type => type1 , data:data => baz
row4 data:type => type2 , data:data => whatever

indexing 1: indexing by columns

type1 index:row1 => ??? , index:row3 => ???
type2 index:row2 => ??? , index:row4 => ???
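For the column-based variant, each index cell can carry a zero-length value: HBase accepts empty byte arrays as cell values, and the qualifier itself is the information. A minimal sketch with a plain dict standing in for the index table; the names `index_by_columns` and `EMPTY` are hypothetical.

```python
EMPTY = b""  # zero-length cell value; the qualifier carries the information

# Source rows from the example: row key -> value of data:type
data = {b"row1": b"type1", b"row2": b"type2",
        b"row3": b"type1", b"row4": b"type2"}

# Index table: index row key (the type) -> {qualifier (source row key): value}
index_by_columns = {}
for row, typ in data.items():
    index_by_columns.setdefault(typ, {})[row] = EMPTY

# One Get on the index row "type1" returns its qualifiers, i.e. the source rows
type1_rows = sorted(index_by_columns[b"type1"])
print(type1_rows)  # [b'row1', b'row3']
```

The trade-off of this layout is that all entries for one type live in a single row, and HBase never splits a row across regions, so a very common type can become one very wide, hot row.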

indexing 2: indexing by rows
type1-row1 ??:?? => ??
type1-row3 ??:?? => ??
type2-row2 ??:?? => ??
type2-row4 ??:?? => ??
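The row-based variant still needs at least one cell per index row, since an HBase row only exists through its cells; a zero-length value satisfies that without storing anything redundant. Reading one type is then a range scan with an inclusive start row and an exclusive stop row. A minimal sketch, with sorted Python `bytes` standing in for HBase's unsigned byte ordering; `index_key`, `prefix_stop_row`, `scan`, the `\x00` separator and the extra `row5` are hypothetical.

```python
import bisect

SEP = b"\x00"  # assumed separator; must never occur inside a type name


def index_key(typ: bytes, row: bytes) -> bytes:
    """Index row key: type, separator, original row key."""
    return typ + SEP + row


def prefix_stop_row(prefix: bytes) -> bytes:
    """Smallest key greater than every key starting with prefix (exclusive stop).

    Returns b"" when no such key exists, meaning "scan to the end of the table".
    """
    b = bytearray(prefix)
    while b and b[-1] == 0xFF:
        b.pop()  # 0xFF cannot be incremented; carry into the previous byte
    if not b:
        return b""
    b[-1] += 1
    return bytes(b)


def scan(sorted_keys, start: bytes, stop: bytes):
    """Keys k with start <= k < stop; an empty stop means no upper bound."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop) if stop else len(sorted_keys)
    return sorted_keys[lo:hi]


data = {b"row1": b"type1", b"row2": b"type2", b"row3": b"type1",
        b"row4": b"type2",
        b"row5": b"type10"}  # row5 is hypothetical, added to show the separator's role

# Every index row still needs one cell; its value is left empty
index2 = {index_key(typ, row): b"" for row, typ in data.items()}
keys = sorted(index2)

start = b"type1" + SEP
stop = prefix_stop_row(start)
type1_rows = [k[len(start):] for k in scan(keys, start, stop)]
print(type1_rows)  # [b'row1', b'row3']
```

The separator matters: scanning on a bare `type1` prefix would also pick up the hypothetical `type10` entry. Newer HBase clients can compute the stop row for a prefix scan themselves via `Scan#setRowPrefixFilter`.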

works if there is any column family file to scan. Thus I need data.

Either way, I actually have to put data where it isn't needed.

What should I put into the columns? So far I have mostly used the creation timestamp,
which in my opinion is quite pointless, since every cell already carries its own timestamp. It
would only waste space. I could use empty strings (bytes), which would work, but somehow feels

What do you use? Is an empty string or a redundant timestamp common practice?

Best wishes,

