hadoop-common-user mailing list archives

From "Bin YANG" <yangbinism...@gmail.com>
Subject question on HBase storage
Date Wed, 31 Oct 2007 08:03:07 GMT

I have several questions about HBase's physical storage:

1. Does HBase store each table in format A:

"com.cnn.www", t6, "<html>...",
"com.cnn.www", t5, "<html>...",
"com.cnn.www", t3, "<html>...",
"com.sohu.www", t8, "<html>...",
"com.sohu.www", t7, "<html>..."

or format B:

"com.cnn.www", t6, "<html>...",
               t5, "<html>...",
               t3, "<html>...",
"com.sohu.www", t8, "<html>...",
                t7, "<html>..."

Format A treats the RowKey and TimeStamp together as the key, and
wastes space by storing the RowKey ("com.cnn.www" or "com.sohu.www")
several times.

Format B treats the RowKey alone as the key, with TimeStamp and Column
as attributes, so the rows do not all have the same shape.
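
To make the two layouts concrete, here is a rough Python sketch that
models them as plain in-memory structures and counts the bytes spent on
row keys. It only illustrates the question and is not HBase's actual
on-disk format. The names format_a, format_b, row_key_bytes_a,
row_key_bytes_b and to_format_b are made up for this example, and the
timestamps t3..t8 are written as plain integers.

```python
# Format A: one flat record per version; the row key is repeated every time.
format_a = [
    ("com.cnn.www", 6, "<html>..."),
    ("com.cnn.www", 5, "<html>..."),
    ("com.cnn.www", 3, "<html>..."),
    ("com.sohu.www", 8, "<html>..."),
    ("com.sohu.www", 7, "<html>..."),
]

# Format B: the row key is stored once; its versions hang off it.
format_b = {
    "com.cnn.www": [(6, "<html>..."), (5, "<html>..."), (3, "<html>...")],
    "com.sohu.www": [(8, "<html>..."), (7, "<html>...")],
}


def row_key_bytes_a(records):
    """Bytes spent on row keys when every record carries its own key."""
    return sum(len(row.encode("utf-8")) for row, _ts, _value in records)


def row_key_bytes_b(table):
    """Bytes spent on row keys when each key is stored only once."""
    return sum(len(row.encode("utf-8")) for row in table)


def to_format_b(records):
    """Regroup flat format-A records into the nested format-B shape."""
    table = {}
    for row, ts, value in records:
        table.setdefault(row, []).append((ts, value))
    return table


print(row_key_bytes_a(format_a), row_key_bytes_b(format_b))
```

In this toy data the repeated keys cost 57 bytes in format A against 23
bytes in format B, which is the overhead I am asking about.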

2. Another question: we may get several labels in the same family at
the same time. For example, suppose we crawl a web page at time t1 and
the page contains two anchors, one to a.com and the other to b.com.
How should this be stored in HBase?

"com.cnn.www", t1, "anchor:a.com", "aaa",
"com.cnn.www", t1, "anchor:b.com", "bbb",
"com.cnn.www", t2, "anchor:c.com", "ccc"

or

"com.cnn.www", t1, "anchor:a.com", "aaa",
                   "anchor:b.com", "bbb",
"com.cnn.www", t2, "anchor:c.com", "ccc"
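
The same two options for the anchor example can be sketched in Python.
Again this is only an illustration under my own assumptions, not how
HBase stores anything: cells and group_by_row_and_time are hypothetical
names, and t1 and t2 are written as the integers 1 and 2.

```python
# First option: one flat cell per (row, timestamp, column) triple.
cells = [
    ("com.cnn.www", 1, "anchor:a.com", "aaa"),
    ("com.cnn.www", 1, "anchor:b.com", "bbb"),
    ("com.cnn.www", 2, "anchor:c.com", "ccc"),
]


def group_by_row_and_time(flat_cells):
    """Second option: cells sharing a row and timestamp share one entry."""
    grouped = {}
    for row, ts, column, value in flat_cells:
        grouped.setdefault((row, ts), {})[column] = value
    return grouped


grouped = group_by_row_and_time(cells)
for (row, ts), columns in grouped.items():
    print(row, ts, columns)
```

In the grouped form the two anchors crawled at t1 sit under a single
(row, timestamp) entry, while the flat form repeats the row key and
timestamp for each anchor.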



Department of Computer Science and Engineering
Fudan University
Shanghai, P. R. China
EMail: yangbinisme82@gmail.com
