hbase-user mailing list archives

From Jacques <whs...@gmail.com>
Subject Re: Rows vs. Columns
Date Tue, 20 Mar 2012 06:08:41 GMT
As the advice says, millions of columns are not a good idea. If your
user information is sparse, e.g., only a few hundred users will associate
with a particular row, you'll be fine. However, if your matrix is complete,
you probably need to store the properties as rows. You should also check out
the advice (a JIRA issue covers this) about frequent flushes when column
families differ substantially in size, which applies here if the blob is
large and the info is small.
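
To make the two layouts concrete, here is a minimal sketch in plain Python. It does not call HBase; tuples stand in for cells. The function names, the "info:v" qualifier, and the NUL separator in the row key are assumptions made for illustration, not something from the schema in the question.

```python
from typing import Dict, List, Tuple

# (row key, "family:qualifier", value) -- a stand-in for one HBase cell.
Cell = Tuple[str, str, bytes]


def wide_layout(file_id: str, props: Dict[str, str]) -> List[Cell]:
    """One row per file: every user property is its own column in 'info'."""
    return [
        (file_id, f"info:{name}", value.encode("utf-8"))
        for name, value in sorted(props.items())
    ]


def tall_layout(file_id: str, props: Dict[str, str], sep: str = "\x00") -> List[Cell]:
    """One row per (file, property): the property name moves into the row key.

    The separator and the fixed 'info:v' qualifier are hypothetical choices.
    """
    return [
        (f"{file_id}{sep}{name}", "info:v", value.encode("utf-8"))
        for name, value in sorted(props.items())
    ]


props = {"author": "kt", "lang": "pl"}
wide = wide_layout("doc42", props)   # 1 row, 2 columns
tall = tall_layout("doc42", props)   # 2 rows, 1 column each
```

With the tall layout, all properties of one file share the row-key prefix "doc42" plus the separator, so they sort next to each other and can be read with a prefix scan.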
On Mar 19, 2012 1:07 PM, "Konrad Tendera" <konrad@tendera.eu> wrote:

> Hello,
> I'm designing a schema for my use case and I'm considering which will be
> better: rows or columns. Here's what I need. My schema currently looks like
> this (it will be used for storing small PDF files or single pages of
> larger documents):
> table files:
>    family "info":
>        "info:pg" - keeps page number
>        "info:id" - sender ID
>        "info:nm" - pdf name
>        ***
>    family "data":
>        "data:blob" - blob of pdf file
> Now let's get back to ***: each user can add multiple additional
> properties ("name" - "value"), but let's assume that every user will be so
> creative that no two names will be the same. I don't know how to solve this
> problem: should each "name" be a new column ("info:name"), or should I try
> to do it as described here:
> http://hbase.apache.org/book.html#schema.smackdown.rowscols
> and make a new row for each property?
> K.
