hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Rows vs. Columns
Date Tue, 20 Mar 2012 09:32:48 GMT

Currently, if one column family causes a region to split, all of the column families get
split along with it. So if one family holds a large blob, you're going to shoot yourself in
the foot.

Are you filtering on any of the values in the 'info' family? 
If not, you could try creating a serialized record (Avro, for example) for the info data,
and then store everything in a single column family where one column contains the info
record and the other column contains the blob.
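
Roughly what I mean by the single-family layout, as a sketch only and not run against a real cluster. JSON from the Python standard library stands in for Avro, a plain dict stands in for the HBase row, and the family name "d" and the qualifiers "rec" and "blob" are made-up names for illustration:

```python
import json

def pack_row(page, sender_id, pdf_name, props, blob):
    """Build the cells for one row: one serialized info record plus the blob."""
    info = {"pg": page, "id": sender_id, "nm": pdf_name, "props": props}
    # sort_keys keeps the serialized bytes deterministic for a given record
    rec = json.dumps(info, sort_keys=True).encode("utf-8")
    # Hypothetical single family "d" with exactly two columns
    return {b"d:rec": rec, b"d:blob": blob}

def unpack_info(cells):
    """Decode the info record without touching the blob column."""
    return json.loads(cells[b"d:rec"].decode("utf-8"))

row = pack_row(3, "user42", "invoice.pdf", {"color": "red"}, b"%PDF-1.4 ...")
info = unpack_info(row)
```

The trade-off is that the individual fields now live inside one opaque value, so you can't filter on them column by column. That is why the filtering question above matters.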

Or you could use two tables with the same row key. But that would mean two get()s. Having
said that, if you were doing a table scan, you'd want to scan the info column and, based on
the results, fetch back the blob.
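
A sketch of that scan-then-fetch pattern with two tables sharing a row key. Plain dicts stand in for the two HBase tables here, and the row keys, field values, and the helper name scan_then_fetch are all hypothetical:

```python
# Stand-ins for two HBase tables keyed by the same row key (illustrative data)
info_table = {
    b"doc1#p1": {"pg": 1, "id": "alice", "nm": "a.pdf"},
    b"doc1#p2": {"pg": 2, "id": "alice", "nm": "a.pdf"},
    b"doc2#p1": {"pg": 1, "id": "bob", "nm": "b.pdf"},
}
blob_table = {
    b"doc1#p1": b"blob-a1",
    b"doc1#p2": b"blob-a2",
    b"doc2#p1": b"blob-b1",
}

def scan_then_fetch(info_rows, blob_rows, predicate):
    """Scan the small info rows, then look up the blob only for matching keys."""
    results = {}
    for row_key in sorted(info_rows):  # walk rows in sorted key order, like a scan
        if predicate(info_rows[row_key]):
            # Second lookup against the other table, using the same row key
            results[row_key] = blob_rows.get(row_key)
    return results

hits = scan_then_fetch(info_table, blob_table, lambda info: info["id"] == "alice")
```

The scan only ever reads the small info rows, and the large blobs are pulled back just for the rows that match.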



On Mar 20, 2012, at 3:56 AM, Laxman wrote:

> Do we see any problem with the below schema?
>      family "info":
>          "info:pg" - keeps page number
>          "info:id" - sender ID
>          "info:nm" - pdf name
>          "info:prop_name" - column to hold property name
>          "info:prop_value" - column to hold property value
>      family "data":
>          "data:blob" - blob of pdf file
> --
> Regards,
> Laxman
>> -----Original Message-----
>> From: Konrad Tendera [mailto:konrad@tendera.eu]
>> Sent: Monday, March 19, 2012 8:22 PM
>> To: user@hbase.apache.org
>> Subject: Rows vs. Columns
>> Hello,
>> I'm designing a schema for my use case and I'm considering which will
>> be better: rows or columns. Here's what I need - my schema currently
>> looks like this (it will be used for keeping small pdf files or
>> single pages of a larger document)
>> table files:
>>     family "info":
>>         "info:pg" - keeps page number
>>         "info:id" - sender ID
>>         "info:nm" - pdf name
>>         ***
>>     family "data":
>>         "data:blob" - blob of pdf file
>> Now let's get back to ***: each user can add multiple additional
>> properties ("name" - "value"), but let's assume that every user will be
>> so creative that no two names will be the same. I don't know how to solve
>> this problem: should each "name" be a new column ("info:name"), or should
>> I do it as described here:
>> http://hbase.apache.org/book.html#schema.smackdown.rowscols and make a
>> new row for each property?
>> K.
