hbase-user mailing list archives

From Konrad Tendera <kon...@tendera.eu>
Subject Re: Rows vs. Columns
Date Tue, 20 Mar 2012 09:40:32 GMT
I think two separate tables can work, because users usually fetch only the file info;
the blob of a specific file is fetched rarely.
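
A minimal sketch of that access pattern, in Python, with plain dicts standing in for the two HBase tables. No HBase client is used, and `files_info`, `files_blob`, and the helper names are hypothetical. Both tables share the row key, so listing touches only the small info rows, and the blob costs one extra lookup only when it is actually needed:

```python
# Plain dicts stand in for two HBase tables that share the same row key.
# Table and function names are illustrative, not part of any HBase API.
files_info = {}   # row key -> {"pg": ..., "id": ..., "nm": ...}
files_blob = {}   # row key -> PDF bytes


def put_file(row_key, info, blob):
    """Write the metadata and the blob under the same row key."""
    files_info[row_key] = dict(info)
    files_blob[row_key] = blob


def list_files(sender_id):
    """Common path: scan only the small info rows; blobs are never read."""
    return sorted(
        key for key, info in files_info.items() if info["id"] == sender_id
    )


def get_blob(row_key):
    """Rare path: a second lookup by the same row key."""
    return files_blob[row_key]


put_file("doc1-p1", {"pg": 1, "id": "alice", "nm": "report.pdf"}, b"%PDF-a")
put_file("doc2-p1", {"pg": 1, "id": "bob", "nm": "notes.pdf"}, b"%PDF-b")

matches = list_files("alice")
first_blob = get_blob(matches[0])
```

The cost is the second lookup Mike mentions below; it is paid only on the rare blob fetch.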

On Tue, 20 Mar 2012 04:32:48 -0500
Michael Segel <michael_segel@hotmail.com> wrote:

> Yes, 
> 
> Currently, if one of the column families causes a split, then all of the column families
> get split. So if you are dealing with a large blob, you're going to shoot yourself in the
> foot.
> 
> Are you filtering on any of the values in the 'info' family? 
> If not, you could try creating a serialized record (Avro is an example) for the info
> data, and then store the data in a single column family where one column contains the
> info record and the other column contains the blob.
> 
> Or you could use two tables with the same row key. But that would mean two get()s...
> Having said that, if you were doing a table scan, you'd want to scan the info column
> and, based on the results, fetch back the blob.
> 
> HTH
> 
> -Mike
> 
> On Mar 20, 2012, at 3:56 AM, Laxman wrote:
> 
> > Do we see any problem with the below schema?
> > 
> >      family "info":
> >          "info:pg" - keeps page number
> >          "info:id" - sender ID
> >          "info:nm" - pdf name
> >          "info:prop_name" - column to hold property name
> >          "info:prop_value" - column to hold property value
> >      family "data":
> >          "data:blob" - blob of pdf file
> > 
> > --
> > Regards,
> > Laxman
> >> -----Original Message-----
> >> From: Konrad Tendera [mailto:konrad@tendera.eu]
> >> Sent: Monday, March 19, 2012 8:22 PM
> >> To: user@hbase.apache.org
> >> Subject: Rows vs. Columns
> >> 
> >> Hello,
> >> 
> >> I'm designing a schema for my use case and I'm wondering which will
> >> be better: rows or columns. Here's what I need. My schema currently
> >> looks like this (it will be used for storing small PDF files or
> >> single pages of a larger document)
> >> table files:
> >>     family "info":
> >>         "info:pg" - keeps page number
> >>         "info:id" - sender ID
> >>         "info:nm" - pdf name
> >>         ***
> >>     family "data":
> >>         "data:blob" - blob of pdf file
> >> 
> >> Now let's get back to ***: each user can add multiple additional
> >> properties ("name" - "value"), but let's assume that every user will be
> >> so creative that no two names will be the same. I don't know how to solve
> >> this problem: should each "name" become a new column ("info:name"), or
> >> should I do it as described here:
> >> http://hbase.apache.org/book.html#schema.smackdown.rowscols and make a new
> >> row for each property?
> >> 
> >> K.
> > 
> > 
> 
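
For reference, the two layouts asked about in the original question can be sketched in Python with plain dicts in place of HBase tables. Everything here is an assumption made for illustration: the separator byte, the sample property names, and the helper functions are not from the thread. The "wide" layout keeps one row per file with one column per property name; the "tall" layout keeps one row per property, with the file key as a row-key prefix so that a prefix scan collects them:

```python
# Wide layout: one row per file, one column per user-defined property.
FIXED = {"info:pg", "info:id", "info:nm"}  # fixed columns from the schema
wide = {
    "doc1": {"info:pg": "1", "info:author": "kt", "info:lang": "pl"},
}

# Tall layout: one row per property; row key = file key + separator + name.
SEP = "\x00"  # hypothetical separator, assumed never to occur in keys
tall = {
    "doc1" + SEP + "author": {"info:prop_value": "kt"},
    "doc1" + SEP + "lang": {"info:prop_value": "pl"},
    "doc2" + SEP + "author": {"info:prop_value": "ms"},
}


def props_wide(file_key):
    """One lookup returns the whole row; fixed columns are filtered out."""
    row = wide.get(file_key, {})
    return {
        col.split(":", 1)[1]: val
        for col, val in row.items()
        if col not in FIXED
    }


def props_tall(file_key):
    """Emulates a prefix scan over lexicographically sorted row keys."""
    prefix = file_key + SEP
    return {
        key[len(prefix):]: tall[key]["info:prop_value"]
        for key in sorted(tall)
        if key.startswith(prefix)
    }


wide_props = props_wide("doc1")
tall_props = props_tall("doc1")
```

Both calls return the same name-to-value mapping for "doc1"; the difference is whether the properties arrive as the columns of a single row or as the rows of a short scan.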


-- 
Konrad Tendera
