hbase-user mailing list archives

From Jonathan Gray <jg...@facebook.com>
Subject RE: composite value vs composite qualifier
Date Mon, 21 Jun 2010 22:31:11 GMT
Not sure there is a right/wrong way.  You should probably just do what you're most comfortable
with / what makes the most sense to you.

> -----Original Message-----
> From: N Kapshoo [mailto:nkapshoo@gmail.com]
> Sent: Monday, June 21, 2010 3:23 PM
> To: user@hbase.apache.org
> Subject: Re: composite value vs composite qualifier
> 
> Does it still make sense to follow the previous id generation we
> talked about? (for performance reasons instead of storing an entire
> string?)
> 
> <docId><byte1> = value1
> <docId><byte2> = value2
> 
> instead of
> <docId><author> = value1
> <docId><status> = value2
> etc?
> 
> 
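The two layouts above differ mainly in how many bytes each qualifier carries. A minimal sketch in Python (not HBase client code), assuming the docId is stored as an 8-byte big-endian long; the FIELD_CODES mapping and both helper names are hypothetical and only illustrate the idea:

```python
import struct

# Hypothetical one-byte codes for each field name (not taken from the thread).
FIELD_CODES = {"author": 0x01, "status": 0x02}


def qualifier_with_code(doc_id: int, field: str) -> bytes:
    """<docId><byte>: 8-byte big-endian docId followed by a 1-byte field code."""
    return struct.pack(">qB", doc_id, FIELD_CODES[field])


def qualifier_with_name(doc_id: int, field: str) -> bytes:
    """<docId><name>: 8-byte big-endian docId followed by the UTF-8 field name."""
    return struct.pack(">q", doc_id) + field.encode("utf-8")


short_q = qualifier_with_code(123213, "author")
long_q = qualifier_with_name(123213, "author")
print(len(short_q), len(long_q))
```

Under these assumptions the coded form saves a few bytes per cell, at the cost of qualifiers that are no longer readable without the code table. Whether that saving matters probably depends on how many cells there are and whether compression is enabled.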
> On Mon, Jun 21, 2010 at 5:19 PM, N Kapshoo <nkapshoo@gmail.com> wrote:
> > Aha. That makes sense (both atomic writes and Filters).
> >
> > I am definitely only looking to filter within a given user, so looks
> > like what you describe below might work for me.
> >
> > Thanks so much for all your help, Jonathan. You have saved me (at
> > least) 2 weeks of tinkering and poking around!
> >
> > On Mon, Jun 21, 2010 at 5:10 PM, Jonathan Gray <jgray@facebook.com> wrote:
> >> It would be inefficient to run that query against this schema if
> >> you're talking about finding all documents with a given author across
> >> all users.  In that case you'd want an additional table whose row
> >> keys are authors.
> >>
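The "additional table keyed by author" idea can be sketched with plain dictionaries standing in for the two tables. This is only an illustration in Python, not HBase code; the names doc_info, author_index, put_author and docs_by_author are hypothetical, and the main table is assumed to be keyed by user as discussed in this thread:

```python
from collections import defaultdict

# Main "table": row key = user, column = (doc_id, field).
doc_info = defaultdict(dict)
# Hypothetical index "table": row key = author, columns = (user, doc_id).
author_index = defaultdict(set)


def put_author(user: str, doc_id: int, author: str) -> None:
    """Write the author cell and the matching index entry."""
    doc_info[user][(doc_id, "author")] = author
    author_index[author].add((user, doc_id))


def docs_by_author(author: str) -> list:
    """One index-row lookup instead of a scan over every user row."""
    return sorted(author_index.get(author, set()))


put_author("u1", 123213, "abc")
put_author("u2", 7, "abc")
put_author("u1", 8, "xyz")
print(docs_by_author("abc"))
```

The sketch ignores updates and deletes: if a document's author changes, the old index entry would have to be removed as well, and the two writes are separate operations that the application has to keep consistent.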
> >> If you want to search for documents with a specific author within a
> >> given user's documents (a single row), then you could use filters. As
> >> Andrey said, it would be simpler if the data were broken up into
> >> individual qualifiers, but it could also be done with a custom filter
> >> that reads the serialized value.
> >>
> >> To answer your question, you'd want a QualifierFilter that matches
> >> qualifiers of the form <anylong><author> and then a ValueFilter that
> >> matches the value against the specific author you're looking for.
> >>
> >> JG
> >>
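The qualifier-plus-value matching described above can be mimicked locally to see what it selects. This is a rough Python simulation over one row held as a dict, not the HBase filter API; it assumes the docId is an 8-byte big-endian long and the field name is plain UTF-8, and the helper names are hypothetical:

```python
import re
import struct

# Qualifier pattern: any 8 bytes (the docId as a long) followed by "author".
AUTHOR_QUALIFIER = re.compile(rb".{8}author", re.DOTALL)


def qualifier(doc_id: int, field: str) -> bytes:
    """Build a <docId><field> qualifier."""
    return struct.pack(">q", doc_id) + field.encode("utf-8")


def docs_with_author(row: dict, author: bytes) -> list:
    """Keep cells passing both checks, then decode the docId from each qualifier."""
    hits = []
    for qual, value in row.items():
        # Qualifier check first, value check second, both on the same cell.
        if AUTHOR_QUALIFIER.fullmatch(qual) and value == author:
            hits.append(struct.unpack(">q", qual[:8])[0])
    return sorted(hits)


row = {
    qualifier(123213, "author"): b"abc",
    qualifier(123213, "status"): b"open",
    qualifier(42, "author"): b"xyz",
    qualifier(77, "author"): b"abc",
}
print(docs_with_author(row, b"abc"))
```

In HBase the same two checks would run server-side on the single user row, so only matching cells come back to the client. Here everything runs in one process purely to show the selection logic.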
> >>> -----Original Message-----
> >>> From: N Kapshoo [mailto:nkapshoo@gmail.com]
> >>> Sent: Monday, June 21, 2010 2:59 PM
> >>> To: user@hbase.apache.org
> >>> Subject: Re: composite value vs composite qualifier
> >>>
> >>> I am not sure how to use filters in my case since I do not know the
> >>> column name.
> >>> Eg:
> >>> DocInfo: 123213+author = "abc"
> >>>
> >>> 123213 is the docId. If I want to look for authors named 'abc' in all
> >>> docs, how would I go about specifying a filter?
> >>>
> >>> Thanks.
> >>>
> >>> On Mon, Jun 21, 2010 at 4:20 PM, Andrey Stepachev <octo47@gmail.com> wrote:
> >>> > 2010/6/22 N Kapshoo <nkapshoo@gmail.com>
> >>> >
> >>> >> Is there any querying value in separating out values tied to each
> >>> >> other vs. keeping them in a serialized object? I am guessing the
> >>> >> second option would be much faster considering it is one composite
> >>> >> value on the disk, but I would like to know if there are any specific
> >>> >> advantages to doing things the other way. Thanks.
> >>> >> The values themselves are very small, basic information in String.
> >>> >>
> >>> >> Eg:
> >>> >>
> >>> >> DocInfo: <docId><type> = value1
> >>> >> DocInfo: <docId><priority> = value2
> >>> >> DocInfo: <docId><etcetc> = value3
> >>> >>
> >>> >>
> >>> >> Vs
> >>> >>
> >>> >> DocInfo: docId = value (JSON(type, priority, etcetc))
> >>> >>
> >>> >> Thank you.
> >>> >>
> >>> >
> >>> > This mostly depends on the usage pattern.
> >>> >
> >>> > 1. Each value in storage carries its full key
> >>> > (key/family/qualifier/timestamp), so the KeyValue size increases
> >>> > (though this negative effect can be offset by using compression).
> >>> > The serialised form will therefore be smaller, take less disk I/O,
> >>> > and can be faster.
> >>> >
> >>> > 2. The second option gives you atomic updates (i.e. all data arrives
> >>> > as one "piece"), while with the first option you can have concurrent
> >>> > updates of individual fields (and, of course, per-field history, as
> >>> > opposed to the serialised object, which keeps history for the object
> >>> > as a whole).
> >>> >
> >>> > 3. With the serialised form you can't use server-side filters out of
> >>> > the box (you would have to patch HBase to support custom filters
> >>> > that deserialise the object or run JSONPath on its serialised form),
> >>> > but with the first option you can.
> >>> >
> >>> >
> >>
> >
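Andrey's first point (every cell repeats the full key) can be made concrete with a back-of-the-envelope size estimate. The sketch below is in Python and assumes the classic uncompressed KeyValue layout with 20 bytes of fixed fields per cell; the row key, field values and helper names are made up for illustration, and real on-disk sizes will differ once compression or block encoding is applied:

```python
import json
import struct

# Assumed fixed per-cell overhead of the classic KeyValue layout:
# key length (4) + value length (4) + row length (2) + family length (1)
# + timestamp (8) + type (1).
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def cell_size(row: bytes, family: bytes, qual: bytes, value: bytes) -> int:
    """Approximate uncompressed size of one cell, key included."""
    return FIXED_OVERHEAD + len(row) + len(family) + len(qual) + len(value)


def compare(row: bytes, family: bytes, doc_id: int, fields: dict) -> tuple:
    """Return (bytes as one cell per field, bytes as one JSON cell)."""
    doc = struct.pack(">q", doc_id)
    separate = sum(
        cell_size(row, family, doc + name.encode("utf-8"), val.encode("utf-8"))
        for name, val in fields.items()
    )
    combined = cell_size(row, family, doc, json.dumps(fields).encode("utf-8"))
    return separate, combined


separate, combined = compare(
    b"user1", b"DocInfo", 123213,
    {"type": "memo", "priority": "high", "etcetc": "x"},
)
print(separate, combined)
```

With small values like these, the repeated row, family and timestamp bytes dominate the per-field layout, which is the effect described in point 1. The estimate says nothing about points 2 and 3 (atomicity, per-field history, server-side filtering), which may well outweigh the size difference.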
