hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: A general question on maxVersion handling when we have Secondary index tables
Date Wed, 29 Aug 2012 17:07:39 GMT
I agree with Jesse.
For the initial implementation, we can pick a common use case. When users
present more use cases, we add more support in HBase core.

On Wed, Aug 29, 2012 at 10:03 AM, Jesse Yates <jesse.k.yates@gmail.com> wrote:

> Client library style stuff is _nice_ but one of the things everyone asks of
> a database is that we provide an index (cassandra has it, riak has it, mysql
> has it...hbase doesn't? Yes, different systems, etc., etc., but the point is
> we could do it). Further, if we build it as a part of hbase, we can make it
> faster... though don't ask me the _how_ on that yet ;)
>
> Talking with Lars, we could provide a lot of the indexing infrastructure,
> but leave the actual indexing (convert row|cf|cq|ts|value to an index value
> and vice-versa) to a client library, which gives us a lot of the flexibility
> that people would need. And I take it that most people already have some form
> of indexing (be it consistent or not), so we can do it 'the right way'
> in terms of queries, etc. and provide pluggable infrastructure (with a
> decent default) so people can roll in their own implementations.
>
> That said, I think we can do secondary indexing without too many changes to
> HBase (region co-location/pinning that Ted suggests would just be sweet
> overall), arguing for a client library. However, if we decide this is one of
> the things we want to support going forward as a project, then it makes
> more sense to do it as part of HBase, rather than pointing people to some
> guy/gal's website with the information (which may or may not be up to date)
> for how to munge indexing in. Instead, it would be so much nicer to just flip
> a couple switches, maybe plug in a couple of classes and have indexing
> _just work_.
>
> Just my $0.02
>
> -Jesse
> -------------------
> Jesse Yates
> @jesse_yates
> jyates.github.com
>
>
> On Wed, Aug 29, 2012 at 9:19 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > For the secondary index based on state portion of address example, I
> > wonder if we can achieve comparable performance using scan with proper
> > filter.
> >
> > Cheers
> >
> > On Wed, Aug 29, 2012 at 9:11 AM, Jonathan Hsieh <jon@cloudera.com> wrote:
> >
> > > Ted,
> > >
> > > Ram summarizes the concern succinctly -- to answer the specific question,
> > > it isn't for versions -- it is for the case where a secondary index can
> > > point to many many primary rows.  (Let's say we have a rowkey userid and we
> > > want to have a 2ndary index based on the state portion of their address
> > > --- we'll end up pointing to many many primary rows.)
> > >
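A minimal sketch of the state-index case described above, assuming a hypothetical index row key of the form `<state>\0<userid>` held in a sorted in-memory set as a stand-in for an index table. The class name, key layout and separator are illustrative assumptions, not anything proposed in the thread. It shows why one index value (a state) fans out to many primary rows: the lookup is a range scan over every key sharing the state prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for an index table keyed by "<state>\0<userid>". */
final class StateIndexSketch {
    // Assumed separator; state names must not contain it.
    private static final char SEP = '\u0000';

    // Sorted set plays the role of lexicographically ordered row keys.
    private final TreeSet<String> indexRows = new TreeSet<>();

    /** Adds one index entry pointing from a state to a primary row (userid). */
    void index(String state, String userId) {
        indexRows.add(state + SEP + userId);
    }

    /** Returns every userid indexed under the given state via a prefix range scan. */
    List<String> usersIn(String state) {
        String start = state + SEP;               // first possible key for this state
        String stop = state + (char) (SEP + 1);   // first key past this state's range
        List<String> result = new ArrayList<>();
        for (String row : indexRows.subSet(start, true, stop, false)) {
            result.add(row.substring(start.length()));
        }
        return result;
    }
}
```

With millions of users per state, `usersIn` returns a correspondingly long list of primary row keys, each of which may live in a different region of the data table.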
> > > Jon.
> > >
> > >
> > >
> > > On Wed, Aug 29, 2012 at 7:15 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > > Thanks for the detailed response, Jon.
> > > >
> > > > bq. it would mean that a query based on secondary index would
> > > > potentially have to hit every region server that has a region in the
> > > > primary table.
> > > >
> > > > Can you elaborate on the above a little bit ?
> > > > Is this because secondary index would point us to more than one region in
> > > > the data table because several versions are saved for the same row ?
> > > >
> > > > My thinking was to ease management of simultaneous (data and index) region
> > > > split through region colocation.
> > > >
> > > > Cheers
> > > >
> > > > On Wed, Aug 29, 2012 at 6:47 AM, Jonathan Hsieh <jon@cloudera.com> wrote:
> > > >
> > > > > I'm more of a fan of having secondary indexes added as an external feature
> > > > > (coproc or new client library on top of our current client library) and
> > > > > focusing on only adding apis necessary to make 2ndary indexes possible and
> > > > > correct on/in HBase.  There are many different use patterns and
> > > > > requirements and one style of secondary index will not be good for
> > > > > everything.  Do we only care about this working well for highly selective
> > > > > keys?  What are possible indexes (col name, value, value prefix, everything
> > > > > our filters support?)  Do we care more about writes or reads, ACID
> > > > > correctness or speed, etc?  Also, there are several questions about how we
> > > > > handle other features in conjunction with 2ndary indexes: replication, bulk
> > > > > load, snapshots, to name a few.
> > > > >
> > > > > Maybe it makes sense to spend some time defining what we want to index
> > > > > secondarily and what a user api to this external api would be.  Then we
> > > > > could have the different implementations under-the-covers, and allow for
> > > > > users to swap implementations for the tradeoffs that fit their use cases.
> > > > > It wouldn't be free to change but hopefully "easy" from a user point of
> > > > > view.
> > > > >
> > > > > Personally, I tend to favor more of a percolator-style implementation --
> > > > > it is a client library and built on top of hbase. This approach seems to be
> > > > > more "HBase-style" with its emphasis on consistency and atomicity, and seems
> > > > > to require only a few modifications to HBase core. Sure, it is likely slower
> > > > > than my read of Jesse's proposal, but it seems always consistent and
> > > > > thus predictable in cases where there are failures on deletes and updates.
> > > > > We'd need HBase API primitives like a checkAndMutate call (check with
> > > > > multiple delete/put on the same row), and possibly an atomic multitable
> > > > > bulkload.  I'm not sure that it is replication compatible, and there are
> > > > > probably questions we'll need to answer once snapshots solidify.
> > > > >
> > > > > Ted's idea of colocating regions (like the index table's
> > > > > regions) definitely feels like a primitive (pluggable, likely-per-table
> > > > > region assignment plans) that we could add to HBase core. This requirement
> > > > > though for 2ndary indexes seems to imply an approach similar to cassandra's
> > > > > approach -- having a local index of each region on region server and
> > > > > colocating them.  Is this right?  If so, this is essentially a filtering
> > > > > optimization --  it would mean that a query based on secondary index would
> > > > > potentially have to hit every region server that has a region in the
> > > > > primary table.  This is a great approach if the index lookup has high
> > > > > cardinality but if the secondary index is highly selective, you'd have to
> > > > > march through a bunch of RS's before getting an answer.
> > > > >
> > > > > Jon.
> > > > >
> > > > > On Tue, Aug 28, 2012 at 9:18 PM, Ramkrishna.S.Vasudevan <
> > > > > ramkrishna.vasudevan@huawei.com> wrote:
> > > > >
> > > > > > Hi
> > > > > >
> > > > > > Yes I was talking about the dead entry in the index table rather than the
> > > > > > actual data table.
> > > > > >
> > > > > > Regards
> > > > > > Ram
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Wei Tan [mailto:wtan@us.ibm.com]
> > > > > > > Sent: Tuesday, August 28, 2012 9:22 PM
> > > > > > > To: dev@hbase.apache.org
> > > > > > > Cc: Sandeep Tata
> > > > > > > Subject: Re: A general question on maxVersion handling when we have
> > > > > > > Secondary index tables
> > > > > > >
> > > > > > > Thanks for sharing a pointer to your implementation.
> > > > > > > My two cents:
> > > > > > > timestamp is a way to do MVCC and setting every KV with the same TS will
> > > > > > > make concurrency control very tricky and error prone, if not impossible.
> > > > > > > I think Ram is talking about the dead entry in the index table rather than
> > > > > > > data table. Deleting old index entries upfront when there is a new put
> > > > > > > might be a choice.
> > > > > > >
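A rough sketch of the "delete old index entries upfront when there is a new put" option mentioned above, using plain in-memory collections as stand-ins for the data and index tables. The class name, the `value\0row` index key layout and the method names are hypothetical. In this single-process sketch the read, delete and add happen together trivially; against a real cluster those steps would span two tables and would not be atomic without extra machinery, so treat this only as an illustration of the bookkeeping.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical in-memory model: data table with maxVersions = 1 plus a value-to-row index. */
final class IndexMaintenanceSketch {
    // row -> latest value only, mimicking maxVersions = 1 on the data table.
    private final Map<String, String> dataTable = new HashMap<>();

    // Index entries of the assumed form "<value>\0<row>".
    private final TreeSet<String> indexTable = new TreeSet<>();

    private static String indexKey(String value, String row) {
        return value + '\u0000' + row;
    }

    /** Writes the new value and removes the index entry of the value it replaces. */
    void put(String row, String value) {
        String old = dataTable.put(row, value);
        if (old != null) {
            // Delete the now-dead index entry before adding the new one.
            indexTable.remove(indexKey(old, row));
        }
        indexTable.add(indexKey(value, row));
    }

    boolean indexContains(String value, String row) {
        return indexTable.contains(indexKey(value, row));
    }

    int indexSize() {
        return indexTable.size();
    }
}
```

After putting Val1, Val2 and Val3 for Row1 in sequence, the index holds a single entry for Val3, which matches what a maxVersions = 1 read of the data table would return.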
> > > > > > >
> > > > > > > Best Regards,
> > > > > > > Wei
> > > > > > >
> > > > > > > Wei Tan
> > > > > > > Research Staff Member
> > > > > > > IBM T. J. Watson Research Center
> > > > > > > 19 Skyline Dr, Hawthorne, NY  10532
> > > > > > > wtan@us.ibm.com; 914-784-6752
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > From:   Jesse Yates <jesse.k.yates@gmail.com>
> > > > > > > To:     dev@hbase.apache.org,
> > > > > > > Date:   08/28/2012 04:00 AM
> > > > > > > Subject:        Re: A general question on maxVersion handling when we have
> > > > > > > Secondary index tables
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Ram,
> > > > > > >
> > > > > > > If I understand correctly, I think you can design your index such that you
> > > > > > > don't actually use the timestamp (e.g. everything gets put with a TS = 10 -
> > > > > > > or some other non-special, relatively small number that's not 0 as I'd
> > > > > > > worry about that in HBase ;) Then when you set maxVersions to 1,
> > > > > > > everything should be good.
> > > > > > >
> > > > > > > You get a couple of wasted bytes from the TS, but with the prefixTrie stuff
> > > > > > > that should be pretty minimal overhead. If you do need to keep track of the
> > > > > > > timestamp you should be able to munge that back up into the column
> > > > > > > qualifier (and just know that the last 64 bits are the timestamp). Again a
> > > > > > > little more CPU cost, but it's really not that big of an overhead. It seems
> > > > > > > like you don't really care about the TS though, in which case this should
> > > > > > > be pretty simple.
> > > > > > >
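One way the "last 64 bits of the qualifier is the timestamp" idea above might look, as a sketch in plain Java with no HBase dependency. The class and method names are hypothetical, and big-endian encoding is an assumption chosen here so that encoded timestamps of non-negative values sort in numeric order as raw bytes.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical helper: stores a timestamp as the trailing 8 bytes of a column qualifier. */
final class QualifierTimestampCodec {
    private static final int TS_BYTES = Long.BYTES; // 64 bits

    /** Returns the original qualifier followed by the big-endian timestamp. */
    static byte[] encode(byte[] qualifier, long timestamp) {
        return ByteBuffer.allocate(qualifier.length + TS_BYTES)
                .put(qualifier)
                .putLong(timestamp) // ByteBuffer is big-endian by default
                .array();
    }

    /** Reads the timestamp back from the last 8 bytes. */
    static long decodeTimestamp(byte[] encoded) {
        requireLongEnough(encoded);
        return ByteBuffer.wrap(encoded, encoded.length - TS_BYTES, TS_BYTES).getLong();
    }

    /** Returns the qualifier with the trailing timestamp bytes stripped. */
    static byte[] decodeQualifier(byte[] encoded) {
        requireLongEnough(encoded);
        return Arrays.copyOf(encoded, encoded.length - TS_BYTES);
    }

    private static void requireLongEnough(byte[] encoded) {
        if (encoded.length < TS_BYTES) {
            throw new IllegalArgumentException("encoded qualifier shorter than 8 bytes");
        }
    }
}
```

The cell's own timestamp can then be set to a fixed value while the real one travels inside the qualifier, at a cost of 8 extra bytes per cell plus the encode and decode work.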
> > > > > > > Out of curiosity, what are people using for their secondary indexing
> > > > > > > solutions? I know there are a bunch out there, but don't know what people
> > > > > > > have adopted, what they like/dislike, design tradeoffs made and why.
> > > > > > >
> > > > > > > Disclaimer: I recently proposed a secondary indexing solution myself
> > > > > > > (shameless self-plug:
> > > > > > > http://jyates.github.com/2012/07/09/consistent-enough-secondary-indexes.html
> > > > > > > )
> > > > > > > and it's something I'm working on for Salesforce - open sourced at some
> > > > > > > point, promise!
> > > > > > >
> > > > > > > -Jesse
> > > > > > > -------------------
> > > > > > > Jesse Yates
> > > > > > > @jesse_yates
> > > > > > > jyates.github.com
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Aug 28, 2012 at 12:24 AM, Ramkrishna.S.Vasudevan <
> > > > > > > ramkrishna.vasudevan@huawei.com> wrote:
> > > > > > >
> > > > > > > > Hi All
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > When we try to build any type of secondary indices for a given table, how
> > > > > > > > can one handle maxVersions in the secondary index tables?
> > > > > > > >
> > > > > > > > For eg,
> > > > > > > >
> > > > > > > > I have inserted
> > > > > > > >
> > > > > > > > Row1 - Val1 => t
> > > > > > > >
> > > > > > > > Row1 - Val2 => t+1
> > > > > > > >
> > > > > > > > Row1 - Val3 => t+2
> > > > > > > >
> > > > > > > > Ideally if my max versions is only one then Val3 should be my result if I
> > > > > > > > query on main table for row1.
> > > > > > > >
> > > > > > > > Now in my index I will be having all the above 3 entries.  Now how can we
> > > > > > > > remove the older entries from the index table that do not fit into
> > > > > > > > maxVersions?
> > > > > > > >
> > > > > > > > Currently, the scanning code that skips entries beyond max versions does
> > > > > > > > not give any hooks to know the entries skipped through versions.
> > > > > > > >
> > > > > > > > So any suggestions on this? I am still looking through the code for any
> > > > > > > > other options, but suggestions welcome.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Regards
> > > > > > > >
> > > > > > > > Ram
> > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > // Jonathan Hsieh (shay)
> > > > > // Software Engineer, Cloudera
> > > > > // jon@cloudera.com
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > // Jonathan Hsieh (shay)
> > > // Software Engineer, Cloudera
> > > // jon@cloudera.com
> > >
> >
>
