hbase-dev mailing list archives

From Jonathan Hsieh <...@cloudera.com>
Subject Re: A general question on maxVersion handling when we have Secondary index tables
Date Wed, 29 Aug 2012 18:18:11 GMT
We should have an HBase dev meeting and bring this up as one of the topics
of discussion.  I'll start another thread on that.

On Wed, Aug 29, 2012 at 10:03 AM, Jesse Yates <jesse.k.yates@gmail.com> wrote:

> Client library style stuff is _nice_ but one of the things everyone asks of
> a database is that we provide an index (cassandra has it, riak has it, mysql
> has it...hbase doesn't? Yes, different systems,etc.,etc., but the point is
> we could do it). Further, if we build it as a part of hbase, we can make it
> faster... though don't ask me the _how_ on that yet ;)
>
My main concern is that there are many possible ways to build an
implementation that is good for one use case / workload but
absolutely terrible for others.


> Talking with Lars, we could provide a lot of the indexing infrastructure,
> but leaving the actual indexing (convert row|cf|cq|ts|value to an index value
> and vice-versa) to a client library would give us a lot of the flexibility that
> people would need. And I take it that most people have some form of
> indexing already (be it consistent or not), so we can do it 'the right way'
> in terms of queries, etc. and provide pluggable infrastructure (with a
> decent default) so people can roll in their own implementations.
>
> That said, I think we can do secondary indexing without too many changes to
> HBase (region co-location/pinning that Ted suggests would just be sweet
> overall), arguing for a client library. However, if we decide this is one of
> the things we want to support going forward as a project, then it makes
> more sense to do it as part of HBase, rather than pointing people to some
> guy/gal's website with the information (which may or may not be up to date)
> for how to munge indexing in. Instead, it would be so much nicer to just flip
> a couple switches, maybe plug in a couple of classes and have indexing
> _just work_.
>
Isn't that the rationale for coprocessors?  (Just add something to a
config, start HBase?)

Also, with secondary indices, we'll potentially be adding new user-exposed
APIs.  I think these should be definable in a way that can work across
many algorithms.  We should figure out what they are so that, when there are
different implementations, users can pick and choose between the
implementations that are good for them.
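
As a rough illustration of one user-facing API with swappable
implementations, here is a minimal in-memory sketch. Everything in it is
hypothetical (SecondaryIndex, EagerCleanupIndex, LazyVerifyIndex are
made-up names, and plain maps stand in for the data and index tables);
none of it is an HBase API. It assumes maxVersions = 1 on the data table
and contrasts two strategies from this thread: deleting the old index
entry upfront on a new put, versus leaving dead index entries and
filtering them against the data table at read time.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Demo driver: replays the Row1 -> Val1, Val2, Val3 example from this thread. */
class IndexSketch {
    public static void main(String[] args) {
        InMemoryIndex[] impls = { new EagerCleanupIndex(), new LazyVerifyIndex() };
        for (InMemoryIndex idx : impls) {
            idx.put("Row1", "Val1");
            idx.put("Row1", "Val2");
            idx.put("Row1", "Val3");
            System.out.println(idx.getClass().getSimpleName()
                + " lookup(Val1)=" + idx.lookup("Val1")
                + " lookup(Val3)=" + idx.lookup("Val3")
                + " indexEntries=" + idx.indexEntryCount());
        }
    }
}

/** Hypothetical user-facing contract; NOT part of HBase. */
interface SecondaryIndex {
    /** Record that primary row {@code row} now holds {@code value}. */
    void put(String row, String value);

    /** Return the primary rows whose current value equals {@code value}. */
    Set<String> lookup(String value);
}

/** Shared in-memory stand-ins for the data table and the index table. */
abstract class InMemoryIndex implements SecondaryIndex {
    /** row -> latest value (models a data table with maxVersions = 1). */
    protected final Map<String, String> primary = new HashMap<>();
    /** value -> rows (models the index table). */
    protected final Map<String, Set<String>> index = new TreeMap<>();

    protected void addIndexEntry(String value, String row) {
        index.computeIfAbsent(value, v -> new TreeSet<>()).add(row);
    }

    /** Number of (value, row) pairs physically stored in the index. */
    int indexEntryCount() {
        int n = 0;
        for (Set<String> rows : index.values()) {
            n += rows.size();
        }
        return n;
    }
}

/** Pays on write: removes the old index entry before adding the new one. */
final class EagerCleanupIndex extends InMemoryIndex {
    @Override
    public void put(String row, String value) {
        String old = primary.put(row, value);
        if (old != null && !old.equals(value)) {
            Set<String> rows = index.get(old);
            if (rows != null) {
                rows.remove(row);
                if (rows.isEmpty()) {
                    index.remove(old);
                }
            }
        }
        addIndexEntry(value, row);
    }

    @Override
    public Set<String> lookup(String value) {
        return new TreeSet<>(index.getOrDefault(value, Collections.<String>emptySet()));
    }
}

/** Pays on read: keeps dead index entries and verifies against the data table. */
final class LazyVerifyIndex extends InMemoryIndex {
    @Override
    public void put(String row, String value) {
        primary.put(row, value);
        addIndexEntry(value, row);
    }

    @Override
    public Set<String> lookup(String value) {
        Set<String> live = new TreeSet<>();
        for (String row : index.getOrDefault(value, Collections.<String>emptySet())) {
            if (value.equals(primary.get(row))) {
                live.add(row); // skip dead entries that point at overwritten values
            }
        }
        return live;
    }
}
```

Both variants answer lookups identically, but after the three puts the
eager variant stores one index entry while the lazy one stores three, which
is the kind of write-versus-read tradeoff a user would be choosing between.
A real implementation would also have to deal with atomicity across the
two tables, which this sketch ignores entirely.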


> Just my $0.02
>
> -Jesse
> -------------------
> Jesse Yates
> @jesse_yates
> jyates.github.com
>
>
> On Wed, Aug 29, 2012 at 9:19 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > For the secondary index based on state portion of address example, I wonder
> > if we can achieve comparable performance using scan with proper filter.
> >
> > Cheers
> >
> > On Wed, Aug 29, 2012 at 9:11 AM, Jonathan Hsieh <jon@cloudera.com> wrote:
> >
> > > Ted,
> > >
> > > Ram summarizes the concern succinctly -- to answer the specific question,
> > > it isn't for versions -- it is for the case where a secondary index can
> > > point to many, many primary rows.  (Let's say we have a rowkey userid and we
> > > want to have a 2ndary index based on the state portion of their address
> > > --- we'll end up pointing to many, many primary rows.)
> > >
> > > Jon.
> > >
> > >
> > >
> > > On Wed, Aug 29, 2012 at 7:15 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > > Thanks for the detailed response, Jon.
> > > >
> > > > bq. it would mean that a query based on secondary index would
> > > > potentially have to hit every region server that has a region in the
> > > > primary table.
> > > >
> > > > Can you elaborate on the above a little bit ?
> > > > Is this because secondary index would point us to more than one region in
> > > > the data table because several versions are saved for the same row ?
> > > >
> > > > My thinking was to ease management of simultaneous (data and index) region
> > > > split through region colocation.
> > > >
> > > > Cheers
> > > >
> > > > On Wed, Aug 29, 2012 at 6:47 AM, Jonathan Hsieh <jon@cloudera.com> wrote:
> > > >
> > > > > > I'm more of a fan of having secondary indexes added as an external feature
> > > > > > (coproc or new client library on top of our current client library) and
> > > > > > focusing on only adding apis necessary to make 2ndary indexes possible and
> > > > > > correct on/in HBase.  There are many different use patterns and
> > > > > > requirements and one style of secondary index will not be good for
> > > > > > everything.  Do we only care about this working well for highly selective
> > > > > > keys?  What are possible indexes (col name, value, value prefix, everything
> > > > > > our filters support?)  Do we care more about writes or reads, ACID
> > > > > > correctness or speed, etc?  Also, there are several questions about how we
> > > > > > handle other features in conjunction with 2ndary indexes: replication, bulk
> > > > > > load, snapshots, to name a few.
> > > > >
> > > > > > Maybe it makes sense to spend some time defining what we want to index
> > > > > > secondarily and what a user api to this external api would be.  Then we
> > > > > > could have the different implementations under-the-covers, and allow for
> > > > > > users to swap implementations for the tradeoffs that fit their use cases.
> > > > > > It wouldn't be free to change but hopefully "easy" from a user point of
> > > > > > view.
> > > > >
> > > > > > Personally, I tend to favor more of a Percolator-style implementation --
> > > > > > it is a client library and built on top of hbase. This approach seems to be
> > > > > > more "HBase-style" with its emphasis on consistency and atomicity, and seems
> > > > > > to require only a few modifications to HBase core. Sure, it is likely slower
> > > > > > than my read of Jesse's proposal, but it seems always consistent and
> > > > > > thus predictable in cases where there are failures on deletes and updates.
> > > > > > We'd need HBase API primitives like a checkAndMutate call (check with
> > > > > > multiple delete/put on the same row), and possibly an atomic multitable
> > > > > > bulkload.  I'm not sure that it is replication compatible, and there are
> > > > > > probably questions we'll need to answer once snapshots solidify.
> > > > >
> > > > > > Ted's idea of colocating regions (like the index table's
> > > > > > regions) definitely feels like a primitive (pluggable, likely-per-table
> > > > > > region assignment plans) that we could add to HBase core. This requirement
> > > > > > though for 2ndary indexes seems to imply an approach similar to cassandra's
> > > > > > approach -- having a local index of each region on region server and
> > > > > > colocating them.  Is this right?  If so, this is essentially a filtering
> > > > > > optimization --  it would mean that a query based on secondary index would
> > > > > > potentially have to hit every region server that has a region in the
> > > > > > primary table.  This is a great approach if the index lookup has high
> > > > > > cardinality but if the secondary index is highly selective, you'd have to
> > > > > > march through a bunch of RS's before getting an answer.
> > > > >
> > > > > Jon.
> > > > >
> > > > > On Tue, Aug 28, 2012 at 9:18 PM, Ramkrishna.S.Vasudevan <
> > > > > ramkrishna.vasudevan@huawei.com> wrote:
> > > > >
> > > > > > Hi
> > > > > >
> > > > > > > Yes I was talking about the dead entry in the index table rather than the
> > > > > > > actual data table.
> > > > > >
> > > > > > Regards
> > > > > > Ram
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Wei Tan [mailto:wtan@us.ibm.com]
> > > > > > > Sent: Tuesday, August 28, 2012 9:22 PM
> > > > > > > To: dev@hbase.apache.org
> > > > > > > Cc: Sandeep Tata
> > > > > > > > Subject: Re: A general question on maxVersion handling when we have
> > > > > > > > Secondary index tables
> > > > > > >
> > > > > > > Thanks for sharing a pointer to your implementation.
> > > > > > > My two cents:
> > > > > > > > timestamp is a way to do MVCC and setting every KV with the same TS will
> > > > > > > > make concurrency control very tricky and error prone, if not impossible.
> > > > > > > > I think Ram is talking about the dead entry in the index table rather than
> > > > > > > > data table. Deleting old index entries upfront when there is a new put
> > > > > > > > might be a choice.
> > > > > > >
> > > > > > >
> > > > > > > Best Regards,
> > > > > > > Wei
> > > > > > >
> > > > > > > Wei Tan
> > > > > > > Research Staff Member
> > > > > > > IBM T. J. Watson Research Center
> > > > > > > 19 Skyline Dr, Hawthorne, NY  10532
> > > > > > > wtan@us.ibm.com; 914-784-6752
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > From:   Jesse Yates <jesse.k.yates@gmail.com>
> > > > > > > To:     dev@hbase.apache.org,
> > > > > > > Date:   08/28/2012 04:00 AM
> > > > > > > > Subject:        Re: A general question on maxVersion handling when we
> > > > > > > > have Secondary index tables
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Ram,
> > > > > > >
> > > > > > > > If I understand correctly, I think you can design your index such that you
> > > > > > > > don't actually use the timestamp (e.g. everything gets put with a TS = 10 -
> > > > > > > > or some other non-special, relatively small number that's not 0 as I'd
> > > > > > > > worry about that in HBase ;) Then when you set maxVersions to 1,
> > > > > > > > everything should be good.
> > > > > > >
> > > > > > > > You get a couple of wasted bytes from the TS, but with the prefixTrie stuff
> > > > > > > > that should be pretty minimal overhead. If you do need to keep track of the
> > > > > > > > timestamp you should be able to munge that back up into the column
> > > > > > > > qualifier (and just know that that last 64 bits is the timestamp). Again a
> > > > > > > > little more CPU cost, but it's really not that big of an overhead. It seems
> > > > > > > > like you don't really care about the TS though, in which case this should
> > > > > > > > be pretty simple.
> > > > > > >
> > > > > > > > Out of curiosity, what are people using for their secondary indexing
> > > > > > > > solutions? I know there are a bunch out there, but don't know what people
> > > > > > > > have adopted, what they like/dislike, design tradeoffs made and why.
> > > > > > >
> > > > > > > > Disclaimer: I recently proposed a secondary indexing solution myself
> > > > > > > > (shameless self-plug:
> > > > > > > > http://jyates.github.com/2012/07/09/consistent-enough-secondary-indexes.html
> > > > > > > > ) and it's something I'm working on for Salesforce - open sourced at some
> > > > > > > > point, promise!
> > > > > > >
> > > > > > > -Jesse
> > > > > > > -------------------
> > > > > > > Jesse Yates
> > > > > > > @jesse_yates
> > > > > > > jyates.github.com
> > > > > > >
> > > > > > >
> > > > > > > > On Tue, Aug 28, 2012 at 12:24 AM, Ramkrishna.S.Vasudevan <
> > > > > > > > ramkrishna.vasudevan@huawei.com> wrote:
> > > > > > >
> > > > > > > > Hi All
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > When we try to build any type of secondary indices for a given table, how
> > > > > > > > > can one handle maxVersions in the secondary index tables?
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > For eg,
> > > > > > > >
> > > > > > > > > I have inserted
> > > > > > > > >
> > > > > > > > > Row1 - Val1 => t
> > > > > > > > > Row1 - Val2 => t+1
> > > > > > > > > Row1 - Val3 => t+2
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > Ideally if my max versions is only one then Val3 should be my result if I
> > > > > > > > > query on main table for row1.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > Now in my index I will be having all the above 3 entries.  Now how can we
> > > > > > > > > remove the older entries from the index table that do not fit into
> > > > > > > > > maxVersions?
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > Currently, the scanning code that enforces maxVersions does not give any
> > > > > > > > > hooks to learn which entries were skipped because of versions.
> > > > > > > >
> > > > > > > > > So, any suggestions on this? I am still looking through the code for other
> > > > > > > > > options, but suggestions are welcome.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Regards
> > > > > > > >
> > > > > > > > Ram
> > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > // Jonathan Hsieh (shay)
> > > > > // Software Engineer, Cloudera
> > > > > // jon@cloudera.com
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > // Jonathan Hsieh (shay)
> > > // Software Engineer, Cloudera
> > > // jon@cloudera.com
> > >
> >
>



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com
