hbase-dev mailing list archives

From "Ramkrishna.S.Vasudevan" <ramkrishna.vasude...@huawei.com>
Subject RE: A general question on maxVersion handling when we have Secondary index tables
Date Thu, 30 Aug 2012 04:18:34 GMT
Yes, Jon, you got it right. This is the problem. But in any implementation
we need some mechanism that goes through all the rows in the secondary
index table.
Suppose, in the example below, maxVersions in my main table is 5. Then I
will scan the top 5 values from the secondary index table, and once I get
the 6th value I need to delete the oldest one from the secondary index
table. This requires some kind of cache or map where I keep incrementing
the count for every row that we get, and whenever a row has more values
than maxVersions, the oldest one is deleted.
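A minimal sketch of the counting map described above, written against a hypothetical in-memory model rather than the HBase client API. `find_stale_entries` and the `(index_row, main_row, timestamp)` tuple layout are illustrative assumptions. Instead of a plain counter it keeps, per main row, a min-heap of the newest `max_versions` timestamps, so the oldest entry can still be evicted when the index table (sorted by value, not by time) returns entries out of timestamp order.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: (index_row, main_row, timestamp),
# e.g. ("val1", "row1", 1) in Jon's example below.
IndexEntry = Tuple[str, str, int]


def find_stale_entries(entries: Iterable[IndexEntry],
                       max_versions: int) -> List[IndexEntry]:
    """Return index entries that no longer fit in max_versions for their main row."""
    # main_row -> min-heap of (timestamp, index_row), never longer than max_versions
    newest: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    stale: List[IndexEntry] = []
    for index_row, main_row, ts in entries:
        heap = newest[main_row]
        heapq.heappush(heap, (ts, index_row))
        if len(heap) > max_versions:
            # Over the limit: the oldest entry seen so far is now stale and
            # would become a Delete on the index table.
            old_ts, old_index_row = heapq.heappop(heap)
            stale.append((old_index_row, main_row, old_ts))
    return stale
```

The map holds up to `max_versions` entries for every main row seen, which is exactly the memory growth that could push this toward persistent caching.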

We also thought of another option, though it is slower:
-> Scan one row in the secondary index table.
-> Extract the actual row key of the main table and use it to read the
main table. Here I will get only the entries within maxVersions.
-> Based on these entries, delete the expired entries from the secondary
index table.
We thought of doing this at (major) compaction time.
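The slower verify-against-the-main-table pass can be sketched with hypothetical in-memory dictionaries standing in for the two tables. `visible_versions` plays the role of a Get limited to maxVersions and `purge_index` plays the role of issuing Deletes on the index table; both names are illustrative, not HBase APIs.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-ins for the two tables.
MainTable = Dict[str, List[Tuple[int, str]]]   # main_row -> [(ts, value), ...]
IndexTable = Dict[Tuple[str, str], int]        # (index_row, main_row) -> ts


def visible_versions(main: MainTable, row: str,
                     max_versions: int) -> Set[Tuple[int, str]]:
    """Emulate a Get limited to max_versions: only the newest cells come back."""
    cells = sorted(main.get(row, []), reverse=True)  # newest timestamp first
    return set(cells[:max_versions])


def purge_index(index: IndexTable, main: MainTable,
                max_versions: int) -> List[Tuple[str, str]]:
    """Delete index entries whose (ts, value) is no longer visible in the main row."""
    expired = []
    for (index_row, main_row), ts in index.items():
        # One main-table read per index entry: the reason this option is slower.
        if (ts, index_row) not in visible_versions(main, main_row, max_versions):
            expired.append((index_row, main_row))
    for key in expired:
        del index[key]  # would be a Delete on the index table in HBase
    return expired
```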
But this has one problem: whenever we do compaction, we deal directly with
store-level scanners. Even if we try to use the new hook added by Lars H,
preCompactScannerOpen(), that scanner always expects the KVs to be
ordered, and we may not be able to produce them in order if we follow the
approach described here.

We also felt that if we had a hook at the point where expired KVs are
filtered out, maybe we could use that, but we need to check how efficient
it would be.

So the suggestion given by Jon is one of the options, but it involves more
caching, and we may also need persistent caching if the size keeps
growing.

Thanks to all for providing your suggestions.  

Regards
Ram


> -----Original Message-----
> From: Jonathan Hsieh [mailto:jon@cloudera.com]
> Sent: Wednesday, August 29, 2012 11:16 PM
> To: dev@hbase.apache.org
> Subject: Re: A general question on maxVersion handling when we have
> Secondary index tables
> 
> Let me rephrase Ram's question to make sure I'm on the same page:
> 
> We do three inserts on row 1 at different times to the same column (which
> is being indexed in a secondary table). (Are we assuming only a 1-to-1
> secondary->primary mapping?)
> 
> t1 < t2 < t3
> put ("row1", "cf:c", "val1", t1)
> put ("row1", "cf:c", "val2", t2)
> put ("row1", "cf:c", "val3", t3)
> 
> In the primary table we then have:
> 
> row1 / cf:c = val1 @ t1
> row1 / cf:c = val2 @ t2
> row1 / cf:c = val3 @ t3
> 
> I'm assuming that these writes happen to a secondary table like this:
> put ("val1", "r", "row1", t1)
> put ("val2", "r", "row1", t2)
> put ("val3", "r", "row1", t3)
> 
> and in the secondary table we have:
> 
> val1 / r = row1 @ t1
> val2 / r = row1 @ t2
> val3 / r = row1 @ t3
> 
> The core question is how and when we can efficiently and correctly get
> rid of the now-invalid val1 and val2 rows in the index table.
> 
> Let's look at some of the strawmen:
> 1) A periodic scan of the secondary table that adds delete markers for
> invalid entries (removed on major compaction).
> 2) Lazily add delete markers on reads that turn out to be invalid (we are
> @t4, attempt to read via "val2" in the 2ndary index, see that the primary
> value is invalid, so do a checkAndDelete of val2 from the 2ndary). These
> would get removed on major compaction.
> 3) Delete on update. This means we need to know whether we are modifying
> a value, and thus incurs at least one extra read per write.
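Strawmen 2 and 3 can be sketched against a hypothetical in-memory model; `ToyIndexedTable` and its methods are illustrative, not HBase classes. The sketch assumes the primary table keeps one version per row (maxVersions = 1) and that puts for a row arrive in timestamp order; a plain dictionary delete stands in for HBase's checkAndDelete.

```python
from typing import Dict, List, Tuple


class ToyIndexedTable:
    """Hypothetical primary table plus a value -> row secondary index.

    Assumes maxVersions = 1 on the primary and puts arriving in ts order.
    """

    def __init__(self) -> None:
        self.primary: Dict[str, Tuple[str, int]] = {}  # row -> (value, ts)
        self.index: Dict[str, Dict[str, int]] = {}     # value -> {row: ts}

    def _drop_index(self, value: str, row: str) -> None:
        rows = self.index.get(value)
        if rows is not None:
            rows.pop(row, None)
            if not rows:
                del self.index[value]  # no rows left under this index key

    def put_naive(self, row: str, value: str, ts: int) -> None:
        """Write both tables with no cleanup: stale index rows accumulate."""
        self.primary[row] = (value, ts)
        self.index.setdefault(value, {})[row] = ts

    def put_delete_on_update(self, row: str, value: str, ts: int) -> None:
        """Strawman 3: read the old value (the extra read), drop its entry."""
        old = self.primary.get(row)
        if old is not None and old[0] != value:
            self._drop_index(old[0], row)
        self.put_naive(row, value, ts)

    def lookup_lazy(self, value: str) -> List[str]:
        """Strawman 2: verify each index hit against the primary, drop stale ones."""
        live = []
        for row in list(self.index.get(value, {})):
            current = self.primary.get(row)
            if current is not None and current[0] == value:
                live.append(row)
            else:
                self._drop_index(value, row)  # checkAndDelete in HBase
        return sorted(live)
```

In a real cluster the read and the delete in `put_delete_on_update` are not atomic with the write, so concurrent updates to the same row could still leave a stale entry; that gap is what a row-level check-and-mutate primitive would have to close.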
> 
> Ram, does this seem like the right question and the right set of options
> to consider?
> 
> Jon.
> 
> On Wed, Aug 29, 2012 at 8:12 AM, Ramkrishna.S.Vasudevan <
> ramkrishna.vasudevan@huawei.com> wrote:
> 
> > When we have a many-to-one mapping between the main and secondary index
> > tables, we may end up hitting many RSs. If there is a one-to-one
> > mapping, that may not be a problem.
> >
> > Basically, my intention in starting this discussion was mainly to talk
> > about version maintenance for any type of secondary index, particularly
> > removing stale data from the index table that would have expired.
> >
> > Regards
> > Ram
> >
> >
> > > -----Original Message-----
> > > From: Ted Yu [mailto:yuzhihong@gmail.com]
> > > Sent: Wednesday, August 29, 2012 7:45 PM
> > > To: dev@hbase.apache.org
> > > Subject: Re: A general question on maxVersion handling when we have
> > > Secondary index tables
> > >
> > > Thanks for the detailed response, Jon.
> > >
> > > bq. it would mean that a query based on secondary index would
> > > potentially have to hit every region server that has a region in the
> > > primary table.
> > >
> > > Can you elaborate on the above a little bit? Is this because the
> > > secondary index could point us to more than one region in the data
> > > table, since several versions are saved for the same row?
> > >
> > > My thinking was to ease management of simultaneous (data and index)
> > > region splits through region colocation.
> > >
> > > Cheers
> > >
> > > On Wed, Aug 29, 2012 at 6:47 AM, Jonathan Hsieh <jon@cloudera.com>
> > > wrote:
> > >
> > > > I'm more of a fan of having secondary indexes added as an external
> > > > feature (coproc or a new client library on top of our current client
> > > > library) and focusing on only adding the APIs necessary to make 2ndary
> > > > indexes possible and correct on/in HBase. There are many different
> > > > use patterns and requirements, and one style of secondary index will
> > > > not be good for everything. Do we only care about this working well
> > > > for highly selective keys? What are the possible indexes (col name,
> > > > value, value prefix, everything our filters support)? Do we care more
> > > > about writes or reads, ACID correctness or speed, etc.? Also, there are
> > > > several questions about how we handle other features in conjunction
> > > > with 2ndary indexes: replication, bulk load, and snapshots, to name a
> > > > few.
> > > >
> > > > Maybe it makes sense to spend some time defining what we want to
> > > > index secondarily and what the user-facing API for this external
> > > > feature would be. Then we could have different implementations under
> > > > the covers, and allow users to swap implementations for the tradeoffs
> > > > that fit their use cases. It wouldn't be free to change, but hopefully
> > > > "easy" from a user's point of view.
> > > >
> > > > Personally, I tend to favor more of a Percolator-style implementation
> > > > -- it is a client library built on top of HBase. This approach seems
> > > > to be more "HBase-style" with its emphasis on consistency and
> > > > atomicity, and seems to require only a few modifications to HBase
> > > > core. Sure, it is likely slower than my read of Jesse's proposal, but
> > > > it seems always consistent and thus predictable in cases where there
> > > > are failures on deletes and updates. We'd need HBase API primitives
> > > > like a checkAndMutate call (check with multiple deletes/puts on the
> > > > same row), and possibly an atomic multitable bulkload. I'm not sure
> > > > that it is replication compatible, and there are probably questions
> > > > we'll need to answer once snapshots solidify.
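The checkAndMutate primitive Jon mentions (one check, then several puts/deletes on the same row, all or nothing) can be illustrated with a hypothetical in-memory region guarded by per-row locks. `ToyRegion` and `check_and_mutate` are made-up names showing the semantics only; they are not HBase's client API, whose exact shape may differ.

```python
import threading
from typing import Dict, List, Optional, Tuple

# ("put", column, value) or ("delete", column, None)
Mutation = Tuple[str, str, Optional[str]]


class ToyRegion:
    """Hypothetical in-memory region with one lock per row."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = {}
        self._locks = {}                  # row -> threading.Lock
        self._guard = threading.Lock()    # protects the _locks dict itself

    def _row_lock(self, row: str):
        with self._guard:
            return self._locks.setdefault(row, threading.Lock())

    def get(self, row: str, column: str) -> Optional[str]:
        with self._row_lock(row):
            return self._rows.get(row, {}).get(column)

    def check_and_mutate(self, row: str, column: str, expected: Optional[str],
                         mutations: List[Mutation]) -> bool:
        """Apply every mutation to `row` iff `column` currently equals `expected`.

        expected=None means the column must be absent. Returns True if applied.
        """
        for op, _, _ in mutations:  # validate first so we never half-apply
            if op not in ("put", "delete"):
                raise ValueError(f"unknown op {op!r}")
        with self._row_lock(row):
            if self._rows.get(row, {}).get(column) != expected:
                return False
            cells = self._rows.setdefault(row, {})
            for op, col, value in mutations:
                if op == "put":
                    cells[col] = value
                else:
                    cells.pop(col, None)
            return True
```

Because the check and all mutations happen under the same row lock, a concurrent writer cannot slip in between them, which is the property a Percolator-style index needs.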
> > > >
> > > > Ted's idea of colocating regions (like the index table's regions)
> > > > definitely feels like a primitive (pluggable, likely per-table region
> > > > assignment plans) that we could add to HBase core. This requirement
> > > > for 2ndary indexes, though, seems to imply an approach similar to
> > > > Cassandra's: having a local index of each region on the region server
> > > > and colocating them. Is this right? If so, this is essentially a
> > > > filtering optimization -- it would mean that a query based on a
> > > > secondary index would potentially have to hit every region server
> > > > that has a region in the primary table. This is a great approach if
> > > > the index lookup has high cardinality, but if the secondary index is
> > > > highly selective, you'd have to march through a bunch of RSs before
> > > > getting an answer.
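The fan-out tradeoff Jon describes can be made concrete with a toy model: colocated local indexes must be asked region by region, while a global index keyed by value answers with one lookup. Everything here is hypothetical; regions are plain dicts, the global index is treated as a single region, and the cost counted is only the number of index regions contacted (fetching the matching data rows is ignored).

```python
from typing import Dict, List, Set, Tuple

Region = Dict[str, str]  # hypothetical: row key -> indexed column value


def build_local_indexes(regions: List[Region]) -> List[Dict[str, Set[str]]]:
    """Colocated style: one value -> rows index per data region."""
    local = []
    for region in regions:
        idx: Dict[str, Set[str]] = {}
        for row, value in region.items():
            idx.setdefault(value, set()).add(row)
        local.append(idx)
    return local


def build_global_index(regions: List[Region]) -> Dict[str, Set[str]]:
    """Global style: one index keyed by value, spanning all data regions."""
    idx: Dict[str, Set[str]] = {}
    for region in regions:
        for row, value in region.items():
            idx.setdefault(value, set()).add(row)
    return idx


def query_local(local: List[Dict[str, Set[str]]],
                value: str) -> Tuple[Set[str], int]:
    """Every region must be asked, even if only one of them holds a match."""
    rows: Set[str] = set()
    for idx in local:
        rows |= idx.get(value, set())
    return rows, len(local)


def query_global(idx: Dict[str, Set[str]], value: str) -> Tuple[Set[str], int]:
    """A single lookup in the index region that owns `value`."""
    return set(idx.get(value, set())), 1
```

For a highly selective value the local approach still pays one request per region, while for a value matching rows in most regions the extra fan-out costs little relative to the work already needed.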
> > > >
> > > > Jon.
> > > >
> > > > On Tue, Aug 28, 2012 at 9:18 PM, Ramkrishna.S.Vasudevan <
> > > > ramkrishna.vasudevan@huawei.com> wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > Yes, I was talking about the dead entries in the index table rather
> > > > > than the actual data table.
> > > > >
> > > > > Regards
> > > > > Ram
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Wei Tan [mailto:wtan@us.ibm.com]
> > > > > > Sent: Tuesday, August 28, 2012 9:22 PM
> > > > > > To: dev@hbase.apache.org
> > > > > > Cc: Sandeep Tata
> > > > > > Subject: Re: A general question on maxVersion handling when we have
> > > > > > Secondary index tables
> > > > > >
> > > > > > Thanks for sharing a pointer to your implementation.
> > > > > > My two cents:
> > > > > > The timestamp is a way to do MVCC, and setting every KV to the same
> > > > > > TS would make concurrency control very tricky and error-prone, if not
> > > > > > impossible.
> > > > > > I think Ram is talking about the dead entries in the index table
> > > > > > rather than the data table. Deleting old index entries upfront when
> > > > > > there is a new put might be a choice.
> > > > > >
> > > > > >
> > > > > > Best Regards,
> > > > > > Wei
> > > > > >
> > > > > > Wei Tan
> > > > > > Research Staff Member
> > > > > > IBM T. J. Watson Research Center
> > > > > > 19 Skyline Dr, Hawthorne, NY  10532
> > > > > > wtan@us.ibm.com; 914-784-6752
> > > > > >
> > > > > >
> > > > > >
> > > > > > From:   Jesse Yates <jesse.k.yates@gmail.com>
> > > > > > To:     dev@hbase.apache.org,
> > > > > > Date:   08/28/2012 04:00 AM
> > > > > > Subject:        Re: A general question on maxVersion handling when we
> > > > > > have Secondary index tables
> > > > > >
> > > > > >
> > > > > >
> > > > > > Ram,
> > > > > >
> > > > > > If I understand correctly, I think you can design your index such
> > > > > > that you don't actually use the timestamp (e.g. everything gets put
> > > > > > with TS = 10, or some other non-special, relatively small number
> > > > > > that's not 0, as I'd worry about that in HBase ;). Then when you set
> > > > > > maxVersions to 1, everything should be good.
> > > > > >
> > > > > > You get a couple of wasted bytes from the TS, but with the prefixTrie
> > > > > > stuff that should be pretty minimal overhead. If you do need to keep
> > > > > > track of the timestamp, you should be able to munge it back into the
> > > > > > column qualifier (and just know that the last 64 bits are the
> > > > > > timestamp). Again, a little more CPU cost, but it's really not that
> > > > > > big of an overhead. It seems like you don't really care about the TS
> > > > > > though, in which case this should be pretty simple.
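Jesse's "munge the timestamp into the qualifier" idea can be sketched as plain byte manipulation: the cell timestamp is pinned to a constant, and the real timestamp rides in the last 64 bits of the qualifier. The function names, the `FIXED_TS` constant, and the cell tuple are illustrative assumptions, not code from his proposal.

```python
import struct
from typing import Tuple

FIXED_TS = 10  # constant cell timestamp: small, non-special, not 0


def encode_qualifier(base: bytes, real_ts: int) -> bytes:
    """Append the real timestamp as the last 8 bytes (64 bits, big-endian)."""
    return base + struct.pack(">q", real_ts)


def decode_qualifier(qualifier: bytes) -> Tuple[bytes, int]:
    """Split a qualifier back into its base part and the embedded timestamp."""
    if len(qualifier) < 8:
        raise ValueError("qualifier too short to hold a 64-bit timestamp")
    return qualifier[:-8], struct.unpack(">q", qualifier[-8:])[0]


def make_index_cell(row: bytes, family: bytes, base: bytes, real_ts: int,
                    value: bytes) -> Tuple[bytes, bytes, bytes, int, bytes]:
    """Hypothetical (row, family, qualifier, ts, value) tuple; ts is always FIXED_TS."""
    return (row, family, encode_qualifier(base, real_ts), FIXED_TS, value)
```

Big-endian encoding is chosen so that, for non-negative timestamps, byte-wise qualifier ordering matches numeric timestamp ordering.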
> > > > > >
> > > > > > Out of curiosity, what are people using for their secondary indexing
> > > > > > solutions? I know there are a bunch out there, but I don't know what
> > > > > > people have adopted, what they like/dislike, and what design
> > > > > > tradeoffs were made and why.
> > > > > >
> > > > > > Disclaimer: I recently proposed a secondary indexing solution myself
> > > > > > (shameless self-plug:
> > > > > > http://jyates.github.com/2012/07/09/consistent-enough-secondary-indexes.html
> > > > > > ), and it's something I'm working on for Salesforce - open sourced at
> > > > > > some point, promise!
> > > > > >
> > > > > > -Jesse
> > > > > > -------------------
> > > > > > Jesse Yates
> > > > > > @jesse_yates
> > > > > > jyates.github.com
> > > > > >
> > > > > >
> > > > > > On Tue, Aug 28, 2012 at 12:24 AM, Ramkrishna.S.Vasudevan <
> > > > > > ramkrishna.vasudevan@huawei.com> wrote:
> > > > > >
> > > > > > > Hi All
> > > > > > >
> > > > > > > When we build any type of secondary index for a given table, how
> > > > > > > can one handle maxVersions in the secondary index tables?
> > > > > > >
> > > > > > > For example, I have inserted:
> > > > > > >
> > > > > > > Row1 - Val1 => t
> > > > > > > Row1 - Val2 => t+1
> > > > > > > Row1 - Val3 => t+2
> > > > > > >
> > > > > > > Ideally, if my maxVersions is only 1, then Val3 should be my result
> > > > > > > if I query the main table for row1.
> > > > > > >
> > > > > > > Now my index will have all 3 of the above entries. How can we
> > > > > > > remove the older entries from the index table that do not fit into
> > > > > > > maxVersions?
> > > > > > >
> > > > > > > Currently, the scan code that enforces maxVersions does not give any
> > > > > > > hooks to know which entries were skipped because of versions.
> > > > > > >
> > > > > > > So, any suggestions on this? I am still going through the code for
> > > > > > > other options, but suggestions are welcome.
> > > > > > >
> > > > > > > Regards
> > > > > > > Ram
> > > > > > >
> > > > > > >
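The hook Ram asks for in the original question can be sketched as a version-limiting scan that reports what it drops. This is a hypothetical model, not HBase's scanner code: `scan_max_versions` and its `on_skip` callback are illustrative names, and the input is assumed to be sorted by row and column with timestamps descending, the order a store scanner returns KVs in. The values passed to `on_skip` are the index rows that would need deleting.

```python
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical cell layout: (row, column, timestamp, value).
Cell = Tuple[str, str, int, str]


def scan_max_versions(cells: Iterable[Cell], max_versions: int,
                      on_skip: Callable[[Cell], None]) -> Iterator[Cell]:
    """Yield at most max_versions cells per (row, column).

    on_skip is the hypothetical hook: it is called for every cell that is
    dropped because it falls outside max_versions.
    """
    current = None
    seen = 0
    for cell in cells:
        key = cell[:2]  # (row, column)
        if key != current:
            current, seen = key, 0
        seen += 1
        if seen <= max_versions:
            yield cell
        else:
            on_skip(cell)
```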
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > // Jonathan Hsieh (shay)
> > > > // Software Engineer, Cloudera
> > > > // jon@cloudera.com
> > > >
> >
> >
> 
> 
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // jon@cloudera.com

