hbase-dev mailing list archives

From "Ramkrishna.S.Vasudevan" <ramkrishna.vasude...@huawei.com>
Subject RE: A general question on maxVersion handling when we have Secondary index tables
Date Wed, 29 Aug 2012 04:18:40 GMT
Hi

Yes, I was talking about the dead entry in the index table rather than the
actual data table.

Regards
Ram

> -----Original Message-----
> From: Wei Tan [mailto:wtan@us.ibm.com]
> Sent: Tuesday, August 28, 2012 9:22 PM
> To: dev@hbase.apache.org
> Cc: Sandeep Tata
> Subject: Re: A general question on maxVersion handling when we have
> Secondary index tables
> 
> Thanks for sharing a pointer to your implementation.
> My two cents:
> The timestamp is a way to do MVCC, and setting every KV to the same TS
> would make concurrency control very tricky and error-prone, if not
> impossible.
> I think Ram is talking about the dead entry in the index table rather
> than the data table. Deleting old index entries upfront when there is
> a new put might be an option.
> 
> 
> Best Regards,
> Wei
> 
> Wei Tan
> Research Staff Member
> IBM T. J. Watson Research Center
> 19 Skyline Dr, Hawthorne, NY  10532
> wtan@us.ibm.com; 914-784-6752
> 
> 
> 
> From:   Jesse Yates <jesse.k.yates@gmail.com>
> To:     dev@hbase.apache.org,
> Date:   08/28/2012 04:00 AM
> Subject:        Re: A general question on maxVersion handling when we have
> Secondary index tables
> 
> 
> 
> Ram,
> 
> If I understand correctly, I think you can design your index such that
> you don't actually use the timestamp (e.g. everything gets put with a
> TS of 10, or some other non-special, relatively small number that's
> not 0, as I'd worry about that in HBase ;). Then when you set
> maxVersions to 1, everything should be good.
> 
> You get a couple of wasted bytes from the TS, but with the prefixTrie
> stuff that should be pretty minimal overhead. If you do need to keep
> track of the timestamp, you should be able to munge it back into the
> column qualifier (and just know that the last 64 bits are the
> timestamp). Again, a little more CPU cost, but it's really not that
> big of an overhead. It seems like you don't really care about the TS
> though, in which case this should be pretty simple.
> 
> Out of curiosity, what are people using for their secondary indexing
> solutions? I know there are a bunch out there, but I don't know what
> people have adopted, what they like and dislike, and what design
> tradeoffs were made and why.
> 
> Disclaimer: I recently proposed a secondary indexing solution myself
> (shameless self-plug:
> http://jyates.github.com/2012/07/09/consistent-enough-secondary-indexes.html),
> and it's something I'm working on for Salesforce; it will be open
> sourced at some point, promise!
> 
> -Jesse
> -------------------
> Jesse Yates
> @jesse_yates
> jyates.github.com
> 
> 
> On Tue, Aug 28, 2012 at 12:24 AM, Ramkrishna.S.Vasudevan <
> ramkrishna.vasudevan@huawei.com> wrote:
> 
> > Hi All
> >
> > When we build any type of secondary index for a given table, how can
> > we handle maxVersions in the secondary index tables?
> >
> > For example, I have inserted:
> >
> >  Row1 - Val1 => t
> >  Row1 - Val2 => t+1
> >  Row1 - Val3 => t+2
> >
> > Ideally, if my maxVersions is 1, then Val3 should be my result if I
> > query the main table for Row1.
> >
> > Now my index will have all three of the above entries. How can we
> > remove the older entries from the index table that do not fit into
> > maxVersions?
> >
> > Currently, the scan code that skips versions beyond maxVersions does
> > not provide any hooks to learn which entries were skipped because of
> > versions.
> >
> > Any suggestions on this? I am still going through the code for other
> > options, but suggestions are welcome.
> >
> > Regards
> >
> > Ram
> >
> >
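Ram's example can be reproduced with a small in-memory model, given here as a hedged sketch in Java. The class `NaiveIndexDemo` and its methods are hypothetical and are not HBase APIs. The data table keeps timestamped versions per row and a read returns at most `maxVersions` of them, while a naive index adds one `value|row` entry per put and never removes any.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of a data table plus a naive index table.
// Not HBase code; it only mirrors the version semantics of Ram's example.
class NaiveIndexDemo {
    private final int maxVersions;
    // Data table: row -> (timestamp -> value), newest timestamp first.
    private final Map<String, TreeMap<Long, String>> data = new HashMap<>();
    // Index table: "value|row" -> timestamp. Entries are only ever added.
    private final TreeMap<String, Long> index = new TreeMap<>();

    NaiveIndexDemo(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(String row, String value, long ts) {
        data.computeIfAbsent(row, r -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()))
            .put(ts, value);
        index.put(value + "|" + row, ts); // one new index entry per distinct value
    }

    // What a read of the data table returns: at most maxVersions newest values.
    List<String> get(String row) {
        List<String> result = new ArrayList<>();
        TreeMap<Long, String> versions = data.get(row);
        if (versions == null) {
            return result;
        }
        for (String v : versions.values()) {
            if (result.size() == maxVersions) {
                break;
            }
            result.add(v);
        }
        return result;
    }

    // Rows that the index claims hold `value` (prefix scan over "value|").
    List<String> lookup(String value) {
        String prefix = value + "|";
        List<String> rows = new ArrayList<>();
        for (String key : index.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) {
                break;
            }
            rows.add(key.substring(prefix.length()));
        }
        return rows;
    }

    int indexSize() {
        return index.size();
    }

    public static void main(String[] args) {
        NaiveIndexDemo t = new NaiveIndexDemo(1);
        long base = 1000L; // stands in for "t" in the example
        t.put("Row1", "Val1", base);
        t.put("Row1", "Val2", base + 1);
        t.put("Row1", "Val3", base + 2);
        System.out.println(t.get("Row1"));    // [Val3]
        System.out.println(t.indexSize());    // 3
        // Stale: Row1 no longer holds Val1, but the index still says it does.
        System.out.println(t.lookup("Val1")); // [Row1]
    }
}
```

With maxVersions = 1 the read returns only Val3, yet the index still maps Val1 and Val2 to Row1; those are the dead index entries discussed in the thread.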

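Jesse's constant-timestamp idea and Wei's MVCC concern can both be seen in a toy model of cell versions. This is a sketch under stated assumptions, not HBase's implementation: the class `VersionedCells`, the index row `idxRow` and the column `q` are hypothetical, the model assumes repeated index writes land on the same row and column, and it assumes a later put with the same row, column and timestamp replaces the earlier one. It also trims excess versions on write for simplicity, whereas HBase hides them on read and purges them at compaction.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of per-cell version retention; not HBase code.
class VersionedCells {
    private final int maxVersions;
    // "row/column" -> (timestamp -> value), newest timestamp first.
    private final Map<String, TreeMap<Long, String>> cells = new HashMap<>();

    VersionedCells(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(String row, String column, long ts, String value) {
        TreeMap<Long, String> versions = cells.computeIfAbsent(
            row + "/" + column,
            k -> new TreeMap<Long, String>(Comparator.<Long>reverseOrder()));
        versions.put(ts, value); // equal timestamp: the later write overwrites
        while (versions.size() > maxVersions) {
            versions.pollLastEntry(); // last in newest-first order = oldest timestamp
        }
    }

    String latest(String row, String column) {
        TreeMap<Long, String> versions = cells.get(row + "/" + column);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    int versionCount(String row, String column) {
        TreeMap<Long, String> versions = cells.get(row + "/" + column);
        return versions == null ? 0 : versions.size();
    }

    public static void main(String[] args) {
        long fixedTs = 10L; // constant index timestamp, as in Jesse's suggestion
        VersionedCells fixed = new VersionedCells(1);
        fixed.put("idxRow", "q", fixedTs, "Val3");
        fixed.put("idxRow", "q", fixedTs, "Val2"); // arrives later
        System.out.println(fixed.latest("idxRow", "q")); // Val2

        VersionedCells real = new VersionedCells(1);
        real.put("idxRow", "q", 3L, "Val3");
        real.put("idxRow", "q", 2L, "Val2"); // arrives later, older timestamp
        System.out.println(real.latest("idxRow", "q")); // Val3
    }
}
```

In this model a constant timestamp keeps exactly one entry, but the survivor is whichever write arrived last; with real timestamps the survivor is the highest timestamp regardless of arrival order. That arrival-order dependence is one way to read the concurrency concern raised in reply.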

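Jesse's fallback of keeping the timestamp in the column qualifier ("the last 64 bits is the timestamp") could look like the following minimal sketch. The class `QualifierTimestamp` and its methods are hypothetical helpers, not part of HBase; they only use `java.nio.ByteBuffer`, which writes longs big-endian by default.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: stores a 64-bit timestamp in the last 8 bytes of a qualifier.
class QualifierTimestamp {
    // Returns qualifier bytes followed by the big-endian timestamp.
    static byte[] encode(byte[] qualifier, long ts) {
        return ByteBuffer.allocate(qualifier.length + Long.BYTES)
                .put(qualifier)
                .putLong(ts) // ByteBuffer's default byte order is big-endian
                .array();
    }

    // Reads the timestamp back from the last 8 bytes.
    static long timestamp(byte[] encoded) {
        return ByteBuffer.wrap(encoded, encoded.length - Long.BYTES, Long.BYTES).getLong();
    }

    // Strips the trailing timestamp, returning the original qualifier bytes.
    static byte[] baseQualifier(byte[] encoded) {
        return Arrays.copyOf(encoded, encoded.length - Long.BYTES);
    }

    public static void main(String[] args) {
        byte[] q = encode("Row1".getBytes(StandardCharsets.UTF_8), 1234567890123L);
        System.out.println(q.length);     // 12
        System.out.println(timestamp(q)); // 1234567890123
        System.out.println(new String(baseQualifier(q), StandardCharsets.UTF_8)); // Row1
    }
}
```

The extra cost is the 8 bytes per qualifier plus the encode/decode work, which matches the "a little more CPU cost" remark. Big-endian encoding also means that, for non-negative timestamps, qualifiers sharing a prefix compare in timestamp order under unsigned byte-wise comparison.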

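The other suggestion in the thread, deleting the old index entry upfront when a new put arrives, can be sketched with the same kind of in-memory model. `DeleteOnPutIndex` is hypothetical and single-threaded: it reads the previous value, removes that value's index entry, then adds the new one. In a real deployment those steps touch two tables and are not atomic together, so this sketch deliberately sidesteps the concurrency question raised in the thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of "delete the old index entry on put"; not HBase code.
// Single-threaded only: the read, delete and put below are not atomic across tables.
class DeleteOnPutIndex {
    private final Map<String, String> data = new HashMap<>();      // row -> current value
    private final TreeMap<String, String> index = new TreeMap<>(); // "value|row" -> row

    void put(String row, String value) {
        String old = data.put(row, value); // returns the previous value, if any
        if (old != null) {
            index.remove(old + "|" + row); // drop the now-dead index entry first
        }
        index.put(value + "|" + row, row); // then index the new value
    }

    // Rows currently holding `value`, via a prefix scan over "value|".
    List<String> lookup(String value) {
        String prefix = value + "|";
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : index.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            rows.add(e.getValue());
        }
        return rows;
    }

    int indexSize() {
        return index.size();
    }

    public static void main(String[] args) {
        DeleteOnPutIndex idx = new DeleteOnPutIndex();
        idx.put("Row1", "Val1");
        idx.put("Row1", "Val2");
        idx.put("Row1", "Val3");
        System.out.println(idx.lookup("Val1")); // []
        System.out.println(idx.lookup("Val3")); // [Row1]
        System.out.println(idx.indexSize());    // 1
    }
}
```

Replaying Ram's three puts leaves a single index entry, so no dead entries remain for a maxVersions = 1 table; the price is a read of the current value on every write.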