hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ryan Rawson" <ryano...@gmail.com>
Subject Re: Review Request: delete followed by a put with the same timestamp
Date Fri, 26 Nov 2010 23:33:53 GMT


> On 2010-11-26 14:54:45, Ryan Rawson wrote:
> > trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java, line 1373
> > <http://review.cloudera.org/r/1252/diff/1/?file=17712#file17712line1373>
> >
> >     what are all the consequences for not sorting by type when using KVComparator?
 Does this mean we might create HFiles that not sorted properly, because the HFile comparator
uses the KeyComparator directly with ignoreType = false. 
> >     
> >     While in memstore we can rely on memstoreTS to roughly order by insertion time,
and the Put/Delete should probably work in that situation, you are talking about modifiying
a pretty core and important concept in how we sort things.
> >     
> >     There are other ways to reconcile bugs like this, one of them is to extend the
memstoreTS concept into the HFile and use that to reconcile during reads.  There is another
JIRA where I proposed this.  
> >     
> >     If we are talking about 0.92 and beyond I'd prefer building a solid base rather
than dangerous hacks like this.  Our unit tests are not extremely extensive, so while they
might pass, that doesnt guarantee lack of bad behaviour later on.
> >
> 
> Pranav Khaitan wrote:
>     Agree. As I mentioned, this is a major change and more thought needs to be given
to it.
>     
>     However, to resolve issues like HBASE-3276, we need either such a change or extend
the memstoreTS concept to HFile as you mentioned.
>     
>     About consequences, I don't see anything negative here. This change only affects
the sorting of keys having same row, col, timestamp. After this change, all keys with the
same row, col, ts will be sorted purely based on the order in which they were inserted. When
a memstore is flushed to HFile, the memstoreTS takes care of ordering. During compactions,
the KeyValueHeap breaks ties by using the sequence ids of storefiles.

the problem is you are now changing how things are ordered sometimes but not all the time.
 HFile directly uses the rawcomparator, instantiating it directly rather than getting it via
the code path you changed.  So now you create a memstore in this order:

row,col,100,Put  (memstoreTS=1)
row,col,100,Delete (memstoreTS=2)
row,col,100,Put (memstoreTS=3)

But the HFile comparator will consider this out of order since it doesnt know about memstoreTS
and it still expects things to be in a certain order.

I'm a little wary of having implicit ordering in the HFiles... in your new scheme, Put,Delete,Put
are in that order 'just because they are', and the comparator cannot put them back in order,
and must rely on scanner order.  During compactions we would place keys in order based on
which files they came from, but they wouldn't themselves have an order.  Basically we should
get rid of 'type sorting' and use memstoreTS sorting in memory and implicit sorting in the
HFiles.  


- Ryan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1252/#review1993
-----------------------------------------------------------


On 2010-11-26 07:47:02, Pranav Khaitan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/1252/
> -----------------------------------------------------------
> 
> (Updated 2010-11-26 07:47:02)
> 
> 
> Review request for hbase, Jonathan Gray and Kannan Muthukkaruppan.
> 
> 
> Summary
> -------
> 
> This is a design change suggested in HBASE-3276 so adequate thought should be given before
proceeding. 
> 
> The main code change is just one line which is to ignore key type while doing KV comparisons.
When the key type is ignored, then all the keys for the same timestamp are sorted according
the order in which they were interested. It is still ensured that the delete family and delete
column will be at the top because they have the default column name and default timestamp.
> 
> 
> This addresses bug HBASE-3276.
>     http://issues.apache.org/jira/browse/HBASE-3276
> 
> 
> Diffs
> -----
> 
>   trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java 1039233 
>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 1039233

>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreScanner.java 1039233

> 
> Diff: http://review.cloudera.org/r/1252/diff
> 
> 
> Testing
> -------
> 
> Test cases added. Since there is a change in semantics, some previous tests were failing
because of this change. Those tests have been modified to test the newer behavior.
> 
> 
> Thanks,
> 
> Pranav
> 
>


Mime
View raw message