hbase-dev mailing list archives

From "Ryan Rawson" <ryano...@gmail.com>
Subject Re: Review Request: Eliminate duplicates, stale versions. Have determined behaviour and storefile ordering
Date Wed, 08 Sep 2010 00:22:33 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/780/#review1121
-----------------------------------------------------------

Ship it!


- Ryan


On 2010-09-07 16:16:36, Pranav Khaitan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/780/
> -----------------------------------------------------------
> 
> (Updated 2010-09-07 16:16:36)
> 
> 
> Review request for hbase, stack, Jonathan Gray, Ryan Rawson, Karthik Ranganathan, and Kannan Muthukkaruppan.
> 
> 
> Summary
> -------
> 
> Goodbye duplicates, hello consistent ordering.
> 
> A way to order multiple cells that have the same key. Every StoreFile will have a timestamp associated with it, and when any two cells have the same key, they will be sorted by the timestamps of the StoreFiles they come from. Therefore, for duplicate versions of a cell, the version that came in most recently will be returned. We are ensuring that only one version will be returned.
> 
> There are two components in this patch. The first one is associated with KeyValueHeap and it ensures that duplicate cells are ordered correctly. The second one is in ColumnTracker and it ensures that only one version is allowed to be returned from StoreScanner.
> 
> For all existing files and/or any files which do not have the 'TIMESTAMP' field in meta, the timestamp will be set to zero, which means they will be assumed to be the 'oldest' files.
> 
> Also changed the return codes in ColumnTracker to be consistent with the definition of ScanQueryMatcher.ReturnCode.
> 
> Design discussed with jgray.
> 
> Further suggestions/questions are welcome!
> 
> 
> This addresses bugs HBASE-1485, HBASE-2406, HBASE-2649, and HBASE-997.
>     http://issues.apache.org/jira/browse/HBASE-1485
>     http://issues.apache.org/jira/browse/HBASE-2406
>     http://issues.apache.org/jira/browse/HBASE-2649
>     http://issues.apache.org/jira/browse/HBASE-997
> 
> 
> Diffs
> -----
> 
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java 993156
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java 993156
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 993156
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java 993156
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 993156
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MinorCompactingStoreScanner.java 993156
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 993156
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java 993156
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 993156
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 993156
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 993156
>   trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 993156
>   trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 993156
>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/KeyValueScanFixture.java 993156
>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestColumnSeeking.java PRE-CREATION
>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java 993156
>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 993156
>   trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java 993156
> 
> Diff: http://review.cloudera.org/r/780/diff
> 
> 
> Testing
> -------
> 
> All existing tests are passing and running through this code path. 
> 
> Added two tests which use several combinations of columns, rows, and values and ensure that only the most recent one is returned. These tests will be useful for testing all kinds of column reseeking, row reseeking, etc.
> 
> 
> Thanks,
> 
> Pranav
> 
>
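
The ordering rule described in the quoted summary (cells with equal keys are ordered by the timestamp of the StoreFile they came from, newest first, and files without a 'TIMESTAMP' meta entry count as timestamp zero) might be sketched roughly as below. This is an illustration under stated assumptions, not code from the patch: the class StoreFileOrderingSketch, its Cell type, the flattened String key, the Map used for file meta, and the helper names are all hypothetical, and the real change lives in KeyValueHeap and ColumnTracker.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class StoreFileOrderingSketch {

    /** Hypothetical stand-in for a cell plus the timestamp of the StoreFile it was read from. */
    static final class Cell {
        final String key;          // assumption: the full cell key flattened into one string
        final String value;
        final long fileTimestamp;  // 0 when the file carries no 'TIMESTAMP' meta entry

        Cell(String key, String value, long fileTimestamp) {
            this.key = key;
            this.value = value;
            this.fileTimestamp = fileTimestamp;
        }
    }

    /** Files without the meta field are treated as the oldest (timestamp zero). */
    static long fileTimestampOrZero(Map<String, Long> fileMeta) {
        Long ts = fileMeta.get("TIMESTAMP");
        return ts == null ? 0L : ts;
    }

    /** Order by key first; break ties so the cell from the newest file sorts first. */
    static final Comparator<Cell> ORDER = (a, b) -> {
        int byKey = a.key.compareTo(b.key);
        if (byKey != 0) {
            return byKey;
        }
        return Long.compare(b.fileTimestamp, a.fileTimestamp);
    };

    /** Keep only the first cell seen for each key, i.e. the one from the newest file. */
    static List<Cell> newestPerKey(List<Cell> cells) {
        List<Cell> sorted = new ArrayList<>(cells);
        sorted.sort(ORDER);
        List<Cell> result = new ArrayList<>();
        String previousKey = null;
        for (Cell c : sorted) {
            if (!c.key.equals(previousKey)) {
                result.add(c);
                previousKey = c.key;
            }
        }
        return result;
    }
}
```

In this sketch the comparator plays the role the summary assigns to KeyValueHeap (deterministic ordering of duplicates), and newestPerKey plays the role assigned to ColumnTracker (only one version is returned).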

