hadoop-common-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1784) [hbase] delete
Date Sun, 09 Sep 2007 17:03:29 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-1784:

    Attachment: delete3.patch

TestScanner2 was failing because, after the rework, a row whose values were all deleted
would stop the scanner (I heart unit tests).  Version 3 of the patch includes the fix.  Below
is an updated commit message.  It notes the new fix and includes edits of the previous message.

HADOOP-1784 delete
Fix scanners and gets so they work properly in presence of deletes.
Added a deleteAll to remove all cells equal to or older than passed
timestamp.  Fixed compaction so deleted cells do not make it out into
compacted output.  Ensure also that versions > column max are dropped.

M  src/contrib/hbase/src/test/org/apache/hadoop/hbase/HBaseTestCase.java
    (Loader): Renamed 'Loader' Interface as 'Incommon' -- as in the
    methods HTable and HRegion have in common -- because it now does
    more than 'loading'.  Added getters, delete, deleteAll and scanners
    and amended the implementations of Incommon particular to HTable
    and HRegion.
    (createTableDescriptor): Add override so can specify column versions.
    (FlushCache): Added an interface that can be implemented by things
    that flush their cache (e.g. HRegion and HTable).
M  src/contrib/hbase/src/test/org/apache/hadoop/hbase/MiniHBaseCluster.java
    (flushcache): Added. Flushes all regionserver regions.
M  src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestTimestamp.java
    (testDelete, testTimestampScanning): Refactored so local tests that
    ran against an HRegion could also be run -- via the Incommon interface
    -- from the client side with HTable inside testTimestamps.
    (doTestDelete, assertOnlyLatest, assertVersions, 
      doTestTimestampScanning, assertScanContentTimestamp, put,
      delete): Added.
M  src/contrib/hbase/src/test/org/apache/hadoop/hbase/MultiRegionTable.java
    Renamed Loader interface as Incommon.
M  src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestCompaction.java
    Add assertions that on compaction deleted rows are dropped and that
    versions > column maximum versions are also dropped.
    (setUp, tearDown): Added.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStore.java
    javadoc edits
    (getFilesToCompact): Changed so list of files is ordered from newest to
    oldest.  Was doing oldest to newest.
    (compact): Keep running list per row of what's been deleted. Used when
    checking later-encountered cells.  If key matches deleted cell seen
    earlier, the later cell is not added to compacted output.
    (isDeleted): Added.  Checks running list of deletes found locally but
    also consults memcache in case it has deletes for current cell (and
    therefore we should not return this version of the cell).
    (get): Keep running list per row of what's been deleted. Used when
    checking later-encountered cells.
    (hasEnoughVersions, getKeys): Added.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HRegionServer.java
    (batchUpdate): If passed timestamp is LATEST_TIMESTAMP, then all
    puts get the server's current timestamp.  Deletes get special handling.
    We fetch the 'latest' cell of same row and column and using ITS
    timestamp, we write a delete record.  Otherwise, works as previous.
    (deleteAll): Added.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HScannerInterface.java
    javadoc.  Changed next method param from TreeMap to more generic SortedMap
    so could have HInternalScannerInterface extend this Interface.
    A minor inconvenience is that close in this base interface throws
    IOException whereas HInternalScannerInterface does not (had to add
    'useless' try/catch in two close locations).  Fix.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HConstants.java
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HTable.java
    javadoc edit.
    (deleteAll): Added.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMemcache.java
    Removed a few superfluous List allocations.
    (getKeys, isDeleted): Added.  isDeleted is called when looking at
    store cells to see if memcache has a delete to X them out.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HRegionInterface.java
    javadoc edit.
    (deleteAll): Added.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HInternalScannerInterface.java
    Made it inherit from HScannerInterface.  Remove next and close (They
    are inherited).
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HAbstractScanner.java
    (next): Changed param from TreeMap to SortedMap.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/io/ImmutableBytesWritable.java
    (equals): Works if passed bytes too.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/io/BatchOperation.java
    Redid batch operations as enums. Made constructors cascade.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/io/BatchUpdate.java
    javadoc edit and fixed Eclipse complaints about param names being same
    as data member names.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HRegion.java
   (get): Fix count of versions across memcache and stores.  We were not
   aggregating version counts properly.  Also fix so results are properly ordered.
   (deleteAll, update, getKeys, deleteMultiple): Added.
   (next): Changed param from TreeMap to SortedMap.  Keep running list of
   deleted cells, used when looking at later versions.  If a cell is 'deleted',
   set the 'filtered' flag to true else scan would not go past the 'deleted'
   row.  If no results found, do not return true (that there are more 
   possible values).  Made the test of chosenTimestamp >= rather than just
   > when checking to see if more (0 may be a legit timestamp).
   (commit): Added a commit override used in unit tests emulating
   batchUpdate operation in HRegionServer.
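
To illustrate the compaction change noted above for HStore (files read newest to oldest, with a running per-row list of deletes), here is a minimal sketch.  It is not the code in the patch: CompactionSketch, Cell and compact are hypothetical names, and it assumes cells arrive grouped by row and column, newest first, with a delete record sorting ahead of a value that carries the same timestamp.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names come from the patch.
final class CompactionSketch {

    /** One version of one column in one row; isDelete marks a delete record. */
    static final class Cell {
        final String row;
        final String column;
        final long timestamp;
        final boolean isDelete;

        Cell(String row, String column, long timestamp, boolean isDelete) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.isDelete = isDelete;
        }
    }

    /**
     * Assumes input is grouped by row and column, newest first, and that a
     * delete record precedes the value it deletes.
     */
    static List<Cell> compact(List<Cell> newestFirst, int maxVersions) {
        List<Cell> out = new ArrayList<>();
        Set<String> deleted = new HashSet<>(); // running list of deletes for the current row
        String currentRow = null;
        String currentColumn = null;
        int versionsKept = 0;
        for (Cell c : newestFirst) {
            if (!c.row.equals(currentRow)) {
                // New row: start a fresh delete list.
                currentRow = c.row;
                currentColumn = null;
                deleted.clear();
            }
            if (!c.column.equals(currentColumn)) {
                // New column: restart the version count.
                currentColumn = c.column;
                versionsKept = 0;
            }
            String key = c.column + "@" + c.timestamp;
            if (c.isDelete) {
                deleted.add(key); // remember it; the delete record itself is not written out
                continue;
            }
            if (deleted.contains(key)) {
                continue; // matches a delete seen earlier in this row
            }
            if (versionsKept >= maxVersions) {
                continue; // versions beyond the column maximum are dropped
            }
            out.add(c);
            versionsKept++;
        }
        return out;
    }
}
```

With values at timestamps 5, 4, 3 and 2, a delete record at 4, and a column maximum of two versions, this sketch keeps only the cells at 5 and 3.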

> [hbase] delete
> --------------
>                 Key: HADOOP-1784
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1784
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.15.0
>         Attachments: delete1.patch, delete2.patch, delete3.patch
> Delete is incomplete in hbase.  What's there is inconsistent.  Deleted records currently
> persist and are never cleaned up.  This issue is about making delete behavior coherent across
> gets, scans and compaction.
> Below is from a bit of back and forth between Jim and myself where Jim takes a stab at
> outlining a model for delete taking inspiration from how Digital's versioned file system used
> {code}
> Let's say you have 5 versions with timestamps T1, T2, ..., T5 where
> timestamps are increasing from T1 to T5 (so T5 is the newest).
> Before any deletes occur, if you don't specify a timestamp and request N
> versions, you should get T5 first, then T4, T3, ... until you have
> reached N or you run out of versions.
> Now add deletes:
> (In the following, timestamp refers to the timestamp associated with
> the delete operation)
> 1. If no timestamp is specified we are deleting the latest version.
>    If a get or scanner specifies that it wants N versions, then it 
>    should get T4, T3, ..., until we have N versions or we run out of
>    older versions. After compaction, the deletion record and T5 should
>    be elided from the HStore.
> 2. If a timestamp is specified and it exactly matches a version (say
>    T4) and a get or scanner requests N versions, then the client
>    receives T5, T3, T2, ... until we satisfy N or run out of versions.
>    After a compaction, the deletion record and T4 should be elided
>    from the HStore.
> 3. If a timestamp is specified and does not exactly match a version,
>    it means delete every version older than this timestamp. If the
>    timestamp is greater than T5 all versions are considered to be
>    deleted and a get or a scanner will return no results even if
>    the get or scanner specifies an older time. This is consistent
>    with the concept of delete all versions older than timestamp.
>    After a compaction, the delete record and all the values should
>    be elided.
>    If the specified timestamp falls between two older versions (say
>    T4 and T3) then T3, T2 and T1 are considered to be deleted (again
>    this is all versions older than timestamp). A get or scanner
>    that specifies no time but requests N versions can only get T5
>    and T4. A get or scanner that requests a time of T3 or earlier
>    will get no results because those versions are deleted. After
>    a compaction, the deletion record and the deleted versions
>    are elided from the HStore.
> {code}
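
The three rules in the model quoted above can be sketched as a small function that returns the versions a client would see after a single delete.  This is only an illustration under stated assumptions, not HBase code: DeleteModelSketch and visibleVersions are hypothetical names, a null deleteTs stands for "no timestamp specified", and the sketch covers only reads that do not specify a time of their own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of the delete model; not part of HBase.
final class DeleteModelSketch {

    /**
     * Returns up to n visible version timestamps, newest first, after
     * applying one delete.  deleteTs == null means no timestamp was given.
     */
    static List<Long> visibleVersions(List<Long> versions, Long deleteTs, int n) {
        TreeSet<Long> live = new TreeSet<>(versions);
        if (deleteTs == null) {
            live.pollLast();                 // rule 1: delete the latest version
        } else if (live.contains(deleteTs)) {
            live.remove(deleteTs);           // rule 2: exact match deletes that one version
        } else {
            live.headSet(deleteTs).clear();  // rule 3: delete every version older than deleteTs
        }
        List<Long> result = new ArrayList<>();
        for (Long ts : live.descendingSet()) {
            if (result.size() == n) {
                break;
            }
            result.add(ts);
        }
        return result;
    }
}
```

Taking T1..T5 as 10, 20, 30, 40, 50: a delete with no timestamp leaves 40, 30, 20, 10; a delete at 40 leaves 50, 30, 20, 10; a delete at 35 leaves 50 and 40; and a delete at any timestamp above 50 leaves nothing.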

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
