hadoop-common-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1784) [hbase] delete
Date Sat, 08 Sep 2007 09:38:29 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-1784:
--------------------------

    Attachment: delete2.patch

Here's a patch to finish the delete work.  TestScanner2 is not passing.  Need to investigate...
{code}
HADOOP-1784 delete
Fix scanners and gets so they work properly in presence of deletes.
Added a deleteAll to remove all cells equal to or older than the passed
timestamp.  Fixed compaction so deleted cells do not make it into the
compacted output.

M  src/contrib/hbase/src/test/org/apache/hadoop/hbase/HBaseTestCase.java
    (Loader): Renamed the 'Loader' interface as 'Incommon' because it now does
    more than 'loading'.  Added getters, delete, deleteAll and scanners,
    and amended the implementations for HTable and HRegion.
    (createTableDescriptor): Added an override so column versions can be
    specified.
    (FlushCache): Added an interface that can be implemented by things
    that flush their cache (e.g. HRegion and HTable).
M  src/contrib/hbase/src/test/org/apache/hadoop/hbase/MiniHBaseCluster.java
    (flushcache): Added.
M  src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestTimestamp.java
    (testDelete, testTimestampScanning): Refactored so tests could be
    run from client side inside testTimestamps.
    (doTestDelete, assertOnlyLatest, assertVersions, 
      doTestTimestampScanning, assertScanContentTimestamp, put,
      delete): Added.
M  src/contrib/hbase/src/test/org/apache/hadoop/hbase/MultiRegionTable.java
    Renamed Loader interface as Incommon.
M  src/contrib/hbase/src/test/org/apache/hadoop/hbase/TestCompaction.java
    Add assertions that deleted rows are dropped on compaction and that
    versions in excess of the column's maximum versions are also dropped.
    (setUp, tearDown): Added.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HStore.java
    (getFilesToCompact): Changed so the list is returned newest to oldest.
    It was returning oldest to newest.
    (compact): Keep a running list, per row, of what's been deleted, used
    when checking later-encountered cells.  If a key matches a deleted cell
    seen earlier, the later cell is not added to the compacted output.
    (isDeleted): Added.  Used when getting and scanning store files, in
    case memcache holds a delete of a specific cell.
    (get): Keep a running list, per row, of what's been deleted, used when
    checking later-encountered cells.  If a later key matches a deleted
    cell, that cell is not returned.  Also consult memcache, since memcache
    could hold a delete for the record-to-return.
    (hasEnoughVersions, getKeys): Added.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HRegionServer.java
    (deleteAll): Added.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HScannerInterface.java
    javadoc.  Changed the next param from TreeMap to SortedMap so
    HInternalScannerInterface could inherit from this base.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HConstants.java
    (EMPTY_TEXT): Added.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HTable.java
    javadoc.
    (deleteAll): Added.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMemcache.java
    (getKeys, isDeleted): Added.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HRegionInterface.java
    (deleteAll): Added.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HInternalScannerInterface.java
    Made it inherit from HScannerInterface.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HAbstractScanner.java
    (next): Changed param from TreeMap to SortedMap.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/io/ImmutableBytesWritable.java
    (equals): Allow passing byte [].
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/io/BatchOperation.java
    Redid batch operations as enums.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/io/BatchUpdate.java
    javadoc.
M  src/contrib/hbase/src/java/org/apache/hadoop/hbase/HRegion.java
    (get): Fix the count of versions across memcache and stores; versions
    were not being counted properly.  Also fix so results are properly ordered.
    (next): Changed param from TreeMap to SortedMap.  Keep a running list of
    deleted cells, used when looking at later versions.
{code}
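For readers following the HStore notes above, here is a minimal Java sketch of the per-row "running list of deletes" idea applied during compaction. It is an illustration under stated assumptions, not the patch's code: `CompactionSketch`, `Cell`, `compact`, and the `isDelete` flag are hypothetical names. Input is assumed to be already merged and ordered by row and column, newest timestamp first (matching the newest-to-oldest order getFilesToCompact now returns), with a delete marker sorted ahead of the value it deletes. Dropping the marker itself from the output is also an assumption, taken from the model quoted in the issue description.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CompactionSketch {
    /** Hypothetical cell: row, column, timestamp, and a delete-marker flag. */
    static final class Cell {
        final String row;
        final String column;
        final long timestamp;
        final boolean isDelete;

        Cell(String row, String column, long timestamp, boolean isDelete) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.isDelete = isDelete;
        }

        String key() { return row + "/" + column + "/" + timestamp; }

        @Override public String toString() { return isDelete ? key() + "(delete)" : key(); }
    }

    /**
     * Assumes input ordered by row, then column, newest timestamp first,
     * with any delete marker placed before the value it deletes.
     */
    static List<Cell> compact(List<Cell> newestFirst, int maxVersions) {
        List<Cell> out = new ArrayList<>();
        Set<String> deleted = new HashSet<>(); // running list of deletes for the current row
        String currentRow = null;
        String currentColumn = null;
        int versions = 0;
        for (Cell c : newestFirst) {
            if (!c.row.equals(currentRow)) {       // new row: reset the running list
                currentRow = c.row;
                deleted.clear();
                currentColumn = null;
            }
            if (!c.column.equals(currentColumn)) { // new column: reset the version count
                currentColumn = c.column;
                versions = 0;
            }
            if (c.isDelete) {                      // remember the delete; marker not written (assumption)
                deleted.add(c.key());
                continue;
            }
            if (deleted.contains(c.key())) continue; // matches a delete seen earlier: drop
            if (versions >= maxVersions) continue;   // beyond the column's max versions: drop
            versions++;
            out.add(c);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Cell> input = List.of(
            new Cell("r1", "a", 5, true),   // delete marker for r1/a/5
            new Cell("r1", "a", 5, false),
            new Cell("r1", "a", 4, false),
            new Cell("r1", "a", 3, false),
            new Cell("r1", "a", 2, false),
            new Cell("r2", "a", 5, false));
        System.out.println(compact(input, 2)); // [r1/a/4, r1/a/3, r2/a/5]
    }
}
```

Keying the set on row/column/timestamp keeps each check constant-time, and clearing it at every row boundary bounds its size to one row's deletes.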

> [hbase] delete
> --------------
>
>                 Key: HADOOP-1784
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1784
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>            Reporter: stack
>            Assignee: stack
>         Attachments: delete1.patch, delete2.patch
>
>
> Delete is incomplete in hbase.  What's there is inconsistent.  Deleted records currently
> persist and are never cleaned up.  This issue is about making delete behavior coherent across
> gets, scans and compaction.
> Below is from a bit of back and forth between Jim and myself where Jim takes a stab at
> outlining a model for delete, taking inspiration from how Digital's versioned file system used
> to work:
> {code}
> Let's say you have 5 versions with timestamps T1, T2, ..., T5 where
> timestamps are increasing from T1 to T5 (so T5 is the newest).
> Before any deletes occur, if you don't specify a timestamp and request N
> versions, you should get T5 first, then T4, T3, ... until you have
> reached N or you run out of versions.
> Now add deletes:
> (In the following, timestamp refers to the timestamp associated with
> the delete operation)
> 1. If no timestamp is specified we are deleting the latest version.
>    If a get or scanner specifies that it wants N versions, then it 
>    should get T4, T3, ..., until we have N versions or we run out of
>    older versions. After compaction, the deletion record and T5 should
>    be elided from the HStore.
> 2. If a timestamp is specified and it exactly matches a version (say
>    T4) and a get or scanner requests N versions, then the client
>    receives T5, T3, T2, ... until we satisfy N or run out of versions.
>    After a compaction, the deletion record and T4 should be elided
>    from the HStore.
> 3. If a timestamp is specified and does not exactly match a version,
>    it means delete every version older than this timestamp. If the
>    timestamp is greater than T5 all versions are considered to be
>    deleted and a get or a scanner will return no results even if 
>    the get or scanner specifies an older time. This is consistent
>    with the concept of delete all versions older than timestamp.
>    After a compaction, the delete record and all the values should
>    be elided.
>    If the specified timestamp falls between two older versions (say
>    T4 and T3) then T3, T2 and T1 are considered to be deleted (again
>    this is all versions older than timestamp). A get or scanner
>    that specifies no time but requests N versions can only get T5
>    and T4. A get or scanner that requests a time of T3 or earlier
>    will get no results because those versions are deleted. After
>    a compaction, the deletion record and the deleted versions
>    are elided from the HStore.
> {code}
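Jim's model can be checked mechanically. Below is a minimal Java sketch of the three delete cases for the versions of a single cell, plus the compaction elision. The names (`DeleteModelSketch`, `put`, `delete`, `get`, `compact`) are hypothetical and are not the HBase API; timestamps are plain longs, and treating a timestamp-less delete as removing the latest *live* version is an assumption where the model is silent.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class DeleteModelSketch {
    private final TreeSet<Long> versions = new TreeSet<>();  // stored version timestamps
    private final Set<Long> deletedExact = new HashSet<>();  // cases 1 and 2
    private long deletedBefore = Long.MIN_VALUE;             // case 3: versions older than this are gone

    void put(long ts) { versions.add(ts); }

    /** Applies a delete per the model; a null timestamp means "the latest version". */
    void delete(Long ts) {
        if (ts == null) {
            // Case 1: delete the latest version (assumption: skip already-deleted ones).
            for (long v : versions.descendingSet()) {
                if (isLive(v)) { deletedExact.add(v); return; }
            }
        } else if (versions.contains(ts)) {
            deletedExact.add(ts);                            // Case 2: exact match
        } else {
            deletedBefore = Math.max(deletedBefore, ts);     // Case 3: every older version
        }
    }

    private boolean isLive(long v) {
        return v >= deletedBefore && !deletedExact.contains(v);
    }

    /** Up to n live versions, newest first, no newer than asOf (null means no limit). */
    List<Long> get(int n, Long asOf) {
        List<Long> out = new ArrayList<>();
        for (long v : versions.descendingSet()) {
            if (out.size() == n) break;
            if (asOf != null && v > asOf) continue;
            if (isLive(v)) out.add(v);
        }
        return out;
    }

    /** Compaction: elide deleted versions together with the delete records. */
    void compact() {
        versions.removeIf(v -> !isLive(v));
        deletedExact.clear();
        deletedBefore = Long.MIN_VALUE;
    }

    static DeleteModelSketch fiveVersions() {
        DeleteModelSketch m = new DeleteModelSketch();
        for (long t = 10; t <= 50; t += 10) m.put(t);        // T1..T5 = 10..50
        return m;
    }

    public static void main(String[] args) {
        DeleteModelSketch c1 = fiveVersions();
        c1.delete(null);                         // case 1: T5 deleted
        System.out.println(c1.get(3, null));     // [40, 30, 20]

        DeleteModelSketch c2 = fiveVersions();
        c2.delete(40L);                          // case 2: exact match on T4
        System.out.println(c2.get(3, null));     // [50, 30, 20]

        DeleteModelSketch c3 = fiveVersions();
        c3.delete(35L);                          // case 3: falls between T4 and T3
        System.out.println(c3.get(5, null));     // [50, 40]
        System.out.println(c3.get(5, 30L));      // []
        c3.compact();
        System.out.println(c3.versions);         // [40, 50]
    }
}
```

With 10 through 50 standing in for T1..T5, a delete at 35 (between T3 and T4) leaves only T5 and T4 visible to a get with no time, and nothing visible to a get asking for T3 or earlier, as case 3 describes.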

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

