hbase-issues mailing list archives

From "Xing Shi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-6195) Increment data will be lost when the memstore is flushed
Date Mon, 25 Jun 2012 14:34:43 GMT

    [ https://issues.apache.org/jira/browse/HBASE-6195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13400505#comment-13400505
] 

Xing Shi commented on HBASE-6195:
---------------------------------

@Ted and @ram:
This problem occurs whenever two KeyValues have the same row, family, qualifier, and timestamp
but different memstoreTS values.

There are lots of optimisations around memstoreTS for storage:

1. The flush sets memstoreTS to 0, not just for Increment but also for Put. The code is in Store.internalFlushCache():
{code}
  if (kv.getMemstoreTS() <= smallestReadPoint) {
    // let us not change the original KV. It could be in the memstore
    // changing its memstoreTS could affect other threads/scanners.
    kv = kv.shallowCopy();
    kv.setMemstoreTS(0);
  }
{code}
If several versions of the same row with the same timestamp are flushed to StoreFiles, the Get
chooses the latest version by
{code}
// Negate this comparison so later edits show up first
      return -Longs.compare(left.getMemstoreTS(), right.getMemstoreTS());
{code}

Because the timestamps (within one millisecond) and the memstoreTSs (all 0) are identical in the
StoreFiles, we cannot tell which record is the newest.
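
To make the tie concrete, here is a minimal sketch. The Cell class is a hypothetical stand-in for KeyValue that carries only the fields the comparison needs, and Long.compare from the JDK replaces the Guava Longs.compare call quoted above. It is an illustration under those assumptions, not the HBase code itself.

```java
import java.util.Comparator;

class MemstoreTsTie {
    // Hypothetical stand-in for KeyValue: row, family, and qualifier are assumed equal.
    static final class Cell {
        final long timestamp;
        final long memstoreTS;
        final long value;

        Cell(long timestamp, long memstoreTS, long value) {
            this.timestamp = timestamp;
            this.memstoreTS = memstoreTS;
            this.value = value;
        }
    }

    // Newer timestamp first; on equal timestamps, larger memstoreTS first.
    static final Comparator<Cell> NEWEST_FIRST = (left, right) -> {
        int c = -Long.compare(left.timestamp, right.timestamp);
        if (c != 0) {
            return c;
        }
        // Negated so later edits show up first, as in the quoted snippet.
        return -Long.compare(left.memstoreTS, right.memstoreTS);
    };

    // Models the flush: copy the cell and reset its memstoreTS to 0.
    static Cell flushed(Cell c) {
        return new Cell(c.timestamp, 0L, c.value);
    }

    public static void main(String[] args) {
        Cell older = new Cell(1000L, 5L, 1L);
        Cell newer = new Cell(1000L, 6L, 2L);
        // In the memstore the two edits are still ordered (negative result).
        System.out.println(NEWEST_FIRST.compare(newer, older));
        // After the flush both memstoreTSs are 0, so the comparison is a tie.
        System.out.println(NEWEST_FIRST.compare(flushed(newer), flushed(older)));
    }
}
```

Once the comparison returns 0, nothing in the key itself says which of the two values was written last.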

2. Besides this, StoreFileScanner has an optimisation from HBASE-4346 (code through
HBASE-2856):
{code}
    if (cur.getMemstoreTS() <= readPoint) {
      cur.setMemstoreTS(0);
    }
{code}

So, even if we made memstoreTS increase monotonically for Increment (where memstoreTS is currently
always 0) or Put, the problem remains. If we flush two records (identical except for memstoreTS,
with sf1.row.memstoreTS < sf2.row.memstoreTS) into two StoreFiles, the memstoreTSs will still be
set to 0 when read, and we may get the old record sf1.row.
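
A rough sketch of that read path, under stated assumptions: Cell, atReadPoint, and firstSeen are hypothetical names, each list element stands for the matching record from one StoreFile, and a stable sort stands in for the scanner merge. The real StoreFileScanner and heap are more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ReadPointZeroing {
    // Hypothetical stand-in for KeyValue: row, family, and qualifier are assumed equal.
    static final class Cell {
        final long timestamp;
        final long memstoreTS;
        final long value;

        Cell(long timestamp, long memstoreTS, long value) {
            this.timestamp = timestamp;
            this.memstoreTS = memstoreTS;
            this.value = value;
        }
    }

    // Newer timestamp first; on equal timestamps, larger memstoreTS first.
    static final Comparator<Cell> NEWEST_FIRST = (left, right) -> {
        int c = -Long.compare(left.timestamp, right.timestamp);
        return c != 0 ? c : -Long.compare(left.memstoreTS, right.memstoreTS);
    };

    // Models the quoted optimisation: a cell visible at readPoint is seen with memstoreTS 0.
    static Cell atReadPoint(Cell c, long readPoint) {
        return c.memstoreTS <= readPoint ? new Cell(c.timestamp, 0L, c.value) : c;
    }

    // Returns the cell a reader would see first. List.sort is stable, so tied
    // cells keep their input order (here: the order of the store files).
    static Cell firstSeen(List<Cell> fromFiles, long readPoint, boolean zeroVisible) {
        List<Cell> view = new ArrayList<>();
        for (Cell c : fromFiles) {
            view.add(zeroVisible ? atReadPoint(c, readPoint) : c);
        }
        view.sort(NEWEST_FIRST);
        return view.get(0);
    }

    public static void main(String[] args) {
        Cell sf1 = new Cell(1000L, 5L, 1L); // older edit, first store file
        Cell sf2 = new Cell(1000L, 6L, 2L); // newer edit, second store file
        List<Cell> files = Arrays.asList(sf1, sf2);
        System.out.println(firstSeen(files, 10L, false).value); // memstoreTS kept
        System.out.println(firstSeen(files, 10L, true).value);  // memstoreTS zeroed on read
    }
}
```

With memstoreTS kept, the newer value 2 sorts first. With it zeroed, the two cells tie and the result depends only on the order in which the files are visited, so the old value 1 can win.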


3. Why can't I get all the records with different memstoreTS?
In the Scanner, the ExplicitColumnTracker is used for tracking, and there is the following code
in ExplicitColumnTracker.checkColumn():
{code}
  //If column matches, check if it is a duplicate timestamp
  if (sameAsPreviousTS(timestamp)) {
    //If duplicate, skip this Key
    return ScanQueryMatcher.MatchCode.SKIP;
  }
{code}

So the Get returns just one result, although the records differ in memstoreTS.
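
The effect of that check can be sketched in isolation. The Tracker class below is a hypothetical, heavily simplified model: it only remembers the last timestamp seen for one column, whereas the real ExplicitColumnTracker also matches columns and counts versions.

```java
import java.util.ArrayList;
import java.util.List;

class DuplicateTimestampSkip {
    enum MatchCode { INCLUDE, SKIP }

    // Simplified model of the duplicate-timestamp check for a single column.
    static final class Tracker {
        private boolean seen = false;
        private long latestTimestamp;

        MatchCode checkColumn(long timestamp) {
            // Same timestamp as the previous key: treated as a duplicate, regardless of memstoreTS.
            if (seen && timestamp == latestTimestamp) {
                return MatchCode.SKIP;
            }
            seen = true;
            latestTimestamp = timestamp;
            return MatchCode.INCLUDE;
        }
    }

    // Feeds the timestamps through one tracker in scan order.
    static List<MatchCode> run(long... timestamps) {
        Tracker tracker = new Tracker();
        List<MatchCode> codes = new ArrayList<>();
        for (long ts : timestamps) {
            codes.add(tracker.checkColumn(ts));
        }
        return codes;
    }

    public static void main(String[] args) {
        // Two keys share timestamp 1000 and differ only in memstoreTS, which the tracker never sees.
        System.out.println(run(1000L, 1000L, 999L)); // [INCLUDE, SKIP, INCLUDE]
    }
}
```

Because memstoreTS is not part of the check, whichever of the two same-timestamp keys arrives second is dropped.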

4. How to resolve this?
Some of the optimizations built on memstoreTS make the solution complex. I have not found a
solution for this problem yet and am still thinking about one; it may require removing some of the optimizations.
                
> Increment data will be lost when the memstore is flushed
> --------------------------------------------------------
>
>                 Key: HBASE-6195
>                 URL: https://issues.apache.org/jira/browse/HBASE-6195
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: Xing Shi
>            Assignee: ShiXing
>             Fix For: 0.96.0, 0.94.1
>
>         Attachments: 6195-trunk-V7.patch, 6195.addendum, HBASE-6195-trunk-V2.patch, HBASE-6195-trunk-V3.patch, HBASE-6195-trunk-V4.patch, HBASE-6195-trunk-V5.patch, HBASE-6195-trunk-V6.patch, HBASE-6195-trunk.patch
>
>
> There are two problems in increment() now:
> First:
> The timestamp (the variable now) in HRegion's Increment() is generated before the rowLock is acquired, so when multiple threads increment the same row, a thread that generated its timestamp earlier may get the lock later. Because increment stores just one version, the result is still right so far.
> When the region is flushing, these increments read the kv from the snapshot and the memstore, take the one whose timestamp is larger, and write it back to the memstore. If the snapshot's timestamp is larger than the memstore's, the increment gets the old data and then increments it, which is wrong.
> Second:
> There is also a risk in increment. Because it writes the memstore first and then the HLog, if the HLog write fails, the client will still read the incremented value.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
