hbase-issues mailing list archives

From "ramkrishna.s.vasudevan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11339) HBase MOB
Date Fri, 25 Jul 2014 10:22:40 GMT

https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14074262#comment-14074262

ramkrishna.s.vasudevan commented on HBASE-11339:

Bulk loading mob files is what was discussed internally, so why use table.put() in the sweep
tool? Using table.put() pushes the data back through the memstore, which internally triggers
flushes and thus affects the write path of the system.
Bulk loading mob files is possible, and it should work fine given HBASE-6630, where bulk
loaded files are also assigned a sequence number; that sequence number can be used to resolve
the conflict when the KeyValueHeap finds two cells with the same row and ts but different
values.
One thing to note about the sweep tool is that it creates a new store file for a cell with
the same row, ts, cf and cq, but updates it with a new value. Here the new value is the new
path that we generate after the sweep tool merges some of the mob data into one single file.
So consider in our case row1, cf, c1, ts1 = path1. This data is written in a store file.
The updated path is path2, so we bulk load that new info into a new store file as
row1, cf, c1, ts1 = path2. Now the HFile containing the new value is bulk loaded into the
system and we scan for row1.
What we would expect is to get the cell with path2 as the value, and it should come from
the bulk loaded file.
*Does this happen? Yes in the case of 0.96, no in the case of 0.98+.*
In 0.96 the compacted file will have KVs with mvcc = 0 when the KVs are smaller than the
smallest read point. So when a scanner is opened after a set of files has been compacted,
all the KVs will have mvcc = 0 in them.
In 0.98+ that is not the case, because of this code in the compactor:
    long oldestHFileTimeStampToKeepMVCC = System.currentTimeMillis() -
        (1000L * 60 * 60 * 24 * this.keepSeqIdPeriod);

    for (StoreFile file : filesToCompact) {
      // when isAllFiles is true, all files are compacted so we can calculate the
      // smallest MVCC value to keep
      if (allFiles && (file.getModificationTimeStamp() < oldestHFileTimeStampToKeepMVCC)) {
        if (fd.minSeqIdToKeep < file.getMaxMemstoreTS()) {
          fd.minSeqIdToKeep = file.getMaxMemstoreTS();
        }
      }
    }
And so this check in performCompaction()

        KeyValue kv = KeyValueUtil.ensureKeyValue(c);
        if (cleanSeqId && kv.getSequenceId() <= smallestReadPoint) {

is not able to set the seqId to 0, since for at least 5 days (the default keepSeqIdPeriod)
we expect the value to be retained.
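The interaction of the two checks above can be sketched with a small self-contained simulation (the class and method names here are mine, not HBase APIs; only the two conditions mirror the snippets):

```java
// Simplified simulation of the two checks above: a cell's sequence id is only
// eligible for zeroing when its store file is older than keepSeqIdPeriod days
// AND the cell's sequence id is at or below the smallest read point.
public class SeqIdCleanDemo {
  static final long MILLIS_PER_DAY = 1000L * 60 * 60 * 24;

  // Mirrors oldestHFileTimeStampToKeepMVCC in the compactor snippet.
  static long oldestTimestampToKeepMvcc(long now, int keepSeqIdPeriodDays) {
    return now - MILLIS_PER_DAY * keepSeqIdPeriodDays;
  }

  // A cell's seq id may be set to 0 only if both conditions hold.
  static boolean canZeroSeqId(long fileModificationTime, long cutoff,
                              long cellSeqId, long smallestReadPoint) {
    boolean fileOldEnough = fileModificationTime < cutoff;
    return fileOldEnough && cellSeqId <= smallestReadPoint;
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    long cutoff = oldestTimestampToKeepMvcc(now, 5); // 5 days is the default period

    // File written yesterday: too young, so the seq id is retained (0.98+ behavior).
    System.out.println(canZeroSeqId(now - MILLIS_PER_DAY, cutoff, 10, 100)); // false

    // File written 10 days ago with an old enough seq id: may be zeroed.
    System.out.println(canZeroSeqId(now - 10 * MILLIS_PER_DAY, cutoff, 10, 100)); // true
  }
}
```

This is why a freshly compacted file in 0.98+ still carries non-zero mvcc values for up to keepSeqIdPeriod days.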
Remember that in the above case we are assigning seq numbers to bulk loaded files as well,
and when the scanner starts, the bulk loaded file has the highest seq id; that is ensured
by using HFileOutputFormat2, with the assigned sequence id ending up encoded in the file
name (of the form <randomName>_SeqId_<id-when-loaded>_).
So on opening the reader for this bulk loaded store file we are able to get the sequence id:
    if (isBulkLoadResult()) {
      // generate the sequenceId from the fileName
      // fileName is of the form <randomName>_SeqId_<id-when-loaded>_
      String fileName = this.getPath().getName();
      int startPos = fileName.indexOf("SeqId_");
      if (startPos != -1) {
        this.sequenceid = Long.parseLong(fileName.substring(startPos + 6,
            fileName.indexOf('_', startPos + 6)));
        // Handle reference files as done above.
        if (fileInfo.isTopReference()) {
          this.sequenceid += 1;
        }
      }
    }
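The file-name parsing above can be exercised in isolation; here is a minimal standalone sketch (the class name and sample file name are mine; only the `_SeqId_` convention is taken from the snippet):

```java
// Standalone sketch of the sequence-id extraction shown above: the bulk loaded
// file name carries the id between "SeqId_" and the next underscore.
public class BulkLoadSeqId {
  static long parseSeqId(String fileName) {
    int startPos = fileName.indexOf("SeqId_");
    if (startPos == -1) {
      return -1; // not a bulk loaded file name
    }
    // "SeqId_" is 6 characters long; the id ends at the following underscore.
    return Long.parseLong(fileName.substring(startPos + 6,
        fileName.indexOf('_', startPos + 6)));
  }

  public static void main(String[] args) {
    // File name of the form <randomName>_SeqId_<id-when-loaded>_
    System.out.println(parseSeqId("d41d8cd98f_SeqId_42_")); // 42
    System.out.println(parseSeqId("plainstorefile"));       // -1
  }
}
```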
Now the scanner tries to read from the above two files, which hold the same cell
row1, cf, c1, ts1 but with path1 and path2 as the values. The mvcc of the KV in the
compacted store file that has path1 is a non-zero positive value in 0.98+ and 0 in 0.96,
while the mvcc of the KV in the store file generated by bulk load is 0 in both 0.98+ and 0.96.
In KeyValueHeap.java:

    public int compare(KeyValueScanner left, KeyValueScanner right) {
      int comparison = compare(left.peek(), right.peek());
      if (comparison != 0) {
        return comparison;
      } else {
        // Since both the keys are exactly the same, we break the tie in favor
        // of the key which came latest.
        long leftSequenceID = left.getSequenceID();
        long rightSequenceID = right.getSequenceID();
        if (leftSequenceID > rightSequenceID) {
          return -1;
        } else if (leftSequenceID < rightSequenceID) {
          return 1;
        } else {
          return 0;
        }
      }
    }
In 0.96, when the scanner compares the different StoreFileScanners to decide which file to
read from, the if condition returns 0 because both KVs are identical in every component:
row1, cf, c1, ts1 and mvcc = 0. So it falls through to the readers' sequence ids (the else
part of the code), and since the bulk loaded file has the highest sequence id,
row1, cf, c1, ts1 with path2 is the KV that is returned.
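The 0.96-vs-0.98+ difference can be reproduced with a small simulation of the comparator above (all names here are mine; only the two-stage tie-break mirrors KeyValueHeap):

```java
import java.util.Comparator;

// Simplified model of KeyValueHeap's scanner ordering: first compare the cells
// (including their mvcc), and only on an exact tie fall back to the scanner's
// (i.e. the store file's) sequence id, with the higher id winning.
public class HeapTieBreakDemo {
  static final class FakeScanner {
    final long cellMvcc;   // mvcc carried by the cell itself
    final long fileSeqId;  // sequence id of the backing store file
    FakeScanner(long cellMvcc, long fileSeqId) {
      this.cellMvcc = cellMvcc;
      this.fileSeqId = fileSeqId;
    }
  }

  // Both cells share row/cf/cq/ts here, so the cell comparison reduces to
  // comparing mvcc; a non-positive result means "read from the left scanner".
  static final Comparator<FakeScanner> CMP = (left, right) -> {
    int comparison = Long.compare(right.cellMvcc, left.cellMvcc); // higher mvcc first
    if (comparison != 0) {
      return comparison;
    }
    return Long.compare(right.fileSeqId, left.fileSeqId); // higher seq id first
  };

  // Returns which scanner's cell the merged scan would surface.
  static String winner(FakeScanner compacted, FakeScanner bulkLoaded) {
    return CMP.compare(compacted, bulkLoaded) <= 0 ? "compacted" : "bulkload";
  }

  public static void main(String[] args) {
    // 0.96: compaction zeroed the mvcc, the tie-break reaches the file seq ids,
    // and the bulk loaded file (higher seq id) wins -> path2 is returned.
    System.out.println(winner(new FakeScanner(0, 5), new FakeScanner(0, 9)));

    // 0.98+: the compacted cell keeps a non-zero mvcc while the bulk loaded cell
    // has mvcc 0, so the compacted file wins -> path1 is returned.
    System.out.println(winner(new FakeScanner(7, 5), new FakeScanner(0, 9)));
  }
}
```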

In the 0.98 case, since the mvcc of the KV in the compacted file is a non-zero value, we
always tend to return the compacted file, and so the result would be row1, cf, c1, ts1 with
path1.
So this is a behavioral change between 0.96 and 0.98. Considering that the seq id of the
bulk loaded file is higher than that of the compacted file, it makes sense to read from the
bulk loaded file rather than the compacted file, as it holds the newest value. If this is an
issue we can raise a JIRA and find a solution for it. Correct me if I am wrong. Feedback
appreciated.

> HBase MOB
> ---------
>                 Key: HBASE-11339
>                 URL: https://issues.apache.org/jira/browse/HBASE-11339
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver, Scanners
>            Reporter: Jingcheng Du
>            Assignee: Jingcheng Du
>         Attachments: HBase MOB Design-v2.pdf, HBase MOB Design.pdf, MOB user guide.docx,
>   It's quite useful to save medium-sized binary data like images and documents into Apache
> HBase. Unfortunately, directly saving binary MOBs (medium objects) in HBase leads to worse
> performance because of the frequent splits and compactions.
>   In this design, the MOB data is stored in a more efficient way, which keeps high
> write/read performance and guarantees data consistency in Apache HBase.

This message was sent by Atlassian JIRA
