hbase-issues mailing list archives

From "huaxiang sun (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16578) Mob data loss after mob compaction and normal compcation
Date Fri, 14 Oct 2016 19:39:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15576287#comment-15576287 ]

huaxiang sun commented on HBASE-16578:

Hi [~jingcheng.du@intel.com], thanks for the reply. I had not thought it through yesterday.
The case I described is invalid because, as you mentioned, the new compacted reference file
will get a bigger seqId.

Your patch looks good to me, so +1 from me.

Looking through the code, I found that the following sequence is possible and could cause
an issue.

1. Put mob cell r1 and flush; this creates ref1 and mobFile1.
2. Put mob cell r2 and flush; this creates ref2 and mobFile2.
3. Put normal cell r3; do not flush.
4. Run mob compaction; it flushes r3 to hfile1 and creates a new reference file.
   In this case, the maxSeqId in hfile1 is the same as the seqId in the new reference file;
   let's say it is 10.
5. In step 4, the flush happens before the hfile is bulkloaded. After the flush, a compaction
   may kick in and compact ref1, ref2, and hfile1 into hfile2 (with maxSeqId 10).
6. The bulkload of the hfile finishes and creates *_seqId_10_.
7. In this case, hfile2 and *_seqId_10_ share seqId 10, so the references in them may get
   mixed up.
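
The tie in step 7 can be illustrated with a minimal sketch. It is not HBase code: `SeqIdTieSketch`, `FileInfo`, and `orderIsAmbiguous` are hypothetical names, and the only assumption is that files are ordered by seqId alone.

```java
import java.util.Comparator;

/** Hypothetical illustration only; none of these names exist in HBase. */
final class SeqIdTieSketch {

    /** Minimal stand-in for a store file: a name plus the seqId used for ordering. */
    static final class FileInfo {
        final String name;
        final long seqId;

        FileInfo(String name, long seqId) {
            this.name = name;
            this.seqId = seqId;
        }
    }

    /** Orders files by seqId only (assumed ordering rule for this sketch). */
    static final Comparator<FileInfo> BY_SEQ_ID =
            Comparator.comparingLong(f -> f.seqId);

    /** True when the comparator cannot tell which of the two files is newer. */
    static boolean orderIsAmbiguous(FileInfo a, FileInfo b) {
        return BY_SEQ_ID.compare(a, b) == 0;
    }
}
```

With hfile2 at seqId 10 and the bulkloaded file also at seqId 10, `orderIsAmbiguous` returns true; if the bulkloaded file carried a larger seqId, it would return false.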

I think we need to change the following line:
it needs to be applied to the mob bulkloaded file as well to avoid this case.

> Mob data loss after mob compaction and normal compcation
> --------------------------------------------------------
>                 Key: HBASE-16578
>                 URL: https://issues.apache.org/jira/browse/HBASE-16578
>             Project: HBase
>          Issue Type: Bug
>          Components: mob
>    Affects Versions: 2.0.0
>            Reporter: huaxiang sun
>            Assignee: Jingcheng Du
>         Attachments: HBASE-16578-V2.patch, HBASE-16578.patch, TestMobCompaction.java,
> StoreFileScanners on MOB cells rely on the scannerOrder to find the latest cells after
> mob compaction. The value of scannerOrder is assigned by the order of maxSeqId of StoreFile,
> and this maxSeqId is set only after the reader of the StoreFile is created.
> In {{Compactor.compact}}, the compacted store files are cloned and their readers are
> not created. And in {{StoreFileScanner.getScannersForStoreFiles}} the StoreFiles are sorted
> before the readers are created, and at that time the maxSeqId of each file is -1 (the default
> value). This leaves the scanners in an arbitrary order in the following normal compaction,
> so some older cells might be chosen during that compaction.
> We need to create readers either before the sorting in the method {{StoreFileScanner.getScannersForStoreFiles}},
> or just after the store files are cloned in {{Compactor.compact}}.
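
The ordering problem in the description above can be sketched as follows. This is a simplified model, not the HBase implementation: `SketchStoreFile`, `ScannerOrderSketch`, `sample`, and `sortedNames` are hypothetical names, and the sketch assumes only what the description states, namely that maxSeqId is -1 until a reader is created and that files are sorted by maxSeqId.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a store file; not the HBase StoreFile class. */
final class SketchStoreFile {
    final String name;
    private final long seqIdOnDisk; // value recorded in the file itself
    private long maxSeqId = -1;     // default until a reader is created

    SketchStoreFile(String name, long seqIdOnDisk) {
        this.name = name;
        this.seqIdOnDisk = seqIdOnDisk;
    }

    /** Creating the reader is what makes the real maxSeqId visible. */
    void createReader() {
        maxSeqId = seqIdOnDisk;
    }

    long getMaxSeqId() {
        return maxSeqId;
    }
}

final class ScannerOrderSketch {

    /** Two cloned files, listed with the newer one first. */
    static List<SketchStoreFile> sample() {
        List<SketchStoreFile> files = new ArrayList<>();
        files.add(new SketchStoreFile("newer", 20L));
        files.add(new SketchStoreFile("older", 5L));
        return files;
    }

    /** Sorts by maxSeqId, optionally creating readers before the sort. */
    static List<String> sortedNames(List<SketchStoreFile> files, boolean readersFirst) {
        List<SketchStoreFile> copy = new ArrayList<>(files);
        if (readersFirst) {
            for (SketchStoreFile f : copy) {
                f.createReader();
            }
        }
        // Without readers every key is -1, so the stable sort keeps the input order.
        copy.sort(Comparator.comparingLong(SketchStoreFile::getMaxSeqId));
        List<String> names = new ArrayList<>();
        for (SketchStoreFile f : copy) {
            names.add(f.name);
        }
        return names;
    }
}
```

Calling `sortedNames(sample(), false)` leaves the files in their input order because every key is -1, while `sortedNames(sample(), true)` puts the older file first, which is the ordering by real maxSeqId that the proposed fix relies on.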

