hbase-issues mailing list archives

From "Chia-Ping Tsai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18752) Recalculate the TimeRange in flushing snapshot to store file
Date Thu, 05 Oct 2017 14:27:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16192948#comment-16192948 ]

Chia-Ping Tsai commented on HBASE-18752:

bq. So if we have max versions set to 2, then also we don't have any issue right? Still the
time range tracker will be able to mark 101 and 102 in this case correct?
Yes, the test passes if max versions is set to 2. However, it still fails if we put three (> 2)
cells with the same row/fam/qual and different ts: the cell with the lowest ts is dropped in
flush. I added more tests in the v1 patch.
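
To make the failure mode concrete, here is a minimal standalone sketch in plain Java. It does not use any HBase classes; the class `FlushTimeRangeDemo`, its methods, and the timestamps 100/101/102 are all hypothetical and only mimic version pruning for a single row/fam/qual. It shows that the range computed over the snapshot is wider than the range of the cells that survive the flush.

```java
import java.util.Arrays;

// Hypothetical illustration only: not HBase code.
public class FlushTimeRangeDemo {

    /** Keeps the newest maxVersions timestamps, mimicking version pruning for one row/fam/qual. */
    public static long[] keepNewestVersions(long[] timestamps, int maxVersions) {
        long[] sorted = timestamps.clone();
        Arrays.sort(sorted); // ascending, so the newest timestamps are at the end
        int from = Math.max(0, sorted.length - maxVersions);
        return Arrays.copyOfRange(sorted, from, sorted.length);
    }

    /** Returns {min, max} of the given timestamps. */
    public static long[] minMax(long[] timestamps) {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long ts : timestamps) {
            min = Math.min(min, ts);
            max = Math.max(max, ts);
        }
        return new long[] {min, max};
    }

    public static void main(String[] args) {
        long[] snapshot = {100L, 101L, 102L};             // assumed timestamps in the snapshot
        long[] flushed = keepNewestVersions(snapshot, 2); // max versions = 2 drops ts 100

        System.out.println("snapshot range: " + Arrays.toString(minMax(snapshot))); // [100, 102]
        System.out.println("flushed range:  " + Arrays.toString(minMax(flushed)));  // [101, 102]
    }
}
```

If the store file simply reused the snapshot's range, it would claim to hold ts 100 even though no such cell was written.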

bq. Would there be any impact on performance of flushing ?
Yes, fixing this bug will affect flush performance:
# we have to retrieve the ts from each cell (ByteBufferedCell)
# we have to recalculate the min/max of the TimeRange (the cost is trivial now because we introduce
the non-sync TimeRangeTracker in HBASE-18753)
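
The two costs listed above can be sketched as a per-cell update of an unsynchronized min/max tracker. This is an assumption-laden illustration, not the patch itself: `SimpleTimeRangeTracker` and its methods are hypothetical stand-ins written in plain Java, and the timestamps are made up.

```java
// Hypothetical stand-in for a non-synchronized tracker: not the HBase class.
public class SimpleTimeRangeTracker {
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;

    /** Widens the tracked range to include ts. No locking: assumes a single flushing thread. */
    public void includeTimestamp(long ts) {
        if (ts < min) {
            min = ts;
        }
        if (ts > max) {
            max = ts;
        }
    }

    /** True until at least one timestamp has been included. */
    public boolean isEmpty() {
        return min > max;
    }

    public long getMin() {
        return min;
    }

    public long getMax() {
        return max;
    }

    public static void main(String[] args) {
        SimpleTimeRangeTracker tracker = new SimpleTimeRangeTracker();
        long[] writtenTimestamps = {101L, 102L}; // assumed: only cells that survive the flush
        for (long ts : writtenTimestamps) {
            tracker.includeTimestamp(ts); // cost 1: read the ts; cost 2: two comparisons
        }
        System.out.println(tracker.getMin() + ".." + tracker.getMax()); // 101..102
    }
}
```

Feeding the tracker only the cells actually written keeps the recorded range in line with the store file's contents, at the price of two comparisons per cell plus the timestamp read.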

bq. So in your case there are lot of duplicate records but with diff ts? Something like a
streaming app?
Yep. Our data, which is dumped from the same time window, has many identical fields.

> Recalculate the TimeRange in flushing snapshot to store file
> ------------------------------------------------------------
>                 Key: HBASE-18752
>                 URL: https://issues.apache.org/jira/browse/HBASE-18752
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Chia-Ping Tsai
>            Assignee: Chia-Ping Tsai
>             Fix For: 2.0.0-beta-1
>         Attachments: HBASE-18752.v0.patch
> We drop superfluous cells during flush, hence the TimeRange from the snapshot is inaccurate
for the storefile. We should recalculate the TimeRange for the storefile, but the side effect
is the extra cost: we need to extract the timestamp from each cell (ByteBufferCell).

This message was sent by Atlassian JIRA
