hbase-issues mailing list archives

From "Ajay Jadhav (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18448) Added support for refreshing HFiles through API and shell
Date Wed, 26 Jul 2017 18:20:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102052#comment-16102052 ]

Ajay Jadhav commented on HBASE-18448:

[~ram_krish]: Exposing the refresh HFiles API is useful in the following scenario.
Assume we have 2 HBase clusters pointing to the same rootDir (an S3 bucket), one of which
is in read-only mode (replica) while the other accepts writes (primary).

1. We issue a "put" on the primary cluster and do a flush immediately.
2. This creates an HFile on storage (S3).
3. The replica is not aware of this newly created HFile, as the write didn't go through it.
4. The only way for the replica to become consistent with the primary is to issue a refresh
   HFiles on the replica, which updates the replica's in-memory file handle list.

This is why we need the refresh HFiles API to keep all the clusters consistent with writes
on the primary cluster.
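
To make the four steps above concrete, here is a small toy model in Python. It is only a
sketch of the idea, not HBase code: SharedStorage, Cluster, and refresh_hfiles are
hypothetical names invented for this illustration, and the real patch works on region
store files rather than on a Python dict.

```python
# Toy model only: every class and method here is hypothetical, not an HBase API.

class SharedStorage:
    """Stands in for the common rootDir (for example an S3 bucket)."""

    def __init__(self):
        self.hfiles = {}  # file name -> {row: value}, kept in creation order

    def write_hfile(self, name, rows):
        self.hfiles[name] = dict(rows)


class Cluster:
    """A cluster that reads only the files named in its in-memory list."""

    def __init__(self, storage, read_only=False):
        self.storage = storage
        self.read_only = read_only
        self.memstore = {}
        # Snapshot of the files known at startup (the "file handle list").
        self.known_hfiles = list(storage.hfiles)

    def put(self, row, value):
        if self.read_only:
            raise PermissionError("replica is read-only")
        self.memstore[row] = value

    def flush(self):
        """Persist buffered writes as a new file on shared storage."""
        if not self.memstore:
            return None
        name = "hfile-%06d" % (len(self.storage.hfiles) + 1)
        self.storage.write_hfile(name, self.memstore)
        self.known_hfiles.append(name)  # only the writer learns about the file
        self.memstore = {}
        return name

    def refresh_hfiles(self):
        """Re-list shared storage and replace the in-memory file list."""
        self.known_hfiles = list(self.storage.hfiles)

    def get(self, row):
        if row in self.memstore:
            return self.memstore[row]
        # Newest file wins, so scan the known files in reverse order.
        for name in reversed(self.known_hfiles):
            rows = self.storage.hfiles[name]
            if row in rows:
                return rows[row]
        return None


storage = SharedStorage()
primary = Cluster(storage)
replica = Cluster(storage, read_only=True)

primary.put("row1", "v1")             # step 1: put on the primary ...
primary.flush()                       # ... followed by a flush (step 2)
before_refresh = replica.get("row1")  # step 3: the replica has a stale file list
replica.refresh_hfiles()              # step 4: refresh HFiles on the replica
after_refresh = replica.get("row1")
print(before_refresh, after_refresh)
```

In this model the replica cannot see "row1" until refresh_hfiles() runs, even though the
file already exists on the shared storage, which is the gap the refresh call is meant to
close.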

More information about this feature is available here: https://aws.amazon.com/blogs/big-data/setting-up-read-replica-clusters-with-hbase-on-amazon-s3/

> Added support for refreshing HFiles through API and shell
> ---------------------------------------------------------
>                 Key: HBASE-18448
>                 URL: https://issues.apache.org/jira/browse/HBASE-18448
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.0.0, 1.3.1
>            Reporter: Ajay Jadhav
>            Assignee: Ajay Jadhav
>            Priority: Minor
>             Fix For: 1.4.0
>         Attachments: HBASE-18448.branch-1.001.patch, HBASE-18448.branch-1.002.patch
>
> In the case where multiple HBase clusters share a common rootDir, flushing the data from
> one cluster doesn't mean that the other clusters (replicas) will automatically pick up the
> new HFile. Through this patch, we are exposing the refresh HFiles API, which, when issued
> from a replica, will update the in-memory file handle list with the newly added file.
> This allows replicas to be consistent with the data written through the primary cluster.

This message was sent by Atlassian JIRA
