hbase-issues mailing list archives

From "Josh Elser (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18135) Track file archival for low latency space quota with snapshots
Date Fri, 02 Jun 2017 22:41:04 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035553#comment-16035553
] 

Josh Elser commented on HBASE-18135:
------------------------------------

This one is a little tricky. The problem is that all of the reporting/tracking of Table sizes
is done on the Region level. For snapshots, we're working at the file level. We definitely
don't want to go tracking files themselves as that's a recipe for bugs. It would be really
miserable code to understand, implement correctly, and maintain.

I'm thinking about the following scenario which will help the "average" case.

1. RS1 compacts R1 from T1: the files [file1, file2, file3] into [file4].
2. RS1 moves [file1, file2, file3] from the data/ directory to the archive/ directory in
HDFS.
3. RS1 reports [file1, file2, file3] for T1 to the Master (only if T1 has quotas enabled).
4. If T1 has snapshots, for each file in the list reported by RS1, the Master finds the first
snapshot against T1 that references that file.
5. For each file that a snapshot references, that snapshot's size is updated directly in the
hbase:quota table.
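
Steps 4 and 5 can be sketched roughly as follows. This is only an illustration of the idea,
not HBase code: the class and method names are hypothetical, snapshots are modeled as an
ordered map from snapshot name to the set of file names it references, and the write to
hbase:quota is left out (the method only computes the per-snapshot size deltas that would be
written).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; none of these names exist in HBase.
final class ArchivalReportSketch {
  /**
   * Charges each archived file to the first snapshot (in map iteration order)
   * that references it and returns the size delta, in bytes, per snapshot.
   * Files that no snapshot references are ignored.
   */
  static Map<String, Long> sizeDeltas(
      Map<String, Long> archivedFileSizes, Map<String, Set<String>> snapshotToFiles) {
    Map<String, Long> deltas = new LinkedHashMap<>();
    for (Map.Entry<String, Long> file : archivedFileSizes.entrySet()) {
      for (Map.Entry<String, Set<String>> snapshot : snapshotToFiles.entrySet()) {
        if (snapshot.getValue().contains(file.getKey())) {
          deltas.merge(snapshot.getKey(), file.getValue(), Long::sum);
          break; // only the first referencing snapshot is charged
        }
      }
    }
    return deltas;
  }

  public static void main(String[] args) {
    // Insertion order stands in for "first snapshot against T1".
    Map<String, Set<String>> snapshots = new LinkedHashMap<>();
    snapshots.put("snap1", Set.of("file1", "file2"));
    snapshots.put("snap2", Set.of("file2", "file3"));

    // The files RS1 reported as archived, with made-up sizes in bytes.
    Map<String, Long> report = new LinkedHashMap<>();
    report.put("file1", 100L);
    report.put("file2", 200L);
    report.put("file3", 300L);

    System.out.println(sizeDeltas(report, snapshots));
  }
}
```

Charging a file to only one snapshot keeps a file shared by several snapshots from being
counted more than once.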

This gets the "visible" quota size updated quickly and avoids interfering with the SnapshotQuotaObserverChore.
When the quota table is updated, the QuotaObserverChore will see the new size, so this does
not introduce another source of latency before quota usage is updated.

The effort here would be making sure the SnapshotQuotaObserverChore doesn't race against this
hypothetical new process. We might be able to push this down to "HBase" itself (use some kind
of compareAndSet) or just synchronize access in the master via some new class.
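
To make the compareAndSet option concrete, here is a minimal sketch in which an in-memory
ConcurrentMap stands in for the stored snapshot sizes. The class and method names are
hypothetical, and nothing here touches hbase:quota; it only shows the retry loop that lets
two writers (say, the archival-report handler and the chore) update the same size without
losing an update.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch; an in-memory map stands in for the stored sizes.
final class SnapshotSizeStoreSketch {
  private final ConcurrentMap<String, Long> sizes = new ConcurrentHashMap<>();

  /** Adds delta to the stored size, retrying if another writer got in first. */
  long addSize(String snapshot, long delta) {
    while (true) {
      Long current = sizes.get(snapshot);
      if (current == null) {
        // No size recorded yet: succeed only if nobody else created it meanwhile.
        if (sizes.putIfAbsent(snapshot, delta) == null) {
          return delta;
        }
      } else {
        long updated = current + delta;
        // Compare-and-set: replace only if the value is still what we read.
        if (sizes.replace(snapshot, current, updated)) {
          return updated;
        }
      }
    }
  }

  /** Returns the stored size, or 0 if nothing has been recorded. */
  long getSize(String snapshot) {
    return sizes.getOrDefault(snapshot, 0L);
  }

  public static void main(String[] args) throws InterruptedException {
    SnapshotSizeStoreSketch store = new SnapshotSizeStoreSketch();
    Runnable writer = () -> {
      for (int i = 0; i < 1000; i++) {
        store.addSize("snap1", 1L);
      }
    };
    Thread reports = new Thread(writer, "archival-report");
    Thread chore = new Thread(writer, "chore");
    reports.start();
    chore.start();
    reports.join();
    chore.join();
    System.out.println(store.getSize("snap1"));
  }
}
```

The alternative mentioned above, synchronizing access in the master, would amount to
guarding the read-modify-write with a single lock instead of retrying.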

> Track file archival for low latency space quota with snapshots
> --------------------------------------------------------------
>
>                 Key: HBASE-18135
>                 URL: https://issues.apache.org/jira/browse/HBASE-18135
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Josh Elser
>            Assignee: Josh Elser
>
> Related to the work proposed on HBASE-17748 and building on the same idea as HBASE-18133,
we can make the space quota tracking for HBase snapshots faster to respond.
> When snapshots are in play, the location of a file (whether in the {{data}} or {{archive}}
directory) is a factor in the realized size of a table. Like flushes, compactions, etc.,
moving files from the data directory to the archive directory is done by the RegionServer.
We can hook into this call and send the necessary information to the Master so that it can
more quickly update the size of a table when there are snapshots in play.
> This will require the RegionServer to report the full coordinates of the file being moved
(table+region+family+file) so that the SnapshotQuotaObserverChore running in the master can
avoid some or all of the HDFS lookups needed to compute the location of a Region's hfiles.
> This may also require some refactoring of the SnapshotQuotaObserverChore to de-couple
the receipt of these file archival reports from RegionServers (e.g. {{HRegionFileSystem.removeStoreFiles(..)}})
and the Master processing the sizes of snapshots.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
