hbase-issues mailing list archives

From "churro morales (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15321) Ability to open a HRegion from hdfs snapshot.
Date Mon, 29 Feb 2016 18:54:18 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15172375#comment-15172375 ]

churro morales commented on HBASE-15321:
----------------------------------------

Use case: 

Jobs made regionservers slow. Slow regionservers made jobs slow. 

Jobs consumed a significant share of regionserver resources, e.g. RS heap and handlers. We had
jobs that did full table scans over a really large table with many regions and store files.
HBase snapshots were quite slow on our large cluster: even with skip flush and manifests,
they took around 20 minutes to snapshot this table. This cluster was also taking quite a bit
of writes and serving random reads, so the main goal was to reduce the impact these jobs
had on cluster resources.

HDFS snapshots are O(1) operations. So for our jobs, we took a snapshot in setup, ran the
job over the HDFS snapshot, and then deleted the snapshot after the job completed.
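
The setup / run / cleanup flow above can be sketched roughly as follows. This is an
illustration only, not the code from the attached patch. SnapshotStore, SnapshotScopedJob
and RecordingStore are hypothetical names invented for this sketch; on a real cluster the
two store methods would presumably delegate to Hadoop's FileSystem.createSnapshot(Path, String)
and FileSystem.deleteSnapshot(Path, String), and the directory would first need to be made
snapshottable by an admin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical stand-in for the two HDFS snapshot calls the job needs. */
interface SnapshotStore {
    String createSnapshot(String dir, String name); // returns the snapshot path
    void deleteSnapshot(String dir, String name);
}

final class SnapshotScopedJob {
    /** HDFS exposes a snapshot of a snapshottable directory at <dir>/.snapshot/<name>. */
    static String snapshotPath(String dir, String name) {
        String base = dir.endsWith("/") ? dir.substring(0, dir.length() - 1) : dir;
        return base + "/.snapshot/" + name;
    }

    /** Take the snapshot in setup, run the job over it, and always delete it afterwards. */
    static <T> T runOverSnapshot(SnapshotStore store, String dir, String name,
                                 Function<String, T> job) {
        String path = store.createSnapshot(dir, name);
        try {
            return job.apply(path); // the job reads only the read-only snapshot path
        } finally {
            store.deleteSnapshot(dir, name); // cleanup runs even if the job throws
        }
    }
}

/** In-memory fake used for illustration; it only records the calls it receives. */
final class RecordingStore implements SnapshotStore {
    final List<String> calls = new ArrayList<>();

    @Override
    public String createSnapshot(String dir, String name) {
        calls.add("create " + name);
        return SnapshotScopedJob.snapshotPath(dir, name);
    }

    @Override
    public void deleteSnapshot(String dir, String name) {
        calls.add("delete " + name);
    }
}
```

The try/finally is the design point: a failed job should not leave snapshots behind, since
every retained snapshot pins the blocks it references.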

If your job can tolerate reading data that is only current as of roughly
(Now - hbase.regionserver.optionalcacheflushinterval), M/R over HDFS snapshots is a good option.
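
As a rough worked example of that bound: the sketch below computes the cutoff for a given
snapshot time. SnapshotFreshness and its method name are hypothetical, made up for this
illustration. The 3600000 ms constant is the stock default of
hbase.regionserver.optionalcacheflushinterval; your cluster may override it, and the result
should be read as an approximate cutoff rather than a hard guarantee.

```java
import java.time.Duration;
import java.time.Instant;

final class SnapshotFreshness {
    /** Stock default of hbase.regionserver.optionalcacheflushinterval (1 hour, in ms). */
    static final Duration DEFAULT_FLUSH_INTERVAL = Duration.ofMillis(3_600_000L);

    /**
     * Approximate cutoff: edits older than this are expected to have been flushed to
     * store files, and so to be visible in an HDFS snapshot taken at snapshotTime.
     */
    static Instant expectedFlushedBefore(Instant snapshotTime, Duration flushInterval) {
        return snapshotTime.minus(flushInterval);
    }
}
```

With the default interval, a snapshot taken at 18:54:18 would be expected to contain edits
made before about 17:54:18; newer edits may still be sitting in memstores.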

This made the jobs complete faster and reduced the HBase resources they consumed on our
cluster.


> Ability to open a HRegion from hdfs snapshot.
> ---------------------------------------------
>
>                 Key: HBASE-15321
>                 URL: https://issues.apache.org/jira/browse/HBASE-15321
>             Project: HBase
>          Issue Type: New Feature
>    Affects Versions: 2.0.0
>            Reporter: churro morales
>             Fix For: 2.0.0
>
>         Attachments: HBASE-15321-v1.patch, HBASE-15321-v2.patch, HBASE-15321-v3.patch, HBASE-15321.patch
>
>
> Now that HDFS snapshots are here, we started to run our mapreduce jobs over HDFS snapshots.
> The thing is, HDFS snapshots are read-only point-in-time copies of the file system.  Thus
> we had to modify the section of code that initializes the region internals in HRegion.  We
> have to skip cleanup of certain directories if the HRegion is backed by an HDFS snapshot.
> I have a patch for trunk with some basic tests if folks are interested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
