hbase-issues mailing list archives

From "churro morales (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-11360) SnapshotFileCache refresh logic based on modified directory time might be insufficient
Date Thu, 19 Jun 2014 17:02:24 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037530#comment-14037530 ]

churro morales commented on HBASE-11360:

So this works most of the time, but this is what made our cleaner slow in the first place.

We ran an export snapshot job for our biggest table, which has around 300k reference files.  This export
took 10 days to complete.  We did not have any snapshots on the destination cluster,
only a snapshot in progress.

Thus, if we refresh the snapshot in .tmp for every archive file we have to clean, I think we
will run into the same issue we had before we patched HBASE-11322.  The destination cluster
was being written to and the cleaner could not keep up.  Our DFS became full and our namenode
heap almost burst.

I think doing this for each reference file would work in the usual situations, but if you
export a large snapshot then I believe you could encounter the same problems.

How about this solution, but doing it for batches rather than for every HFile, like the
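
The batching idea can be sketched roughly as follows. This is only an illustration in Python, not HBase code: `deletable_files`, `load_referenced` and `batch_size` are hypothetical names, and `load_referenced` stands in for whatever expensive step re-reads the in-progress snapshot under .tmp. The point is that the expensive reload runs once per batch of archive files instead of once per file.

```python
from typing import Callable, Iterable, List, Set


def deletable_files(
    candidates: Iterable[str],
    load_referenced: Callable[[], Set[str]],
    batch_size: int = 1000,
) -> List[str]:
    """Return the candidates that no snapshot references.

    load_referenced (hypothetical) is the expensive refresh; it is
    called once per batch rather than once per candidate file.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    deletable: List[str] = []
    batch: List[str] = []

    def flush() -> None:
        # One refresh covers every file collected in this batch.
        referenced = load_referenced()
        deletable.extend(name for name in batch if name not in referenced)
        batch.clear()

    for name in candidates:
        batch.append(name)
        if len(batch) == batch_size:
            flush()
    if batch:
        flush()  # handle the final, partially filled batch
    return deletable
```

With 300k archive files and a batch size of 1000, this would trigger 300 refreshes instead of 300k. The refresh still happens after the batch is collected and before anything in it is declared deletable, so each file is checked against a view of the snapshot that is at least as recent as the moment the file was picked up.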

> SnapshotFileCache refresh logic based on modified directory time might be insufficient
> --------------------------------------------------------------------------------------
>                 Key: HBASE-11360
>                 URL: https://issues.apache.org/jira/browse/HBASE-11360
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.19
>            Reporter: churro morales
> Right now we decide whether to refresh the cache based on the lastModified timestamp
> of all the snapshots and of the "running" snapshots, which are located in /hbase/.hbase-snapshot/.tmp/<snapshot>.
> We ran an ExportSnapshot job which takes around 7 minutes between creating the directory
> and copying all the files.
> Thus the modified time for the
> /hbase/.hbase-snapshot/.tmp directory was 7 minutes earlier than the modified time of the
> /hbase/.hbase-snapshot/.tmp/<snapshot> directory.
> Thus the cache refresh happens and doesn't pick up all the files, but thinks it's up to
> date, as the modified time of the .tmp directory never changes.
> This is a bug: when the export job starts, the cache never contains the files for the
> "running" snapshot and will fail.
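
The modification-time behaviour described in the report can be reproduced on a local filesystem. The sketch below is a Python illustration using a temporary directory as a stand-in for HDFS; the function name and the `snapshot` and `ref-file` names are made up for the example. It assumes, as the report describes for HDFS, that writing a file into a subdirectory updates only that subdirectory's modification time, not its parent's.

```python
import os
import tempfile
from typing import Tuple


def mtimes_changed_by_nested_write(root: str) -> Tuple[bool, bool]:
    """Write a file into <root>/.tmp/snapshot and report whether the
    mtime of .tmp and of .tmp/snapshot changed as a result."""
    tmp_dir = os.path.join(root, ".tmp")
    snapshot_dir = os.path.join(tmp_dir, "snapshot")
    os.makedirs(snapshot_dir)

    # Pin both mtimes to a fixed past value so the comparison does not
    # depend on the clock or on timestamp granularity.
    pinned = 1_000_000_000
    os.utime(tmp_dir, (pinned, pinned))
    os.utime(snapshot_dir, (pinned, pinned))

    # Simulate the export job copying a file into the running snapshot.
    with open(os.path.join(snapshot_dir, "ref-file"), "w") as f:
        f.write("x")

    tmp_changed = os.stat(tmp_dir).st_mtime != pinned
    snapshot_changed = os.stat(snapshot_dir).st_mtime != pinned
    return tmp_changed, snapshot_changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(mtimes_changed_by_nested_write(root))
```

A refresh check that looks only at the modification time of .tmp therefore cannot see files arriving inside .tmp/<snapshot>; it would have to look at the snapshot directory itself.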

This message was sent by Atlassian JIRA
