hbase-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16788) Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()
Date Tue, 11 Oct 2016 20:45:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15566546#comment-15566546 ]

Hudson commented on HBASE-16788:

FAILURE: Integrated in Jenkins build HBase-1.3-JDK8 #41 (See [https://builds.apache.org/job/HBase-1.3-JDK8/41/])
HBASE-16788 Guard HFile archiving under a separate lock (garyh: rev 8eea3a5777a25907dcf6486bfeafd8482a072b80)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java
* (add) hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactionArchiveConcurrentClose.java
HBASE-16788 addendum Account for HStore archiveLock in heap size (garyh: rev cd3afa5a0d85751936c54fa2398b63ff2efa128c)
* (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java

> Race in compacted file deletion between HStore close() and closeAndArchiveCompactedFiles()
> ------------------------------------------------------------------------------------------
>                 Key: HBASE-16788
>                 URL: https://issues.apache.org/jira/browse/HBASE-16788
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 1.3.0
>            Reporter: Gary Helmling
>            Assignee: Gary Helmling
>            Priority: Blocker
>             Fix For: 2.0.0, 1.3.0, 1.4.0
>         Attachments: 16788-suggest.v2, HBASE-16788-addendum.patch, HBASE-16788.001.patch, HBASE-16788.002.patch, HBASE-16788_1.patch
> HBASE-13082 changed the way that compacted files are archived from being done inline on compaction completion to an async cleanup by the CompactedHFilesDischarger chore. It looks like the changes to HStore to support this introduced a race condition in the compacted HFile archiving.
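As a rough illustration of the change described above, the Java sketch below moves archiving off the compaction path and onto a periodic background task. `CompactedFileArchiver` and `DischargerSketch` are hypothetical names. The real CompactedHFilesDischarger is built on HBase's own chore framework and, per this report, hands per-region work to a regionserver executor service; none of that is modeled here.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the per-store archiving entry point the chore invokes.
interface CompactedFileArchiver {
  void closeAndArchiveCompactedFiles() throws IOException;
}

// Minimal periodic "discharger": archiving happens on a background schedule,
// not inline when a compaction finishes. This is not HBase's ScheduledChore API.
class DischargerSketch {
  private final ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
  private final List<CompactedFileArchiver> stores;

  DischargerSketch(List<CompactedFileArchiver> stores) {
    this.stores = stores;
  }

  // One pass over every store; a failure in one store does not stop the others.
  void chore() {
    for (CompactedFileArchiver store : stores) {
      try {
        store.closeAndArchiveCompactedFiles();
      } catch (IOException e) {
        System.err.println("archiving failed: " + e);
      }
    }
  }

  void start(long periodMs) {
    pool.scheduleAtFixedRate(this::chore, periodMs, periodMs, TimeUnit.MILLISECONDS);
  }

  void stop() {
    pool.shutdownNow();
  }

  public static void main(String[] args) throws InterruptedException {
    CountDownLatch passes = new CountDownLatch(3);
    List<CompactedFileArchiver> stores = Arrays.asList(passes::countDown);
    DischargerSketch discharger = new DischargerSketch(stores);
    discharger.start(10);                            // one pass every 10 ms
    boolean ran = passes.await(5, TimeUnit.SECONDS); // wait for three passes
    discharger.stop();
    System.out.println(ran ? "three passes ran" : "timed out");
  }
}
```

The point of the asynchronous design is that a compaction no longer pays for archive I/O. The cost is that a second code path (the chore) now touches the compacted file list, which is what opens the window described next.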
> In the following sequence, we can wind up with two separate threads trying to archive the same HFiles, causing a regionserver abort:
> # compaction completes normally and the compacted files are added to {{compactedfiles}} in HStore's DefaultStoreFileManager
> # *threadA*: CompactedHFilesDischargeHandler runs in a RS executor service, calling closeAndArchiveCompactedFiles()
> ## obtains HStore readlock
> ## gets a copy of compactedfiles
> ## releases readlock
> # *threadB*: calls HStore.close() as part of region close
> ## obtains HStore writelock
> ## calls DefaultStoreFileManager.clearCompactedfiles(), getting a copy of same compactedfiles
> # *threadA*: calls HStore.removeCompactedfiles(compactedfiles)
> ## archives files in {{compactedfiles}} in HRegionFileSystem.removeStoreFiles()
> ## calls HStore.clearCompactedFiles()
> ## waits on the HStore writelock, which threadB holds
> # *threadB*: continues with close()
> ## calls removeCompactedfiles(compactedfiles)
> ## calls HRegionFileSystem.removeStoreFiles() -> HFileArchiver.archiveStoreFiles()
> ## receives FileNotFoundException because the files have already been archived by threadA
> ## throws IOException
> # RS aborts
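The interleaving in steps 1 to 6 can be replayed deterministically on a single thread with a simplified model. `RacyStore`, `RaceReplay` and the file names are hypothetical. `archive()` only imitates the FileNotFoundException reported for a file that has already been archived. Unlike the real close(), threadB's writelock is not held across steps in this replay.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified model of the pre-fix HStore bookkeeping (not HBase code).
class RacyStore {
  final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  final List<String> compactedFiles = new ArrayList<>(); // stands in for {{compactedfiles}}
  final Set<String> liveFiles = new HashSet<>();         // files still in the store directory

  RacyStore(List<String> files) {
    compactedFiles.addAll(files);
    liveFiles.addAll(files);
  }

  // threadA, step 2: copy the list under the readlock, then release it.
  List<String> copyUnderReadLock() {
    lock.readLock().lock();
    try {
      return new ArrayList<>(compactedFiles);
    } finally {
      lock.readLock().unlock();
    }
  }

  // threadB, step 3: close() clears the list under the writelock and gets the same files.
  List<String> clearUnderWriteLock() {
    lock.writeLock().lock();
    try {
      List<String> copy = new ArrayList<>(compactedFiles);
      compactedFiles.clear();
      return copy;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Stand-in for the archiver: a file that was already moved cannot be archived again.
  synchronized void archive(List<String> files) throws IOException {
    for (String f : files) {
      if (!liveFiles.remove(f)) {
        throw new FileNotFoundException(f + " already archived");
      }
    }
  }
}

class RaceReplay {
  // Replays the reported interleaving on one thread so the outcome is deterministic.
  static String replay() {
    RacyStore store = new RacyStore(Arrays.asList("hfile-1", "hfile-2"));
    List<String> seenByA = store.copyUnderReadLock();   // step 2 (threadA)
    List<String> seenByB = store.clearUnderWriteLock(); // step 3 (threadB)
    try {
      store.archive(seenByA); // step 4: threadA archives the files
      store.archive(seenByB); // step 5: threadB tries to archive the same files
      return "no error";
    } catch (IOException e) {
      return "close() failed: " + e.getMessage(); // step 6: where the RS would abort
    }
  }

  public static void main(String[] args) {
    System.out.println(replay()); // prints "close() failed: hfile-1 already archived"
  }
}
```

Each path takes a lock while it copies the list, but neither lock covers the whole copy-then-archive sequence. Both copies therefore name the same files, and the second archive attempt fails.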
> I think the combination of fetching the compactedfiles list and removing the files needs to be covered by locking. Options I see are:
> * Modify HStore.closeAndArchiveCompactedFiles(): use writelock instead of readlock and move the call to removeCompactedfiles() inside the lock. This means read operations will be blocked while the files are being archived, which is bad.
> * Synchronize closeAndArchiveCompactedFiles() and modify close() to call it instead of calling removeCompactedfiles() directly
> * Add a separate lock for compacted files removal and use it in closeAndArchiveCompactedFiles() and close()
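A minimal sketch of the third option, assuming the same simplified store model as before. Archiving in both closeAndArchiveCompactedFiles() and close() runs under one extra `ReentrantLock`, and the chore holds the HStore read lock only long enough to copy the list. The commit messages at the top of this notification ("Guard HFile archiving under a separate lock", "Account for HStore archiveLock in heap size") suggest this direction was taken. The classes below are only an illustration, not the committed HStore code. `GuardedStore` and `GuardedDemo` are hypothetical names.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Simplified model of option 3; names are illustrative, not the committed HStore code.
class GuardedStore {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // store read/write lock
  private final ReentrantLock archiveLock = new ReentrantLock();            // serializes archiving only
  private final List<String> compactedFiles = new ArrayList<>();
  private final Set<String> liveFiles = new HashSet<>();                    // files not yet archived

  GuardedStore(Collection<String> files) {
    compactedFiles.addAll(files);
    liveFiles.addAll(files);
  }

  // Chore path: readers wait only for the list copy, not for the archive I/O.
  void closeAndArchiveCompactedFiles() throws IOException {
    archiveLock.lock();
    try {
      List<String> copy;
      lock.readLock().lock();
      try {
        copy = new ArrayList<>(compactedFiles);
      } finally {
        lock.readLock().unlock();
      }
      removeCompactedFiles(copy);
    } finally {
      archiveLock.unlock();
    }
  }

  // Close path: takes archiveLock first, so it cannot interleave with a chore pass.
  void close() throws IOException {
    archiveLock.lock();
    try {
      lock.writeLock().lock();
      try {
        removeCompactedFiles(new ArrayList<>(compactedFiles));
      } finally {
        lock.writeLock().unlock();
      }
    } finally {
      archiveLock.unlock();
    }
  }

  // Caller must hold archiveLock. Archiving a file twice fails, as in the report.
  private void removeCompactedFiles(List<String> files) throws IOException {
    for (String f : files) {
      if (!liveFiles.remove(f)) throw new FileNotFoundException(f + " already archived");
    }
    lock.writeLock().lock(); // reentrant when called from close()
    try {
      compactedFiles.removeAll(files);
    } finally {
      lock.writeLock().unlock();
    }
  }

  int remaining() {
    archiveLock.lock();
    try { return liveFiles.size(); } finally { archiveLock.unlock(); }
  }
}

class GuardedDemo {
  // Runs the chore and close() on two threads at once and reports the outcome.
  static String runBoth() {
    GuardedStore store = new GuardedStore(Arrays.asList("hfile-1", "hfile-2"));
    List<Throwable> errors = Collections.synchronizedList(new ArrayList<>());
    Thread chore = new Thread(() -> {
      try { store.closeAndArchiveCompactedFiles(); } catch (IOException e) { errors.add(e); }
    });
    Thread closer = new Thread(() -> {
      try { store.close(); } catch (IOException e) { errors.add(e); }
    });
    chore.start();
    closer.start();
    try {
      chore.join();
      closer.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return "errors=" + errors.size() + " remaining=" + store.remaining();
  }

  public static void main(String[] args) {
    System.out.println(runBoth()); // prints "errors=0 remaining=0"
  }
}
```

Whichever thread wins `archiveLock`, the other one sees an already-emptied list and archives nothing. Both paths acquire `archiveLock` before the read/write lock, so the lock order is consistent and the two cannot deadlock. The chore also never blocks readers during archive I/O, which avoids the drawback of the first option.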

This message was sent by Atlassian JIRA
