hadoop-common-issues mailing list archives

From "Mingliang Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14512) WASB atomic rename should not throw exception if the file is neither in src nor in dst when doing the rename
Date Thu, 15 Jun 2017 20:37:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051023#comment-16051023 ]

Mingliang Liu commented on HADOOP-14512:
----------------------------------------

Steve, sorry for the late report. I ran all the unit and live tests against US West; all pass.
It's a good convention to post the test report before committing.

{code}
hadoop-tools/hadoop-azure $ mvn test -q

-------------------------------------------------------
 T E S T S
-------------------------------------------------------

Running org.apache.hadoop.fs.azure.contract.TestAzureNativeContractAppend
Tests run: 5, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 27.74 sec - in org.apache.hadoop.fs.azure.contract.TestAzureNativeContractAppend
Running org.apache.hadoop.fs.azure.contract.TestAzureNativeContractCreate
Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 30.296 sec - in org.apache.hadoop.fs.azure.contract.TestAzureNativeContractCreate
Running org.apache.hadoop.fs.azure.contract.TestAzureNativeContractDelete
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 27.426 sec - in org.apache.hadoop.fs.azure.contract.TestAzureNativeContractDelete
Running org.apache.hadoop.fs.azure.contract.TestAzureNativeContractDistCp
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 210.658 sec - in org.apache.hadoop.fs.azure.contract.TestAzureNativeContractDistCp
Running org.apache.hadoop.fs.azure.contract.TestAzureNativeContractGetFileStatus
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 67.542 sec - in org.apache.hadoop.fs.azure.contract.TestAzureNativeContractGetFileStatus
Running org.apache.hadoop.fs.azure.contract.TestAzureNativeContractMkdir
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 87.217 sec - in org.apache.hadoop.fs.azure.contract.TestAzureNativeContractMkdir
Running org.apache.hadoop.fs.azure.contract.TestAzureNativeContractOpen
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 23.406 sec - in org.apache.hadoop.fs.azure.contract.TestAzureNativeContractOpen
Running org.apache.hadoop.fs.azure.contract.TestAzureNativeContractRename
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 29.704 sec - in org.apache.hadoop.fs.azure.contract.TestAzureNativeContractRename
Running org.apache.hadoop.fs.azure.contract.TestAzureNativeContractSeek
Tests run: 18, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 56.787 sec - in org.apache.hadoop.fs.azure.contract.TestAzureNativeContractSeek
Running org.apache.hadoop.fs.azure.metrics.TestAzureFileSystemInstrumentation
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 142.165 sec - in org.apache.hadoop.fs.azure.metrics.TestAzureFileSystemInstrumentation
Running org.apache.hadoop.fs.azure.metrics.TestBandwidthGaugeUpdater
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.379 sec - in org.apache.hadoop.fs.azure.metrics.TestBandwidthGaugeUpdater
Running org.apache.hadoop.fs.azure.metrics.TestNativeAzureFileSystemMetricsSystem
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.734 sec - in org.apache.hadoop.fs.azure.metrics.TestNativeAzureFileSystemMetricsSystem
Running org.apache.hadoop.fs.azure.metrics.TestRollingWindowAverage
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.217 sec - in org.apache.hadoop.fs.azure.metrics.TestRollingWindowAverage
Running org.apache.hadoop.fs.azure.TestAzureConcurrentOutOfBandIo
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 39.986 sec - in org.apache.hadoop.fs.azure.TestAzureConcurrentOutOfBandIo
Running org.apache.hadoop.fs.azure.TestAzureFileSystemErrorConditions
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.949 sec - in org.apache.hadoop.fs.azure.TestAzureFileSystemErrorConditions
Running org.apache.hadoop.fs.azure.TestBlobDataValidation
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.96 sec - in org.apache.hadoop.fs.azure.TestBlobDataValidation
Running org.apache.hadoop.fs.azure.TestBlobMetadata
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.786 sec - in org.apache.hadoop.fs.azure.TestBlobMetadata
Running org.apache.hadoop.fs.azure.TestBlobTypeSpeedDifference
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 10.63 sec - in org.apache.hadoop.fs.azure.TestBlobTypeSpeedDifference
Running org.apache.hadoop.fs.azure.TestContainerChecks
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.44 sec - in org.apache.hadoop.fs.azure.TestContainerChecks
Running org.apache.hadoop.fs.azure.TestFileSystemOperationExceptionHandling
Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 47.314 sec - in org.apache.hadoop.fs.azure.TestFileSystemOperationExceptionHandling
Running org.apache.hadoop.fs.azure.TestFileSystemOperationExceptionMessage
Tests run: 47, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 373.217 sec - in org.apache.hadoop.fs.azure.TestFileSystemOperationExceptionMessage
Running org.apache.hadoop.fs.azure.TestFileSystemOperationsExceptionHandlingMultiThreaded
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 36.973 sec - in org.apache.hadoop.fs.azure.TestFileSystemOperationsExceptionHandlingMultiThreaded
Running org.apache.hadoop.fs.azure.TestFileSystemOperationsWithThreads
Tests run: 19, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 399.735 sec - in org.apache.hadoop.fs.azure.TestFileSystemOperationsWithThreads
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemAppend
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 172.078 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemAppend
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemAtomicRenameDirList
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 7.393 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemAtomicRenameDirList
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemAuthorization
Tests run: 21, Failures: 0, Errors: 0, Skipped: 21, Time elapsed: 6.24 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemAuthorization
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemAuthorizationWithOwner
Tests run: 24, Failures: 0, Errors: 0, Skipped: 24, Time elapsed: 7.036 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemAuthorizationWithOwner
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemBlockLocations
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.822 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemBlockLocations
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemClientLogging
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.837 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemClientLogging
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrency
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.999 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrency
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrencyLive
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.766 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemConcurrencyLive
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractEmulator
Tests run: 43, Failures: 0, Errors: 0, Skipped: 43, Time elapsed: 0.432 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractEmulator
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractLive
Tests run: 43, Failures: 0, Errors: 0, Skipped: 5, Time elapsed: 208.667 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractLive
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked
Tests run: 43, Failures: 0, Errors: 0, Skipped: 5, Time elapsed: 1.151 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractMocked
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractPageBlobLive
Tests run: 43, Failures: 0, Errors: 0, Skipped: 5, Time elapsed: 218.064 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemContractPageBlobLive
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemFileNameCheck
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.811 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemFileNameCheck
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemLive
Tests run: 51, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 431.203 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemLive
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked
Tests run: 46, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 17.041 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemMocked
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked
Tests run: 50, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.346 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked
Running org.apache.hadoop.fs.azure.TestNativeAzureFileSystemUploadLogic
Tests run: 3, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 0.058 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFileSystemUploadLogic
Running org.apache.hadoop.fs.azure.TestNativeAzureFSPageBlobLive
Tests run: 46, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 443.228 sec - in org.apache.hadoop.fs.azure.TestNativeAzureFSPageBlobLive
Running org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.843 sec - in org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperations
Running org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperationsLive
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 13.627 sec - in org.apache.hadoop.fs.azure.TestOutOfBandAzureBlobOperationsLive
Running org.apache.hadoop.fs.azure.TestReadAndSeekPageBlobAfterWrite
Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 299.048 sec - in org.apache.hadoop.fs.azure.TestReadAndSeekPageBlobAfterWrite
Running org.apache.hadoop.fs.azure.TestShellDecryptionKeyProvider
Tests run: 2, Failures: 0, Errors: 0, Skipped: 2, Time elapsed: 0.105 sec - in org.apache.hadoop.fs.azure.TestShellDecryptionKeyProvider
Running org.apache.hadoop.fs.azure.TestWasbFsck
Tests run: 2, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 0.728 sec - in org.apache.hadoop.fs.azure.TestWasbFsck
Running org.apache.hadoop.fs.azure.TestWasbRemoteCallHelper
Tests run: 8, Failures: 0, Errors: 0, Skipped: 8, Time elapsed: 2.237 sec - in org.apache.hadoop.fs.azure.TestWasbRemoteCallHelper
Running org.apache.hadoop.fs.azure.TestWasbUriAndConfiguration
Tests run: 18, Failures: 0, Errors: 0, Skipped: 1, Time elapsed: 10.17 sec - in org.apache.hadoop.fs.azure.TestWasbUriAndConfiguration

Results :

Tests run: 704, Failures: 0, Errors: 0, Skipped: 119
{code}

> WASB atomic rename should not throw exception if the file is neither in src nor in dst
> when doing the rename
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-14512
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14512
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/azure
>    Affects Versions: 2.8.0
>            Reporter: Duo Xu
>            Assignee: Duo Xu
>             Fix For: 3.0.0-alpha4, 2.8.2
>
>         Attachments: HADOOP-14512.001.patch, HADOOP-14512.002.patch
>
>
> During an atomic rename operation, WASB creates a rename-pending JSON file that records
> which files need to be renamed and their destination. WASB then reads this file and
> renames the files one by one.
> A recent customer incident in HBase exposed a potential bug in the atomic rename
> implementation. For example, below is a rename-pending JSON file:
> {code}
> {
>   FormatVersion: "1.0",
>   OperationUTCTime: "2017-04-29 06:08:57.465",
>   OldFolderName: "hbase\/data\/default\/abc",
>   NewFolderName: "hbase\/.tmp\/data\/default\/abc",
>   FileList: [
>     ".tabledesc",
>     ".tabledesc\/.tableinfo.0000000001",
>     ".tmp",
>     "08e698e0b7d4132c0456b16dcf3772af",
>     "08e698e0b7d4132c0456b16dcf3772af\/.regioninfo",
>     "08e698e0b7d4132c0456b16dcf3772af\/0\/617294e0737e4d37920e1609cf539a83",
>     "08e698e0b7d4132c0456b16dcf3772af\/recovered.edits\/185.seqid",
>     "08e698e0b7d4132c0456b16dcf3772af\/.regioninfo",
>     "08e698e0b7d4132c0456b16dcf3772af\/0",
>     "08e698e0b7d4132c0456b16dcf3772af\/0\/617294e0737e4d37920e1609cf539a83",
>     "08e698e0b7d4132c0456b16dcf3772af\/recovered.edits",
>     "08e698e0b7d4132c0456b16dcf3772af\/recovered.edits\/185.seqid"
>   ]
> }
> {code}  
> When the HBase regionserver process (which uses the WASB driver underneath) was renaming
> "08e698e0b7d4132c0456b16dcf3772af\/.regioninfo", the process crashed or the VM was rebooted
> for system maintenance. When the regionserver process started running again, it found the
> rename-pending JSON file and tried to redo the rename operation.
> However, when it read the first file in the file list, ".tabledesc", it could not find the
> file in either the src folder or the destination folder. The file was not in the src folder
> because it had already been renamed (moved) to the destination folder. It was not in the
> destination folder because HBase cleans up all the files under /hbase/.tmp when it starts.
> The current implementation throws an exception from the following branch:
> {code}
>       else {
>         throw new IOException(
>             "Attempting to complete rename of file " + srcKey + "/" + fileName
>             + " during folder rename redo, and file was not found in source "
>             + "or destination.");
>       }
> {code}
> This causes HBase HMaster initialization to fail, and restarting HMaster does not help
> because the same exception is thrown again.
> My proposal is that if, during the redo, WASB finds a file that is in neither src nor dst,
> it should skip this file and process the next one rather than throw the error and leave
> the user to fix it manually. The reasons are:
> 1. Since the rename-pending JSON file lists file A, if file A is not in src, it must have
> been renamed.
> 2. If file A is in neither src nor dst, the upper-layer service must have removed it. Note
> that the folder is locked during the atomic rename, so the only situation in which the file
> can be deleted is when the VM reboots or the service process crashes. When the service
> process restarts, some operations might happen before the atomic rename redo, as in the
> HBase example above.
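
The proposed redo behavior can be sketched as follows. This is only an illustration under stated assumptions, not the code in the attached patches: the class `RenameRedoSketch`, the method `redo`, and the use of in-memory sets in place of the src and dst folders are all hypothetical. It shows the three cases for each entry in the file list: still in src (rename it), already in dst (nothing to do), and in neither (skip it instead of throwing an IOException).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory model of the redo step; not the WASB implementation. */
final class RenameRedoSketch {

  /**
   * Replays a pending folder rename over the given file list.
   * src and dst are stand-ins (assumptions) for the contents of the source
   * and destination folders. Returns the names found in neither folder.
   */
  static List<String> redo(List<String> fileList, Set<String> src, Set<String> dst) {
    List<String> skipped = new ArrayList<>();
    for (String name : fileList) {
      if (src.contains(name)) {
        // Not renamed before the crash: move it from src to dst now.
        src.remove(name);
        dst.add(name);
      } else if (dst.contains(name)) {
        // Already renamed before the crash: nothing left to do.
        continue;
      } else {
        // In neither folder: proposed behavior is to skip and carry on
        // with the next file rather than throw an IOException.
        skipped.add(name);
      }
    }
    return skipped;
  }

  public static void main(String[] args) {
    // ".tabledesc" was renamed and then cleaned up from the destination;
    // "recovered.edits" had not been renamed yet when the process died.
    Set<String> src = new HashSet<>(Arrays.asList("recovered.edits"));
    Set<String> dst = new HashSet<>();
    List<String> skipped =
        redo(Arrays.asList(".tabledesc", "recovered.edits"), src, dst);
    System.out.println("skipped=" + skipped + " dst=" + dst);
  }
}
```

Returning the skipped names, rather than dropping them silently, is a choice made for this sketch so that a caller could log them; whether and how the real fix reports skipped files is not stated in the description above.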




