hadoop-common-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11685) StorageException complaining " no lease ID" during HBase distributed log splitting
Date Thu, 22 Oct 2015 22:52:27 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970059#comment-14970059 ]

Chris Nauroth commented on HADOOP-11685:
----------------------------------------

[~onpduo], thanks for investigating further and sharing more details.

Considering that the {{mkdirs}} (step 3 in your example) will be followed by creating a file
under that directory, presumably using {{createNonRecursive}} (step 4), and a subsequent {{rename}}
of that file (step 5), is there still a risk of a failure later in the sequence even after
this patch?  It sounds like this patch would get us past step 3, but not necessarily steps
4 or 5.  The thread might see a successful {{mkdirs}}, but then immediately fail on those later
steps.
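
For reference, here is a minimal sketch of that sequence written against the generic {{FileSystem}}
API.  The paths, file names, and buffer/replication parameters are made up for illustration; only
the order of calls (mkdirs, then createNonRecursive, then rename) comes from the discussion above.

{code}
// Hypothetical sketch of the per-region recovered-edits sequence discussed above.
// Any of these calls touches the directory whose folder blob may be leased, so a
// lease-related StorageException is not necessarily limited to the mkdirs in step 3.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RecoveredEditsSequenceSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path editsDir = new Path("/hbase/data/default/t1/region1/recovered.edits"); // hypothetical path
    Path tmpFile = new Path(editsDir, "0000000000000001234.temp");              // hypothetical name
    Path finalFile = new Path(editsDir, "0000000000000001234");

    fs.mkdirs(editsDir);                           // step 3: may race with other writer threads
    FSDataOutputStream out = fs.createNonRecursive(
        tmpFile, true, 4096, (short) 1, fs.getDefaultBlockSize(tmpFile), null); // step 4
    out.close();
    fs.rename(tmpFile, finalFile);                 // step 5: could also fail if the folder blob is leased
  }
}
{code}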

There are no unit tests in the patch.  Has anything else been done to verify the fix?
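
If a regression test is added, something along these lines could cover the concurrent access
pattern.  This is only a rough sketch: the class name, thread count, and path are placeholders,
and a real test would run against a WASB-backed file system rather than the default one.

{code}
// Rough sketch of a concurrency check: several threads calling mkdirs on the same
// "recovered.edits" directory, mirroring the distributed log splitting race.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ConcurrentMkdirsSketch {
  public static void main(String[] args) throws Exception {
    final FileSystem fs = FileSystem.get(new Configuration()); // would be WASB in a real test
    final Path dir = new Path("/test/recovered.edits");        // placeholder path
    ExecutorService pool = Executors.newFixedThreadPool(8);
    List<Future<Boolean>> results = new ArrayList<Future<Boolean>>();
    for (int i = 0; i < 8; i++) {
      results.add(pool.submit(new Callable<Boolean>() {
        @Override
        public Boolean call() throws Exception {
          return fs.mkdirs(dir); // every call should succeed without a lease error
        }
      }));
    }
    for (Future<Boolean> f : results) {
      f.get(); // surfaces any AzureException thrown by a worker thread
    }
    pool.shutdown();
  }
}
{code}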

> StorageException complaining " no lease ID" during HBase distributed log splitting
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-11685
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11685
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools
>            Reporter: Duo Xu
>            Assignee: Duo Xu
>         Attachments: HADOOP-11685.01.patch, HADOOP-11685.02.patch, HADOOP-11685.03.patch
>
>
> This is similar to HADOOP-11523, but in a different place. During HBase distributed log
> splitting, multiple threads will access the same folder called "recovered.edits". However,
> lots of places in our WASB code did not acquire a lease and simply passed a null lease ID
> to Azure storage, which caused this issue.
> {code}
> 2015-02-26 03:21:28,871 WARN org.apache.hadoop.hbase.regionserver.SplitLogWorker: log splitting of WALs/workernode4.hbaseproddm2001.g6.internal.cloudapp.net,60020,1422071058425-splitting/workernode4.hbaseproddm2001.g6.internal.cloudapp.net%2C60020%2C1422071058425.1424914216773 failed, returning error
> java.io.IOException: org.apache.hadoop.fs.azure.AzureException: java.io.IOException
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.checkForErrors(HLogSplitter.java:633)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.access$000(HLogSplitter.java:121)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$OutputSink.finishWriting(HLogSplitter.java:964)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogRecoveredEditsOutputSink.finishWritingAndClose(HLogSplitter.java:1019)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:359)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLogFile(HLogSplitter.java:223)
> 	at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:142)
> 	at org.apache.hadoop.hbase.regionserver.handler.HLogSplitterHandler.process(HLogSplitterHandler.java:79)
> 	at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.fs.azure.AzureException: java.io.IOException
> 	at org.apache.hadoop.fs.azurenative.AzureNativeFileSystemStore.storeEmptyFolder(AzureNativeFileSystemStore.java:1477)
> 	at org.apache.hadoop.fs.azurenative.NativeAzureFileSystem.mkdirs(NativeAzureFileSystem.java:1862)
> 	at org.apache.hadoop.fs.azurenative.NativeAzureFileSystem.mkdirs(NativeAzureFileSystem.java:1812)
> 	at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1815)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getRegionSplitEditsPath(HLogSplitter.java:502)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogRecoveredEditsOutputSink.createWAP(HLogSplitter.java:1211)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogRecoveredEditsOutputSink.getWriterAndPath(HLogSplitter.java:1200)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$LogRecoveredEditsOutputSink.append(HLogSplitter.java:1243)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.writeBuffer(HLogSplitter.java:851)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.doRun(HLogSplitter.java:843)
> 	at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter$WriterThread.run(HLogSplitter.java:813)
> Caused by: java.io.IOException
> 	at com.microsoft.windowsazure.storage.core.Utility.initIOException(Utility.java:493)
> 	at com.microsoft.windowsazure.storage.blob.BlobOutputStream.close(BlobOutputStream.java:282)
> 	at org.apache.hadoop.fs.azurenative.AzureNativeFileSystemStore.storeEmptyFolder(AzureNativeFileSystemStore.java:1472)
> 	... 10 more
> Caused by: com.microsoft.windowsazure.storage.StorageException: There is currently a lease on the blob and no lease ID was specified in the request.
> 	at com.microsoft.windowsazure.storage.StorageException.translateException(StorageException.java:163)
> 	at com.microsoft.windowsazure.storage.core.StorageRequest.materializeException(StorageRequest.java:306)
> 	at com.microsoft.windowsazure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:229)
> 	at com.microsoft.windowsazure.storage.blob.CloudBlockBlob.commitBlockList(CloudBlockBlob.java:248)
> 	at com.microsoft.windowsazure.storage.blob.BlobOutputStream.commit(BlobOutputStream.java:319)
> 	at com.microsoft.windowsazure.storage.blob.BlobOutputStream.close(BlobOutputStream.java:279)
> 	... 11 more
> {code}
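
The description notes that WASB issued the blob write with no lease ID.  For illustration only,
the general Azure SDK pattern for acquiring a blob lease and passing the lease ID back on a write
looks roughly like the sketch below.  The connection string, container name, and blob name are
placeholders, and this is not code from any of the attached patches.

{code}
// Illustration only: acquire a blob lease and pass it back on the write,
// instead of issuing the write with no lease ID (the failure mode in the trace above).
import com.microsoft.windowsazure.storage.AccessCondition;
import com.microsoft.windowsazure.storage.CloudStorageAccount;
import com.microsoft.windowsazure.storage.blob.BlobOutputStream;
import com.microsoft.windowsazure.storage.blob.CloudBlobClient;
import com.microsoft.windowsazure.storage.blob.CloudBlobContainer;
import com.microsoft.windowsazure.storage.blob.CloudBlockBlob;

public class LeaseWriteSketch {
  public static void main(String[] args) throws Exception {
    CloudStorageAccount account =
        CloudStorageAccount.parse(System.getenv("AZURE_STORAGE_CONNECTION_STRING"));
    CloudBlobClient client = account.createCloudBlobClient();
    CloudBlobContainer container = client.getContainerReference("hbase");           // placeholder container
    CloudBlockBlob folderBlob = container.getBlockBlobReference("recovered.edits");  // placeholder blob

    // Acquire a 60-second lease and hand the lease ID to subsequent operations.
    String leaseId = folderBlob.acquireLease(60, null);
    AccessCondition lease = AccessCondition.generateLeaseCondition(leaseId);

    BlobOutputStream out = folderBlob.openOutputStream(lease, null, null);
    out.close(); // the commit request now carries the lease ID

    folderBlob.releaseLease(lease);
  }
}
{code}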



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
