hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10609) Uncaught InvalidEncryptionKeyException during pipeline recovery may abort downstream applications
Date Fri, 23 Sep 2016 21:18:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15517601#comment-15517601
] 

Andrew Wang commented on HDFS-10609:
------------------------------------

LGTM, that's a lot of code dedupe :) One little nit, rename {{fail}} to {{failed}} or {{recordFailure}}
or {{unsuccessful}}? Calling it just {{fail}} makes me think of JUnit's {{Assert.fail()}}
which exits immediately.

Otherwise +1 pending, thanks for working on this Wei-Chiu!

> Uncaught InvalidEncryptionKeyException during pipeline recovery may abort downstream
applications
> -------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10609
>                 URL: https://issues.apache.org/jira/browse/HDFS-10609
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: encryption
>    Affects Versions: 2.6.0
>         Environment: CDH5.8.0
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>         Attachments: HDFS-10609.001.patch, HDFS-10609.002.patch, HDFS-10609.003.patch,
HDFS-10609.004.patch
>
>
> In normal operations, if SASL negotiation fails due to {{InvalidEncryptionKeyException}},
it is typically a benign exception, which is caught and retried :
> {code:title=SaslDataTransferServer#doSaslHandshake}
>   if (ioe instanceof SaslException &&
>       ioe.getCause() != null &&
>       ioe.getCause() instanceof InvalidEncryptionKeyException) {
>     // This could just be because the client is long-lived and hasn't gotten
>     // a new encryption key from the NN in a while. Upon receiving this
>     // error, the client will get a new encryption key from the NN and retry
>     // connecting to this DN.
>     sendInvalidKeySaslErrorMessage(out, ioe.getCause().getMessage());
>   } 
> {code}
> {code:title=DFSOutputStream.DataStreamer#createBlockOutputStream}
> if (ie instanceof InvalidEncryptionKeyException && refetchEncryptionKey >
0) {
>             DFSClient.LOG.info("Will fetch a new encryption key and retry, " 
>                 + "encryption key was invalid when connecting to "
>                 + nodes[0] + " : " + ie);
> {code}
> However, if the exception is thrown during pipeline recovery, the corresponding code
does not handle it properly, and the exception is spilled out to downstream applications,
such as SOLR, aborting its operation:
> {quote}
> 2016-07-06 12:12:51,992 ERROR org.apache.solr.update.HdfsTransactionLog: Exception closing
tlog.
> org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: Can't re-compute
encryption key for nonce, since the required block key (keyID=557709482) doesn't exist. Current
key: 1350592619
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:1308)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1272)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1433)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1147)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:632)
> 2016-07-06 12:12:51,997 ERROR org.apache.solr.update.CommitTracker: auto commit error...:org.apache.solr.common.SolrException:
org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException: Can't re-compute
encryption key for nonce, since the required block key (keyID=557709482) doesn't exist. Current
key: 1350592619
>         at org.apache.solr.update.HdfsTransactionLog.close(HdfsTransactionLog.java:316)
>         at org.apache.solr.update.TransactionLog.decref(TransactionLog.java:505)
>         at org.apache.solr.update.UpdateLog.addOldLog(UpdateLog.java:380)
>         at org.apache.solr.update.UpdateLog.postCommit(UpdateLog.java:676)
>         at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:623)
>         at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hdfs.protocol.datatransfer.InvalidEncryptionKeyException:
Can't re-compute encryption key for nonce, since the required block key (keyID=557709482)
doesn't exist. Current key: 1350592619
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.DataTransferSaslUtil.readSaslMessageAndNegotiatedCipherOption(DataTransferSaslUtil.java:417)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.doSaslHandshake(SaslDataTransferClient.java:474)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.getEncryptedStreams(SaslDataTransferClient.java:299)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.send(SaslDataTransferClient.java:242)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.checkTrustAndSend(SaslDataTransferClient.java:211)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferClient.socketSend(SaslDataTransferClient.java:183)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:1308)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:1272)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1433)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:1147)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:632)
> {quote}
> This exception should be contained within HDFS, caught and retried just like in {{createBlockOutputStream()}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message