hadoop-hdfs-issues mailing list archives

From "Xiao Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9405) When starting a file, NameNode should generate EDEK in a separate thread
Date Tue, 08 Mar 2016 04:17:41 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15184370#comment-15184370 ]

Xiao Chen commented on HDFS-9405:

Thanks all for the discussions and thoughts here. I'd like to work on this.

As I understand it, there seem to be two problems:
- On NN startup/failover, the first call triggers the {{LoadingCache}} to fill up, which
happens synchronously.
We may solve this by having a background thread actively warm up the cache.

- If KMS or the backing key provider is down, all create RPCs will hang and time out in
{{FSNamesystem#startFile}} (if the cache is empty).
This is arguably a bug. IMHO this should be detected at the service level, instead of depending
on a client RPC to find it.
But if we don't like the hang in the RPC, then in addition to the background warm-up above,
we could also update the {{ValueQueue}} to call [getIfPresent|http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/cache/Cache.html#getIfPresent(java.lang.Object)]
instead of get, and throw {{RetryStartFileException}} directly if nothing is cached, on the
assumption that the cache would otherwise have been filled?
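
To make the two ideas concrete, here is a rough sketch using only java.util.concurrent. It is not the actual {{ValueQueue}} or Guava {{LoadingCache}} code: the names WarmedKeyQueue, warmUpAsync, takeOrRetry and RetryLaterException are hypothetical, and the Supplier merely stands in for the slow key provider call.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

/** Hypothetical stand-in for a retriable exception such as RetryStartFileException. */
class RetryLaterException extends Exception {
    RetryLaterException(String msg) { super(msg); }
}

/** Hypothetical sketch of a pre-warmed key queue; not the Hadoop ValueQueue API. */
class WarmedKeyQueue {
    private final BlockingQueue<String> queue;
    private final Supplier<String> slowProvider; // assumption: stands in for the KMS call

    WarmedKeyQueue(int capacity, Supplier<String> slowProvider) {
        this.queue = new LinkedBlockingQueue<>(capacity);
        this.slowProvider = slowProvider;
    }

    /** Starts a daemon thread that fills the queue to capacity once, off the RPC path. */
    Thread warmUpAsync() {
        Thread t = new Thread(() -> {
            try {
                // Single producer, so put() does not block while capacity remains.
                while (queue.remainingCapacity() > 0) {
                    queue.put(slowProvider.get());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (RuntimeException e) {
                // Provider is down: stop filling. A real version would log and retry.
            }
        }, "edek-warmup");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Non-blocking take, in the spirit of getIfPresent: never calls the provider. */
    String takeOrRetry() throws RetryLaterException {
        String value = queue.poll();
        if (value == null) {
            throw new RetryLaterException("no cached key available yet, please retry");
        }
        return value;
    }
}
```

With this shape, the caller never waits on the provider: it either gets a cached value immediately or receives a retriable error, and the slow I/O stays in the background thread.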

Is my understanding correct?

Will work hard on making the logs/metrics helpful as well.

> When starting a file, NameNode should generate EDEK in a separate thread
> ------------------------------------------------------------------------
>                 Key: HDFS-9405
>                 URL: https://issues.apache.org/jira/browse/HDFS-9405
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: encryption, namenode
>    Affects Versions: 2.7.1
>            Reporter: Zhe Zhang
> {{generateEncryptedDataEncryptionKey}} involves a non-trivial I/O operation to the key
> provider, which could be slow or cause timeout. It should be done as a separate thread so
> as to return a proper error message to the RPC caller.

This message was sent by Atlassian JIRA
