hadoop-hdfs-issues mailing list archives

From "Xiao Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12667) KMSClientProvider#ValueQueue does synchronous fetch of edeks in background async thread.
Date Tue, 17 Oct 2017 06:18:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207072#comment-16207072 ]

Xiao Chen commented on HDFS-12667:

Thanks [~shahrs87] for creating the jira with a good description.

Let me first explain some background on HDFS-11210 for context, though perhaps you already
know it.

The problem HDFS-11210 tries to solve is the semantic guarantee a key rollover provides.

Before HDFS-11210, if someone:
# rolls an EZ key (from v1 to v2)
# generates an EDEK of that EZ key
it is possible that the generated EDEK is encrypted with the v1 EZ key. With some luck,
they could generate one EDEK and see a v2 EZ key (returned by the synchronous call), then
later generate another and see a v1 EZ key (from the async thread).

Since key rolling serves a security purpose, it would be good to ensure that EDEKs generated
after a roll use the new EZ key version. Also, re-encryption only makes sense if the
versioning guarantee exists for key rolling. For this reason, we need to keep these semantics.
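
To make the rollover problem concrete, here is a minimal Java sketch of the stale-cache half
of it. It is a toy model, not the ValueQueue or HDFS-11210 code: EdekCacheSketch, refill, roll,
and generateEdek are hypothetical names, an EDEK is reduced to the EZ key version that
encrypted it, and the async refill thread is left out. Draining the cache on roll is shown
only as one possible way to keep the guarantee.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of a client-side EDEK cache. All names are hypothetical. */
class EdekCacheSketch {
    private int currentVersion = 1;
    // Each cached "EDEK" is represented only by the EZ key version that encrypted it.
    private final Queue<Integer> cached = new ArrayDeque<>();

    /** Pre-generates n EDEKs under the current EZ key version. */
    void refill(int n) {
        for (int i = 0; i < n; i++) {
            cached.add(currentVersion);
        }
    }

    /** Rolls the EZ key to the next version, optionally dropping cached EDEKs. */
    void roll(boolean drainCache) {
        currentVersion++;
        if (drainCache) {
            cached.clear();
        }
    }

    /** Returns the EZ key version of the EDEK handed to the caller. */
    int generateEdek() {
        Integer fromCache = cached.poll();
        // On a cache miss, fall back to generating under the current version.
        return fromCache != null ? fromCache : currentVersion;
    }

    public static void main(String[] args) {
        EdekCacheSketch stale = new EdekCacheSketch();
        stale.refill(2);
        stale.roll(false);
        System.out.println("no drain: v" + stale.generateEdek()); // prints "no drain: v1"

        EdekCacheSketch drained = new EdekCacheSketch();
        drained.refill(2);
        drained.roll(true);
        System.out.println("drained:  v" + drained.generateEdek()); // prints "drained:  v2"
    }
}
```

Without the drain, the caller still receives an EDEK encrypted under v1 after the roll, which
is the behavior the versioning guarantee is meant to rule out.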

On the implementation, perhaps the locking could be more sophisticated and do things smarter
during the async fetching phase. Better approaches are welcome.
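
For readers following along, the contention described in the quoted issue can be reproduced
with a small standalone Java sketch. This is an illustration under assumptions, not the
ValueQueue code: StripedLockSketch and its methods are hypothetical, and the mapping from key
name to one of the 16 locks (hashCode masked to 4 bits) is assumed rather than copied from
the source. The "reader" only probes the lock with tryLock from a second thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical model of rw locks shared by all key names in the same bucket. */
class StripedLockSketch {
    static final int LOCK_ARRAY_SIZE = 16;
    private final List<ReadWriteLock> lockArray = new ArrayList<>(LOCK_ARRAY_SIZE);

    StripedLockSketch() {
        for (int i = 0; i < LOCK_ARRAY_SIZE; i++) {
            lockArray.add(new ReentrantReadWriteLock());
        }
    }

    /** Assumed bucket function; the real implementation may differ. */
    static int stripeFor(String keyName) {
        return keyName.hashCode() & (LOCK_ARRAY_SIZE - 1);
    }

    ReadWriteLock lockFor(String keyName) {
        return lockArray.get(stripeFor(keyName));
    }

    /** Returns a different key name in the same bucket (sameStripe) or in another one. */
    static String findKey(String keyName, boolean sameStripe) {
        // Appending 16 consecutive chars yields 16 consecutive hash codes,
        // so every bucket is hit exactly once.
        for (char c = 'a'; c <= 'p'; c++) {
            String candidate = keyName + c;
            if ((stripeFor(candidate) == stripeFor(keyName)) == sameStripe) {
                return candidate;
            }
        }
        throw new IllegalStateException("unreachable for a 16-bucket mask");
    }

    /** From another thread, checks whether a read for keyName could proceed right now. */
    boolean readerCanProceed(String keyName) {
        return CompletableFuture.supplyAsync(() -> {
            Lock read = lockFor(keyName).readLock();
            boolean acquired = read.tryLock();
            if (acquired) {
                read.unlock();
            }
            return acquired;
        }).join();
    }

    public static void main(String[] args) {
        StripedLockSketch locks = new StripedLockSketch();
        String key1 = "key1";
        String sameBucket = findKey(key1, true);
        String otherBucket = findKey(key1, false);

        // Simulate the refill task holding the write lock for key1 during a slow fetch.
        locks.lockFor(key1).writeLock().lock();
        try {
            System.out.println(sameBucket + " readable: " + locks.readerCanProceed(sameBucket));
            System.out.println(otherBucket + " readable: " + locks.readerCanProceed(otherBucket));
        } finally {
            locks.lockFor(key1).writeLock().unlock();
        }
        System.out.println(sameBucket + " readable after refill: "
            + locks.readerCanProceed(sameBucket));
    }
}
```

While the write lock for key1 is held, a read for an unrelated key in the same bucket cannot
proceed, whereas a key in a different bucket is unaffected. Holding the lock only around the
queue update, rather than around the fetch itself, is one direction that might shorten this
window, though whether it preserves the rollover guarantee would need to be checked.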

> KMSClientProvider#ValueQueue does synchronous fetch of edeks in background async thread.
> ----------------------------------------------------------------------------------------
>                 Key: HDFS-12667
>                 URL: https://issues.apache.org/jira/browse/HDFS-12667
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: encryption, kms
>    Affects Versions: 3.0.0-alpha4
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
> There are a couple of issues in KMSClientProvider#ValueQueue.
> 1.
>  {code:title=ValueQueue.java|borderStyle=solid}
>   private final LoadingCache<String, LinkedBlockingQueue<E>> keyQueues;
>   // Stripped rwlocks based on key name to synchronize the queue from
>   // the sync'ed rw-thread and the background async refill thread.
>   private final List<ReadWriteLock> lockArray =
>       new ArrayList<>(LOCK_ARRAY_SIZE);
> {code}
> It hashes the key name into 16 buckets.
> In the code chunk below,
>  {code:title=ValueQueue.java|borderStyle=solid}
> public List<E> getAtMost(String keyName, int num) throws IOException,
>       ExecutionException {
>      ...
>      ...
>          readLock(keyName);
>         E val = keyQueue.poll();
>         readUnlock(keyName);
>      ...
>   }
>   private void submitRefillTask(final String keyName,
>       final Queue<E> keyQueue) throws InterruptedException {
>               ...
>               ...
>               // It holds the write lock while the key is being asynchronously fetched,
>               // so read requests for all keys that hash to this bucket are effectively
>               // blocked.
>               writeLock(keyName);
>               try {
>                 if (keyQueue.size() < threshold && !isCanceled()) {
>                   refiller.fillQueueForKey(name, keyQueue,
>                       cacheSize - keyQueue.size());
>                 }
>              ...
>               } finally {
>                 writeUnlock(keyName);
>               }
>             }
>   }
> {code}
> According to the above code chunk, if two keys (let's say key1 and key2) hash to the same
> bucket (between 1 and 16), then while key1 is being asynchronously refetched, all the
> getKey calls for key2 will be blocked.
> 2. Due to the striped rw locks, the asynchronous refilling of keys is now synchronous
> with respect to other handler threads.
> I understand that the locks were added so that we don't kick off multiple asynchronous
> refilling threads for the same key.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
