hadoop-hdfs-issues mailing list archives

From "Rushabh S Shah (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-12574) Add CryptoInputStream to WebHdfsFileSystem read call.
Date Fri, 08 Dec 2017 20:40:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16284051#comment-16284051 ]

Rushabh S Shah edited comment on HDFS-12574 at 12/8/17 8:39 PM:
----------------------------------------------------------------

Attaching a patch for Jenkins to run and point out silly mistakes/checkstyle issues.

{quote}
This is bad: CryptoProtocolVersion.values(). The values method always allocates a new garbage
array for every invocation. I forget where else I made a change to have a static array assignment
of the values and created a static valueOf to return the item from the static array. I can't
find it, looks like it might have been undone... Note that protobufs actually do this.
{quote}
Addressed in v2 of the patch.
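
For context, the pattern being referred to is roughly the following (a minimal sketch, not the actual patch; the real {{CryptoProtocolVersion}} has additional fields and methods):
{code:java}
public enum CryptoProtocolVersion {
  UNKNOWN("Unknown", 1),
  ENCRYPTION_ZONES("Encryption zones", 2);

  // Enum.values() allocates a fresh array on every call, so cache it once
  // in a static field and always read from the cached copy.
  private static final CryptoProtocolVersion[] CACHED_VALUES = values();

  private final String description;
  private final int version;

  CryptoProtocolVersion(String description, int version) {
    this.description = description;
    this.version = version;
  }

  /** Static lookup that walks the cached array instead of calling values(). */
  public static CryptoProtocolVersion valueOf(int version) {
    for (CryptoProtocolVersion v : CACHED_VALUES) {
      if (v.version == version) {
        return v;
      }
    }
    return UNKNOWN;
  }
}
{code}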

{quote}
WebHdfsFileSystem#open contains a copy-n-paste of the same code in DFSClient#createWrappedInputStream.
CryptoInputStream can work with any general stream so let's make a general wrapping method.
Maybe create an interface something like EncryptableInputStream for the getFileEncryptionInfo
which DFSInputStream and WebHdfsInputStream implements. Pass an encryptable stream and it
returns a wrapped stream if necessary.
{quote}
Addressed in the latest patch.
But I am thinking of an alternative: instead of creating {{EncryptableInputStream}} and {{EncryptableOutputStream}}, how about creating a single {{EncryptableStream}} interface and letting {{WebHdfsFileSystem}} and {{DistributedFileSystem}} implement it?
Just an idea; let me know if you see pros and cons in that approach.
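
To make the comparison concrete, here is a rough sketch of the interface-on-the-stream variant from the review comment (names and signatures are illustrative only, not the patch; obtaining the decrypted EDEK from the KMS is left out):
{code:java}
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.crypto.CryptoCodec;
import org.apache.hadoop.crypto.CryptoInputStream;
import org.apache.hadoop.crypto.key.KeyProvider.KeyVersion;
import org.apache.hadoop.fs.FileEncryptionInfo;

/** Any input stream that can report the encryption info of its file. */
interface EncryptableInputStream {
  FileEncryptionInfo getFileEncryptionInfo();
}

final class CryptoStreamWrapper {
  private CryptoStreamWrapper() {}

  /**
   * Shared wrapping helper so WebHdfsFileSystem#open and
   * DFSClient#createWrappedInputStream do not carry two copies of the
   * same logic.
   */
  static InputStream wrapIfNecessary(InputStream in,
      EncryptableInputStream encryptable, CryptoCodec codec,
      KeyVersion decryptedKey) throws IOException {
    FileEncryptionInfo feInfo = encryptable.getFileEncryptionInfo();
    if (feInfo == null) {
      // The file is not in an encryption zone; return the raw stream.
      return in;
    }
    return new CryptoInputStream(in, codec,
        decryptedKey.getMaterial(), feInfo.getIV());
  }
}
{code}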

{quote}
I'm not thrilled with stream construction always calling file info but I understand the stream
is lazily opened which creates a chicken and egg problem for determining whether to return
a crypto stream. 
{quote}
Exactly. The client connects to the namenode only when {{InputStream#read}} is called, but by then it is too late to decide whether to return a crypto stream.

{quote}
Double check that failing in the ReadRunner ctor doesn't cause any retry loop issues or partial
stream leakage. I'll scrutinize too.
{quote}
I added a try-catch block in {{WebHdfsFileSystem#open}} to close the stream in case of any exception.
Please let me know if I missed any case.
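
The shape of the guard is roughly this (simplified sketch; the factory and helper names are placeholders, not the actual methods in the patch):
{code:java}
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.io.IOUtils;

final class OpenGuardSketch {
  private OpenGuardSketch() {}

  /** Hypothetical stand-ins for the stream creation and wrapping steps. */
  interface StreamFactory {
    InputStream createRawStream() throws IOException;
    InputStream wrapIfNecessary(InputStream raw) throws IOException;
  }

  /**
   * If wrapping fails after the raw stream was created (e.g. a KMS error),
   * close the raw stream before rethrowing so it is not leaked.
   */
  static InputStream openGuarded(StreamFactory factory) throws IOException {
    InputStream raw = null;
    try {
      raw = factory.createRawStream();
      return factory.wrapIfNecessary(raw);
    } catch (IOException e) {
      IOUtils.closeStream(raw);  // no-op when raw is still null
      throw e;
    }
  }
}
{code}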


{quote}
I think using the cached file status at open in ReadRunner#initializeInputStream subtly changes
semantics. 
{quote}
I retained the old behaviour.

{quote}
Why the change to MiniDFSCluster?
{quote}
Since {{NamenodeWebHdfsMethods#serverDefaultsResponse}} is static, it keeps serving the old value of the key provider address across {{MiniDFSCluster#restartNameNode}}.

Also note that patch #002 is built on top of HDFS-12907.
Once that gets reviewed and resolved, I will post a new patch with one more test case added.


> Add CryptoInputStream to WebHdfsFileSystem read call.
> -----------------------------------------------------
>
>                 Key: HDFS-12574
>                 URL: https://issues.apache.org/jira/browse/HDFS-12574
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: encryption, kms, webhdfs
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
>         Attachments: HDFS-12574.001.patch, HDFS-12574.002.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


