hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10603) Crypto input and output streams implementing Hadoop stream interfaces
Date Wed, 21 May 2014 01:29:39 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004190#comment-14004190

Andrew Wang commented on HADOOP-10603:

Hi Yi, thanks for working on this code. I have some review comments based on a slightly old
version of this patch. Throughout, if I've correctly identified a correctness issue, let's
have a corresponding unit test if possible. I'd personally also rather not split the implementation
and tests into separate JIRAs.

Nitty stuff:
* Need class javadoc and interface annotations on all new classes
* Need "<p/>" to actually line break in javadoc
* Some tab characters present

* s/mod/mode
* What does "calIV" mean? Javadoc here would be nice.
* calIV would be simpler if we used ByteBuffer.wrap and {{getLong}}. I think right now, we
also need to cast each value to a {{long}} before shifting, else it only works up to an int.
Would be good to unit test this function.

* Could you define the term "block" in the {{#encrypt}} javadoc?
* I don't understand the reinit conditions, do you mind explaining this a bit? The javadoc
for {{Cipher#update}} indicates that it always fully reads the input buffer, so is the issue
that the cipher sometimes doesn't flush all the input to the output buffer?
* If this API only accepts direct ByteBuffers, we should Precondition check that in the implementation
* Javadoc for {{encrypt}} should link to {{javax.crypto.ShortBufferException}}, not {{#ShortBufferException}}.
I also don't see this being thrown because we wrap everything in an IOException.

* How was the default buffer size of 8KB chosen? This should probably be a new configuration
parameter, or respect io.file.buffer.size.
* Potential for int overflow in {{#write}} where we check {{off+len < 0}}. I also find
this if statement hard to parse, would prefer if it were expanded.
* Is the {{16}} in updateEncryptor something that should be hard-coded? Maybe pull it out
into a constant and javadoc why it's 16. I'm curious if this is dependent on the Encryptor
* We need to be careful with direct BBs, since they don't trigger GC. We should be freeing
them manually when the stream is closed, or pooling them somehow for reuse.
* In {{#process}}, we flip the inBuf, then if there's no data we just return. Shouldn't we
restore inBuf to its previous padded state first? Also, IIUC {{inBuffer.remaining()}} cannot
be less than padding since the inBuffer position does not move backwards, so I'd prefer to
see a Precondition check and {{inBuf.remaining() == padding)}}. Test case would be nice if
I'm right about this.
* Rename {{#process}} to {{#encrypt}}?
* Do we need the special-case logic with tmpBuf? It looks like outBuffer is always direct.
* Do we need to update padding when we do a flush?
* Also in {{#flush}}, s/encryption/encrypt
* oneByte can be final
* If you have extra time, an ASCII art diagram showing how {{padding}} and the stream offset
works would also be nice. Javadoc for the special padding handling would be nice.
* Can make class-private methods private
* Should {{close()}} also close the underlying stream?

* Many of the OutputStream comments apply here too: hardcoded 8KB buffer size, int overflow
for {{off+len}}, hardcoded 16, handling of padding in {{process}}, freeing direct buffers,
private methods, closing underlying stream, etc.
* Do we have tests for wrapping both ByteBufferReadable and not streams?
* Rename {{process()}} to {{decrypt()}}?
* In {{process}}, the catch/throw there doesn't seem useful, since everything that throws
already throws an IOException.
* Positioned read and readFully, doing a seek in the finally will not work on a non-Seekable
stream. There's also no need to catch {{ClassCastException}} since it's already handled in
* {{readFully(long, byte[])}} should just delegate directly to the other readFully method,
it doesn't need to do anything else.
* updateDecryptor doesn't seem to need the {{long offset}} parameter since it's always passed
* We need to return -1 on EOF for zero-byte reads, see HDFS-5762.
* Comment in {{skip}} about why we subtract then add {{outBuffer.remaining()}} would be good.
* Some empty {{throw new UnsupportedOperationException()}} could use text

* s/if initialize fails/if initialization fails/

> Crypto input and output streams implementing Hadoop stream interfaces
> ---------------------------------------------------------------------
>                 Key: HADOOP-10603
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10603
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: security
>    Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
>            Reporter: Alejandro Abdelnur
>            Assignee: Yi Liu
>             Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)
>         Attachments: HADOOP-10603.1.patch, HADOOP-10603.2.patch, HADOOP-10603.3.patch,
HADOOP-10603.4.patch, HADOOP-10603.5.patch, HADOOP-10603.6.patch, HADOOP-10603.7.patch, HADOOP-10603.8.patch,
> A common set of Crypto Input/Output streams. They would be used by CryptoFileSystem,
HDFS encryption, MapReduce intermediate data and spills. Note we cannot use the JDK Cipher
Input/Output streams directly because we need to support the additional interfaces that the
Hadoop FileSystem streams implement (Seekable, PositionedReadable, ByteBufferReadable, HasFileDescriptor,
CanSetDropBehind, CanSetReadahead, HasEnhancedByteBufferAccess, Syncable, CanSetDropBehind).

This message was sent by Atlassian JIRA

View raw message