hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xiao Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14872) CryptoInputStream should implement unbuffer
Date Tue, 17 Oct 2017 20:26:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16208272#comment-16208272
] 

Xiao Chen commented on HADOOP-14872:
------------------------------------

Thanks for the continued work on this John. My belated review, mostly cosmetic:
- We should verify the exception. Also should log the exception using the standard loggers.
{code:title=CryptoStreamsTestBase:testUnbuffer}
    // Test buffered read again after unbuffer
      try {
        // Must seek to the beginning first
        ((Seekable) in).seek(0);
        // Can not read again if seek fails
        readCheck(in);
      } catch (Exception e) {
        System.err.println(e);
      }
{code}
- Suggest to include {{Locale.ENGLISH}} in {{capability.toLowerCase()}}. Steve's above example
also does this. Hadoop's {{StringUtils#toLowerCase}} makes this a little easier to write,
but also conflicts the the apache commons StringUtils in some classes. :(
- From existing code. The javadoc for hsync says 'flush out the data in client's buffer and
the disk device', I feel the {{Syncable}}'s javadoc is more accurate 'flush out the data in
client's user buffer all the way to the disk device (but the disk may have it in its cache).'
We should also update the documentation about this.
- In {{CryptoInputStream}}, replace method name s/cleanDescriptorPool/clearDecryptorPool/g.
- Maybe we should provide a 1-sentence explanation on readahead, dropbehind and unbuffer,
similar to what hsync and hflush have.

> CryptoInputStream should implement unbuffer
> -------------------------------------------
>
>                 Key: HADOOP-14872
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14872
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>    Affects Versions: 2.6.4
>            Reporter: John Zhuge
>            Assignee: John Zhuge
>         Attachments: HADOOP-14872.001.patch, HADOOP-14872.002.patch, HADOOP-14872.003.patch,
HADOOP-14872.004.patch, HADOOP-14872.005.patch, HADOOP-14872.006.patch, HADOOP-14872.007.patch,
HADOOP-14872.008.patch, HADOOP-14872.009.patch, HADOOP-14872.010.patch, HADOOP-14872.011.patch,
HADOOP-14872.012.patch
>
>
> Discovered in IMPALA-5909.
> Opening an encrypted HDFS file returns a chain of wrapped input streams:
> {noformat}
> HdfsDataInputStream
>   CryptoInputStream
>     DFSInputStream
> {noformat}
> If an application such as Impala or HBase calls HdfsDataInputStream#unbuffer, FSDataInputStream#unbuffer
will be called:
> {code:java}
>     try {
>       ((CanUnbuffer)in).unbuffer();
>     } catch (ClassCastException e) {
>       throw new UnsupportedOperationException("this stream does not " +
>           "support unbuffering.");
>     }
> {code}
> If the {{in}} class does not implement CanUnbuffer, UOE will be thrown. If the application
is not careful, tons of UOEs will show up in logs.
> In comparison, opening an non-encrypted HDFS file returns this chain:
> {noformat}
> HdfsDataInputStream
>   DFSInputStream
> {noformat}
> DFSInputStream implements CanUnbuffer.
> It is good for CryptoInputStream to implement CanUnbuffer for 2 reasons:
> * Release buffer, cache, or any other resource when instructed
> * Able to call its wrapped DFSInputStream unbuffer
> * Avoid the UOE described above. Applications may not handle the UOE very well.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message