hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15871) Some input streams does not obey "java.io.InputStream.available" contract
Date Tue, 23 Oct 2018 12:40:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16660580#comment-16660580 ]

Steve Loughran commented on HADOOP-15871:

Extensions to {{hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md}}
are welcome, along with contract tests; they could be bundled with any HADOOP-15870 changes.

# work out what is meant to happen @ java APIs
# look at HDFS to see what it thinks should happen
# spec
# contract tests
# test object stores & patch individually

Looking at java.io, the {{available()}} javadoc talks about what can be read "without blocking". For S3A
that would actually be the remaining amount of data in the current read, so just forward to
{{wrappedStream.available()}} if wrappedStream != null, else return 0. But
{{com.amazonaws.services.s3.model.S3ObjectInputStream}} calls out always returning 1 here so as not
to break {{GZIPInputStream}} (see [https://bugs.java.com/bugdatabase/view_bug.do?bug_id=7036144]).
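
A minimal sketch of the forwarding idea above, for discussion only. {{ForwardingAvailableStream}} is a hypothetical class, not the actual S3AInputStream code; only the {{wrappedStream}} name and the "forward if non-null, else 0" rule come from the comment. It assumes the wrapped stream's own {{available()}} already honours the java.io contract, and it does not model the always-return-1 behaviour of the AWS SDK stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper used for illustration; not the real S3AInputStream. */
class ForwardingAvailableStream extends InputStream {
    // Stands in for the stream of the current read; null when no read is open.
    private InputStream wrappedStream;

    ForwardingAvailableStream(InputStream wrappedStream) {
        this.wrappedStream = wrappedStream;
    }

    @Override
    public int read() throws IOException {
        // Report end-of-stream when there is nothing open to read from.
        return wrappedStream == null ? -1 : wrappedStream.read();
    }

    @Override
    public int available() throws IOException {
        // Forward to the wrapped stream if one is open, otherwise report 0.
        return wrappedStream == null ? 0 : wrappedStream.available();
    }

    @Override
    public void close() throws IOException {
        if (wrappedStream != null) {
            wrappedStream.close();
            wrappedStream = null; // after close, available() reports 0
        }
    }
}
```

With a {{ByteArrayInputStream}} standing in for the wrapped stream, {{available()}} tracks the bytes left in that stream and drops to 0 once the wrapper is closed or was never given a stream.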

Was this gzip related? If so, something to consider including in a test too.

> Some input streams does not obey "java.io.InputStream.available" contract 
> --------------------------------------------------------------------------
>                 Key: HADOOP-15871
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15871
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs, fs/s3
>            Reporter: Shixiong Zhu
>            Priority: Major
> E.g, DFSInputStream and S3AInputStream return the size of the remaining available bytes,
> but the javadoc of "available" says it should "Returns an estimate of the number of bytes
> that can be read (or skipped over) from this input stream *without blocking* by the next invocation
> of a method for this input stream."
> I understand that some applications may rely on the current behavior. It would be great
> that there is an interface to document how "available" should be implemented.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
