flink-dev mailing list archives

From "Arnaud Linz (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-2580) HadoopDataOutputStream does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream
Date Thu, 27 Aug 2015 07:27:46 GMT
Arnaud Linz created FLINK-2580:
----------------------------------

             Summary: HadoopDataOutputStream does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream
                 Key: FLINK-2580
                 URL: https://issues.apache.org/jira/browse/FLINK-2580
             Project: Flink
          Issue Type: Improvement
          Components: Hadoop Compatibility
            Reporter: Arnaud Linz
            Priority: Minor


I’ve noticed that when you use org.apache.flink.core.fs.FileSystem to write into an HDFS
file, calling org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.create() returns a HadoopDataOutputStream
that wraps an org.apache.hadoop.fs.FSDataOutputStream (under its
org.apache.hadoop.hdfs.client.HdfsDataOutputStream wrapper).
 
However, FSDataOutputStream exposes many methods such as flush() and getPos(), whereas
HadoopDataOutputStream only forwards write() and close().
 
For instance, calling flush() on the wrapper runs the default, empty implementation of
OutputStream instead of the Hadoop one, which is confusing. Moreover, because of the restrictive
OutputStream interface, hsync() and hflush() are not exposed to Flink.
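
To illustrate the flush() problem without needing a Hadoop cluster, here is a minimal sketch
using only java.io. The class names (FlushDemo, WriteCloseOnlyStream, ForwardingStream) are
hypothetical and are not Flink or Hadoop classes; a BufferedOutputStream stands in for the
wrapped Hadoop stream, on the assumption that the wrapped stream buffers data until flushed.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class FlushDemo {

    /** Hypothetical wrapper that forwards only write() and close(). */
    static class WriteCloseOnlyStream extends OutputStream {
        protected final OutputStream wrapped;

        WriteCloseOnlyStream(OutputStream wrapped) {
            this.wrapped = wrapped;
        }

        @Override
        public void write(int b) throws IOException {
            wrapped.write(b);
        }

        @Override
        public void close() throws IOException {
            wrapped.close();
        }
        // flush() is inherited from OutputStream, whose implementation does nothing.
    }

    /** Same wrapper, but flush() is forwarded to the wrapped stream. */
    static class ForwardingStream extends WriteCloseOnlyStream {
        ForwardingStream(OutputStream wrapped) {
            super(wrapped);
        }

        @Override
        public void flush() throws IOException {
            wrapped.flush();
        }
    }

    /** Writes three bytes, calls flush(), and reports how many bytes reached the sink. */
    static int bytesVisibleAfterFlush(boolean forwardFlush) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Stand-in for the buffering stream that the wrapper hides.
        OutputStream buffered = new BufferedOutputStream(sink, 1024);
        OutputStream out = forwardFlush
                ? new ForwardingStream(buffered)
                : new WriteCloseOnlyStream(buffered);
        out.write(new byte[] {1, 2, 3});
        out.flush();
        return sink.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("write/close only: " + bytesVisibleAfterFlush(false));
        System.out.println("flush forwarded:  " + bytesVisibleAfterFlush(true));
    }
}
```

With the write/close-only wrapper the three bytes stay in the inner buffer after flush();
with the forwarding wrapper they reach the sink.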

I see two options:

- complete the class to wrap all methods of OutputStream and add a getWrappedStream() to access
other stuff like hsync().

- get rid of the Hadoop wrapping and directly use Hadoop file system objects.
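
A rough sketch of what the first option could look like, kept free of Hadoop dependencies so
it compiles on its own. DelegatingOutputStream and FakeHadoopStream are hypothetical names:
FakeHadoopStream merely stands in for FSDataOutputStream, and its hsync() only counts calls.
This is one possible shape under those assumptions, not the actual Flink class.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

final class WrappedStreamSketch {

    /** Hypothetical stand-in for a Hadoop stream that offers calls beyond OutputStream. */
    static class FakeHadoopStream extends ByteArrayOutputStream {
        int hsyncCalls = 0;

        void hsync() {
            hsyncCalls++; // a real stream would sync data to disk here
        }
    }

    /** Forwards every OutputStream method and exposes the wrapped stream. */
    static class DelegatingOutputStream<T extends OutputStream> extends OutputStream {
        private final T wrapped;

        DelegatingOutputStream(T wrapped) {
            this.wrapped = wrapped;
        }

        @Override
        public void write(int b) throws IOException {
            wrapped.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            wrapped.write(b, off, len);
        }

        @Override
        public void flush() throws IOException {
            wrapped.flush();
        }

        @Override
        public void close() throws IOException {
            wrapped.close();
        }

        /** Gives callers access to methods that OutputStream does not declare. */
        T getWrappedStream() {
            return wrapped;
        }
    }

    /** Writes three bytes, flushes, then reaches through the wrapper to call hsync(). */
    static FakeHadoopStream demo() throws IOException {
        FakeHadoopStream hadoop = new FakeHadoopStream();
        try (DelegatingOutputStream<FakeHadoopStream> out =
                new DelegatingOutputStream<>(hadoop)) {
            out.write(new byte[] {1, 2, 3});
            out.flush();
            out.getWrappedStream().hsync();
        }
        return hadoop;
    }
}
```

The generic type parameter is only a convenience of this sketch; it lets callers reach
hsync() through getWrappedStream() without a cast.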



