Mailing-List: contact dev-help@flink.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@flink.apache.org
Date: Thu, 27 Aug 2015 07:27:46 +0000 (UTC)
From: "Arnaud Linz (JIRA)" <jira@apache.org>
To: dev@flink.apache.org
Message-ID: <JIRA.12859527.1440660426000.179680.1440660466383@Atlassian.JIRA>
In-Reply-To: <JIRA.12859527.1440660426000@Atlassian.JIRA>
References: <JIRA.12859527.1440660426000@Atlassian.JIRA>
 <JIRA.12859527.1440660426779@arcas>
Subject: [jira] [Created] (FLINK-2580) HadoopDataOutputStream does not
 expose enough methods of org.apache.hadoop.fs.FSDataOutputStream
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Arnaud Linz created FLINK-2580:
----------------------------------

             Summary: HadoopDataOutputStream does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream
                 Key: FLINK-2580
                 URL: https://issues.apache.org/jira/browse/FLINK-2580
             Project: Flink
          Issue Type: Improvement
          Components: Hadoop Compatibility
            Reporter: Arnaud Linz
            Priority: Minor


I’ve noticed that when you use org.apache.flink.core.fs.FileSystem to write
to an HDFS file, calling
org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.create() returns a
HadoopDataOutputStream that wraps an
org.apache.hadoop.fs.FSDataOutputStream (under its
org.apache.hadoop.hdfs.client.HdfsDataOutputStream wrapper).

However, FSDataOutputStream exposes many methods, such as flush() and
getPos(), but HadoopDataOutputStream only wraps write() and close().

For instance, flush() calls the default, empty implementation in
OutputStream instead of the Hadoop one, which is confusing. Moreover,
because of the restrictive OutputStream interface, hsync() and hflush() are
not exposed to Flink.
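
The flush() behaviour described above can be reproduced with java.io alone.
This is only a sketch under assumptions: WriteCloseOnlyStream is a
hypothetical stand-in for a wrapper that forwards just write and close, and
BufferedOutputStream stands in for a buffering Hadoop stream. Neither is the
actual Flink or Hadoop code.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical wrapper that forwards only write and close (not Flink code).
class WriteCloseOnlyStream extends OutputStream {
    private final OutputStream wrapped;

    WriteCloseOnlyStream(OutputStream wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public void write(int b) throws IOException {
        wrapped.write(b);
    }

    @Override
    public void close() throws IOException {
        wrapped.close();
    }
    // flush() is not overridden, so the empty OutputStream.flush() is used.
}

class FlushNoOpDemo {
    // Returns how many bytes reached the sink after write + flush.
    static int bytesVisibleAfterFlush() {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            // BufferedOutputStream is an assumed stand-in for a buffering stream.
            OutputStream out =
                    new WriteCloseOnlyStream(new BufferedOutputStream(sink, 1024));
            out.write(new byte[] {1, 2, 3});
            out.flush(); // never reaches the wrapped stream
            return sink.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bytesVisibleAfterFlush()); // prints 0
    }
}
```

The three bytes stay in the inner buffer because the call to flush() stops
at the wrapper.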

I see two options:

- Complete the class so that it wraps all methods of OutputStream, and add
a getWrappedStream() to access other methods such as hsync().

- Get rid of the Hadoop wrapping and use Hadoop file system objects
directly.
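
The first option could look roughly like the sketch below. It is written
against java.io only so that it runs without Hadoop on the classpath.
DelegatingOutputStream and its type parameter are hypothetical names; in
Flink the wrapped type would be FSDataOutputStream, and callers would reach
hsync() or hflush() through getWrappedStream().

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical wrapper; T stands in for org.apache.hadoop.fs.FSDataOutputStream.
class DelegatingOutputStream<T extends OutputStream> extends OutputStream {
    private final T wrapped;

    DelegatingOutputStream(T wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public void write(int b) throws IOException {
        wrapped.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        wrapped.write(b, off, len);
    }

    @Override
    public void flush() throws IOException {
        wrapped.flush(); // forwarded instead of hitting the empty default
    }

    @Override
    public void close() throws IOException {
        wrapped.close();
    }

    // Gives access to methods that OutputStream does not declare.
    T getWrappedStream() {
        return wrapped;
    }
}

class DelegatingDemo {
    // Returns how many bytes reached the sink after write + flush.
    static int bytesVisibleAfterFlush() {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            DelegatingOutputStream<BufferedOutputStream> out =
                    new DelegatingOutputStream<>(new BufferedOutputStream(sink, 1024));
            out.write(new byte[] {1, 2, 3});
            out.flush(); // reaches the BufferedOutputStream
            return sink.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bytesVisibleAfterFlush()); // prints 3
    }
}
```

Returning the concrete wrapped type keeps the wrapper small while still
letting callers use stream-specific methods.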


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)