flink-user mailing list archives

From "LINZ, Arnaud" <AL...@bouyguestelecom.fr>
Subject RE: HadoopDataOutputStream maybe does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream
Date Thu, 27 Aug 2015 07:33:06 GMT
Hi,

Ok, I’ve created FLINK-2580 to track this issue (and FLINK-2579, which is totally unrelated).

I think I’m going to set up my dev environment to start contributing a little more than
just complaining ☺.

Best regards,
Arnaud

From: ewenstephan@gmail.com [mailto:ewenstephan@gmail.com] On behalf of Stephan Ewen
Sent: Wednesday, 26 August 2015 20:12
To: user@flink.apache.org
Subject: Re: HadoopDataOutputStream maybe does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream

I think that is a very good idea.

Originally, we wrapped the Hadoop FS classes for convenience (they were changing, we wanted
to keep the system independent of Hadoop), but these are no longer relevant reasons, in my
opinion.

Let's start with your proposal and see if we can actually get rid of the wrapping in a way
that is friendly to existing users.

Would you open an issue for this?

Greetings,
Stephan


On Wed, Aug 26, 2015 at 6:23 PM, LINZ, Arnaud <ALINZ@bouyguestelecom.fr> wrote:
Hi,

I’ve noticed that when you use org.apache.flink.core.fs.FileSystem to write to an HDFS
file, calling org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.create() returns a HadoopDataOutputStream
that wraps an org.apache.hadoop.fs.FSDataOutputStream (under its
org.apache.hadoop.hdfs.client.HdfsDataOutputStream wrapper).

However, FSDataOutputStream exposes many methods such as flush(), getPos(), etc., but HadoopDataOutputStream
only wraps write() and close().

For instance, flush() calls the default, empty implementation in OutputStream instead of the
Hadoop one, and that’s confusing. Moreover, because of the restrictive OutputStream interface,
hsync() and hflush() are not exposed to Flink; maybe having a getWrappedStream() would be
convenient.
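
The flush() behaviour described above can be reproduced with plain java.io, without Flink or Hadoop on the classpath. The following is only a minimal sketch: BufferingStream, NarrowWrapper and DelegatingWrapper are hypothetical stand-ins (for the Hadoop stream, the current wrapper, and a possible fix respectively), not actual Flink or Hadoop classes, and getWrappedStream() here simply mirrors the accessor suggested in this message.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class WrapperFlushSketch {

    /** Hypothetical stand-in for a buffering stream: bytes reach the sink only on flush(). */
    static class BufferingStream extends OutputStream {
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        private final ByteArrayOutputStream sink = new ByteArrayOutputStream();

        @Override public void write(int b) { buffer.write(b); }

        @Override public void flush() throws IOException {
            buffer.writeTo(sink);   // move buffered bytes to the sink
            buffer.reset();
        }

        @Override public void close() throws IOException { flush(); }

        int flushedBytes() { return sink.size(); }
    }

    /** Hypothetical wrapper that forwards only write and close; flush() is inherited from OutputStream. */
    static class NarrowWrapper extends OutputStream {
        protected final BufferingStream wrapped;

        NarrowWrapper(BufferingStream wrapped) { this.wrapped = wrapped; }

        @Override public void write(int b) throws IOException { wrapped.write(b); }

        @Override public void close() throws IOException { wrapped.close(); }
    }

    /** Hypothetical fix: forward flush() and expose the wrapped stream. */
    static class DelegatingWrapper extends NarrowWrapper {
        DelegatingWrapper(BufferingStream wrapped) { super(wrapped); }

        @Override public void flush() throws IOException { wrapped.flush(); }

        BufferingStream getWrappedStream() { return wrapped; }
    }

    /** Writes three bytes, calls flush() on the wrapper, and reports how many reached the sink. */
    static int flushedAfterFlush(boolean delegating) {
        BufferingStream inner = new BufferingStream();
        OutputStream out = delegating ? new DelegatingWrapper(inner) : new NarrowWrapper(inner);
        try {
            out.write(new byte[] {1, 2, 3});
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return inner.flushedBytes();
    }

    public static void main(String[] args) {
        System.out.println("narrow wrapper, bytes flushed: " + flushedAfterFlush(false));
        System.out.println("delegating wrapper, bytes flushed: " + flushedAfterFlush(true));
    }
}
```

With the narrow wrapper, flush() falls through to the no-op in java.io.OutputStream, so nothing reaches the sink until close(); the delegating variant forwards the call. An accessor such as getWrappedStream() would additionally let callers reach methods that OutputStream does not declare.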

(For now, that prevents me from using the Flink FileSystem object; I use Hadoop’s directly.)

Regards,
Arnaud





