Subject: Re: HadoopDataOutputStream maybe does not expose enough methods of org.apache.hadoop.fs.FSDataOutputStream
From: Stephan Ewen
To: user@flink.apache.org
Date: Wed, 26 Aug 2015 20:12:02 +0200

I think that is a very good idea.

Originally, we wrapped the Hadoop FS classes for convenience (they were changing, and we wanted to keep the system independent of Hadoop), but in my opinion these reasons are no longer relevant.

Let's start with your proposal and see if we can actually get rid of the wrapping in a way that is friendly to existing users.

Would you open an issue for this?

Greetings,
Stephan

On Wed, Aug 26, 2015 at 6:23 PM, LINZ, Arnaud wrote:

> Hi,
>
> I've noticed that when you use org.apache.flink.core.fs.FileSystem to
> write into an HDFS file, calling
> org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.create() returns a
> HadoopDataOutputStream that wraps an
> org.apache.hadoop.fs.FSDataOutputStream (under its
> org.apache.hadoop.hdfs.client.HdfsDataOutputStream wrapper).
>
> However, FSDataOutputStream exposes many methods, such as flush() and
> getPos(), but HadoopDataOutputStream only wraps write and close.
>
> For instance, flush() calls the default, empty implementation of
> OutputStream instead of the Hadoop one, and that's confusing. Moreover,
> because of the restrictive OutputStream interface, hsync() and hflush()
> are not exposed to Flink; maybe having a getWrappedStream() would be
> convenient.
>
> (For now, that prevents me from using the Flink FileSystem object; I
> directly use Hadoop's one.)
>
> Regards,
> Arnaud
>
> ------------------------------
>
> The integrity of this message cannot be guaranteed on the Internet. The
> company that sent this message cannot therefore be held liable for its
> content nor attachments. Any unauthorized use or dissemination is
> prohibited. If you are not the intended recipient of this message, then
> please delete it and notify the sender.
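The flush() behaviour described in the quoted message can be reproduced with plain java.io, without Hadoop or Flink on the classpath. The sketch below is not Flink's HadoopDataOutputStream: FlushDemo, CountingStream, NarrowWrapper and ForwardingWrapper are hypothetical names made up for illustration, and CountingStream merely stands in for the wrapped Hadoop stream. It assumes only the documented fact that java.io.OutputStream.flush() does nothing unless overridden.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class FlushDemo {

    /** Hypothetical stand-in for the wrapped Hadoop stream: counts flush() calls that reach it. */
    static class CountingStream extends OutputStream {
        int flushCalls = 0;
        int bytesWritten = 0;

        @Override
        public void write(int b) {
            bytesWritten++;
        }

        @Override
        public void flush() {
            flushCalls++;
        }
    }

    /** Forwards only write and close, so flush() falls back to OutputStream's empty default. */
    static class NarrowWrapper extends OutputStream {
        protected final OutputStream wrapped;

        NarrowWrapper(OutputStream wrapped) {
            this.wrapped = wrapped;
        }

        @Override
        public void write(int b) throws IOException {
            wrapped.write(b);
        }

        @Override
        public void close() throws IOException {
            wrapped.close();
        }
    }

    /** Same wrapper, plus a forwarded flush() and an accessor for the wrapped stream. */
    static class ForwardingWrapper extends NarrowWrapper {
        ForwardingWrapper(OutputStream wrapped) {
            super(wrapped);
        }

        @Override
        public void flush() throws IOException {
            wrapped.flush();
        }

        /** Lets callers reach methods that OutputStream itself does not declare. */
        OutputStream getWrappedStream() {
            return wrapped;
        }
    }

    /** Writes one byte, flushes the wrapper, and reports how many flushes reached the delegate. */
    static int flushesReachingDelegate(boolean forwardFlush) {
        CountingStream delegate = new CountingStream();
        OutputStream out = forwardFlush
                ? new ForwardingWrapper(delegate)
                : new NarrowWrapper(delegate);
        try {
            out.write(42);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return delegate.flushCalls;
    }

    public static void main(String[] args) {
        System.out.println("narrow wrapper: " + flushesReachingDelegate(false));
        System.out.println("forwarding wrapper: " + flushesReachingDelegate(true));
    }
}
```

With the narrow wrapper the flush never reaches the delegate; with the forwarding wrapper it does. An accessor in the style of getWrappedStream() would be one way to reach Hadoop-specific calls such as hsync() and hflush(), at the cost of exposing the Hadoop type to callers.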