From: Till Rohrmann
To: user@flink.apache.org
Date: Thu, 30 Jul 2015 14:18:15 +0200
Subject: Re: Tuple model project

You could try to use the TypeSerializerInputFormat.


On Thu, Jul 30, 2015 at 2:08 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
How can I create a Flink dataset given a directory path that contains a set of Java objects serialized with Kryo (one file per object)?
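Reading a directory that holds one serialized object per file boils down to listing the files and decoding each one. The plain-JVM sketch below illustrates only that layout, assuming a hypothetical `Quad` record and a hand-written `DataOutputStream`/`DataInputStream` encoding that stands in for Kryo. It is not Flink's `TypeSerializerInputFormat`, which, as far as I know, reads the block-structured files written by `TypeSerializerOutputFormat` rather than arbitrary per-object files.

```scala
import java.io.{DataInputStream, DataOutputStream, File, FileInputStream, FileOutputStream}

// Hypothetical record type standing in for one RDF quad.
final case class Quad(subject: String, predicate: String, obj: String, context: String)

object OneFilePerObject {

  // Stand-in encoding (not Kryo): the four fields as length-prefixed UTF strings.
  def writeQuad(q: Quad, file: File): Unit = {
    val out = new DataOutputStream(new FileOutputStream(file))
    try {
      out.writeUTF(q.subject)
      out.writeUTF(q.predicate)
      out.writeUTF(q.obj)
      out.writeUTF(q.context)
    } finally out.close()
  }

  // Decodes exactly one object from one file; Scala evaluates the arguments left to right.
  def readQuad(file: File): Quad = {
    val in = new DataInputStream(new FileInputStream(file))
    try Quad(in.readUTF(), in.readUTF(), in.readUTF(), in.readUTF())
    finally in.close()
  }

  // Lists the regular files in `dir` (sorted by name for a stable order)
  // and decodes one object per file. A missing directory yields an empty result.
  def readDirectory(dir: File): Seq[Quad] = {
    val files = Option(dir.listFiles()).getOrElse(Array.empty[File])
    files.filter(_.isFile).sortBy(_.getName).toSeq.map(readQuad)
  }
}
```

In a Flink job, the decoded objects could be passed to `env.fromCollection(...)`, although that reads every file on the client; reading the files in parallel inside the job would require an input format instead.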

On Thu, Jul 30, 2015 at 1:41 PM, Till Rohrmann <trohrmann@apache.org> wrote:

Hi Flavio,

In order to use a custom Kryo serializer for a given type, you can call the registerTypeWithKryoSerializer method of the ExecutionEnvironment object. You pass it the type you want to be serialized with Kryo and an implementation of the com.esotericsoftware.kryo.Serializer class. If the given type is not supported by Flink’s own serialization framework, then this custom serializer is used. You register the types at the beginning of your Flink program:

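import org.apache.flink.api.scala._

def main(args: Array[String]): Unit = {
  val env = ExecutionEnvironment.getExecutionEnvironment

  // Kryo will use MyTypeSerializer whenever it has to (de)serialize a MyType
  env.registerTypeWithKryoSerializer(classOf[MyType], classOf[MyTypeSerializer])

  ...

  env.execute()
}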

Cheers,
Till
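To make the idea behind registerTypeWithKryoSerializer concrete: the method associates a class with a serializer that knows how to write and read instances of that class. The toy registry below mimics only that association on the plain JVM, without Flink or Kryo. `BinarySerializer`, `MyTypeCodec`, and `SerializerRegistry` are hypothetical names, and this is not how Flink or Kryo implement registration; the real `com.esotericsoftware.kryo.Serializer` is an abstract class whose `write`/`read` methods also receive the `Kryo` instance and Kryo's own `Output`/`Input` streams.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInput, DataInputStream, DataOutput, DataOutputStream}

// Hypothetical serializer interface: one write/read pair per type
// (Kryo's real Serializer also receives the Kryo instance and its own Output/Input).
trait BinarySerializer[T] {
  def write(out: DataOutput, value: T): Unit
  def read(in: DataInput): T
}

// Placeholder user type (name borrowed from Till's snippet) and its hand-written codec.
final case class MyType(id: Long, label: String)

object MyTypeCodec extends BinarySerializer[MyType] {
  def write(out: DataOutput, value: MyType): Unit = {
    out.writeLong(value.id)
    out.writeUTF(value.label)
  }
  def read(in: DataInput): MyType = MyType(in.readLong(), in.readUTF())
}

// Toy registry that maps a class to the serializer registered for it.
final class SerializerRegistry {
  private var serializers = Map.empty[Class[_], BinarySerializer[_]]

  def register[T](cls: Class[T], serializer: BinarySerializer[T]): Unit =
    serializers += (cls -> serializer)

  private def lookup[T](cls: Class[T]): BinarySerializer[T] =
    serializers.get(cls) match {
      case Some(s) => s.asInstanceOf[BinarySerializer[T]]
      case None    => throw new IllegalArgumentException(s"no serializer registered for ${cls.getName}")
    }

  def toBytes[T](cls: Class[T], value: T): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val out = new DataOutputStream(buffer)
    lookup(cls).write(out, value)
    out.flush()
    buffer.toByteArray
  }

  def fromBytes[T](cls: Class[T], data: Array[Byte]): T =
    lookup(cls).read(new DataInputStream(new ByteArrayInputStream(data)))
}
```

Keeping the write and read order identical (here `id`, then `label`) is what makes the round trip work; a real Kryo serializer carries the same obligation.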

On Thu, Jul 30, 2015 at 12:45 PM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
I have a project that produces RDF quads, and I have to store them so that I can read them with Flink afterwards.
I could use Thrift/Protobuf/Avro, but this means adding a lot of transitive dependencies to my project.
Maybe I could use Kryo to store those objects. Is there any example of creating a dataset of objects serialized with Kryo?

On Thu, Jul 30, 2015 at 11:10 AM, Stephan Ewen <sewen@apache.org> wrote:
Quick response: I am not opposed to that, but there are tuple libraries around already.

Do you specifically need the Flink tuples, for interoperability between Flink and other projects?
On Thu, Jul 30, 2015 at 11:07 AM, Stephan Ewen <sewen@apache.org> wrote:
Should we move this to the dev list?

On Thu, Jul 30, 2015 at 10:43 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
Any thoughts about this (moving the tuple classes into a separate self-contained project with no transitive dependencies, so that they can easily be used in other external projects)?

On Mon, Jul 6, 2015 at 11:09 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
Do you think it would be a good idea to extract the Flink tuples into a separate project, to allow simpler dependency management in Flink-compatible projects?

On Mon, Jul 6, 2015 at 11:06 AM, Fabian Hueske <fhueske@gmail.com> wrote:
Hi,

At the moment, Tuples are more efficient than POJOs, because POJO fields are accessed via Java reflection, whereas Tuple fields are accessed directly.
This performance penalty could be overcome by code-generated serializers and comparators, but I am not aware of any work in that direction.

Best, Fabian
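Fabian's point can be illustrated on a small scale. The sketch below contrasts two access paths a generic serializer might take: a tuple-like class whose fields are read by position through ordinary compiled code (Flink's Java tuples similarly offer a positional `getField(int)`), and a POJO-like class whose fields are only known by name at runtime and must be read through `java.lang.reflect.Field`. `Tuple3Like`, `Person`, and `FieldAccess` are hypothetical; the sketch does not reproduce Flink's actual serializers and makes no performance measurement.

```scala
import java.lang.reflect.Field

// Tuple-like record: a generic caller reads fields by position through a plain method call.
final class Tuple3Like(var f0: String, var f1: String, var f2: String) {
  def getField(pos: Int): String = pos match {
    case 0 => f0
    case 1 => f1
    case 2 => f2
    case _ => throw new IndexOutOfBoundsException(pos.toString)
  }
}

// POJO-like record: a generic caller only learns its field names at runtime.
final class Person {
  var name: String = ""
  var city: String = ""
  var country: String = ""
}

object FieldAccess {
  // Direct path: compiled field reads, no reflection.
  def readTuple(t: Tuple3Like): Seq[String] = (0 until 3).map(t.getField)

  // Reflective path, step 1: resolve java.lang.reflect.Field handles once.
  def fieldsOf(cls: Class[_], names: Seq[String]): Seq[Field] =
    names.map { n =>
      val f = cls.getDeclaredField(n)
      f.setAccessible(true) // Scala vars compile to private JVM fields behind accessors
      f
    }

  // Reflective path, step 2: go through Field.get (plus a cast) for every record read.
  def readPojo(obj: AnyRef, fields: Seq[Field]): Seq[String] =
    fields.map(f => f.get(obj).asInstanceOf[String])
}
```

Both paths return the same values; the difference is that the reflective one pays for a `Field.get` call and a cast per field and record, which is roughly the kind of overhead that code-generated serializers would remove.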

2015-07-06 11:01 GMT+02:00 Flavio Pompermaier <pompermaier@okkam.it>:
Hi to all,
I was thinking of writing my own Flink-compatible library, and I basically need a Tuple5.

Is there any performance loss in using a POJO with 5 String fields vs. a Tuple5?
If so, wouldn't it be a good idea to extract the Flink tuples into a separate simple project (e.g. flink-java-tuples) with no other dependencies, so that other libraries can write their Flink-compatible logic without having to exclude all the transitive dependencies of flink-java?

Best,
Flavio


