Mailing-List: contact dev-help@apex.incubator.apache.org; run by ezmlm
Reply-To: dev@apex.incubator.apache.org
MIME-Version: 1.0
From: Sandesh Hegde
Date: Wed, 18 May 2016 17:58:59 +0000
Subject: Re:
Serialization in Apex
To: dev@apex.incubator.apache.org
Content-Type: multipart/alternative; boundary=001a114751ae4dd1b50533219dd4
archived-at: Wed, 18 May 2016 17:59:20 -0000

--001a114751ae4dd1b50533219dd4
Content-Type: text/plain; charset=UTF-8

Users should set this attribute explicitly, as it greatly affects performance. If Kryo fails to deserialize, then instead of showing a stack trace we can show a message telling them to set the attribute.

On Wed, May 18, 2016 at 10:52 AM Timothy Farkas <timothytiborfarkas@gmail.com> wrote:

> It will help a new user get something up and running quickly, but it may leave people scratching their heads as to why performance is so bad. If we move in the direction of an automatic fallback, I think we should also devise a way to explicitly warn users with something more than just an obscure warning message. Perhaps there can be a REST endpoint in the app master that a UI can tap into, which keeps a log of all the tuning decisions made by the platform, and a corresponding dtcli command? That way we can tell newcomers to check that log if they are having any performance issues.
>
> Thanks,
> Tim
>
> On Wed, May 18, 2016 at 10:36 AM, David Yan wrote:
>
> > I think having a fallback to Java serialization is a good thing. I can imagine a user having trouble with Kryo serialization of their operator, being unable to figure it out, and then giving up entirely without us even knowing.
> >
> > David
> >
> > On Tue, May 17, 2016 at 11:50 AM, Thomas Weise wrote:
> >
> > > IMO automatically picking a serializer conflicts with predictable system behavior. If the serialization does not work, I would want to know that instead of the system doing some trick and arriving at suboptimal or faulty behavior.
> > >
> > > That does not mean we cannot have optimizations though, as long as there is explicit user control.
> > >
> > > Thomas
> > >
> > > On Tue, May 17, 2016 at 11:34 AM, Bhupesh Chawda <bhupesh@datatorrent.com> wrote:
> > >
> > > > As Ram and Sandesh pointed out, we do have the @Bind and @DefaultSerializer annotations. However, these are tightly coupled with the field in question and do require modifying external code. Additionally, it may also break other systems if we are binding it to a JavaSerializer and there are systems which have other means of serializing the field.
> > > >
> > > > My point was more to do with the user having to worry about what serializer to use and how to serialize objects. For example, I liked the approach that Storm takes by falling back to Java serialization automatically in case the target class does not have a default constructor.
> > > >
> > > > Of course, we can explore type-based serialization. But this email was more about the usability aspect: handling classes that do not have default constructors in general, not just POJO tuples.
> > > >
> > > > ~Bhupesh
> > > >
> > > > On Tue, May 17, 2016 at 9:53 AM, Pramod Immaneni <pramod@datatorrent.com> wrote:
> > > >
> > > > > Can we do a test where we hard-code a codec for a POJO and compare its performance against Kryo? Thereafter we can dynamically compose a codec via pojoutils and inject it.
> > > > >
> > > > > Thanks
> > > > >
> > > > > > On May 17, 2016, at 8:16 AM, Vlad Rozov wrote:
> > > > > >
> > > > > > +1 for type-based serialization. Tuples in most cases are flat records/POJOs, and it should be possible to programmatically construct a codec that will significantly outperform Kryo. It should also reduce the amount of data passed over the wire.
> > > > > > I started to look in that direction as well, as Kryo serialization is one of the bottlenecks that limit Apex throughput when operators are deployed into different containers, including the NODE_LOCAL case.
> > > > > >
> > > > > > Thank you,
> > > > > > Vlad
> > > > > >
> > > > > >> On 5/17/16 07:13, Sandesh Hegde wrote:
> > > > > >> If it is possible to serialize, the platform should do it automatically; it reduces the tribal knowledge required to use the platform. A couple of months back, I also sent out a similar email.
> > > > > >>
> > > > > >> Type-based serialization may improve performance.
> > > > > >>
> > > > > >>> On Tue, May 17, 2016, 6:06 AM Munagala Ramanath <ram@datatorrent.com> wrote:
> > > > > >>>
> > > > > >>> Traditionally, we've recommended using "@DefaultSerializer(JavaSerializer.class)" or "@FieldSerializer.Bind(CustomSerializer.class)" as outlined at
> > > > > >>>
> > > > > >>> http://docs.datatorrent.com/troubleshooting/#application-throwing-following-kryo-exception
> > > > > >>>
> > > > > >>> Can you describe why those approaches are not adequate?
> > > > > >>>
> > > > > >>> Ram
> > > > > >>>
> > > > > >>> On Mon, May 16, 2016 at 11:46 PM, Bhupesh Chawda <bhupesh@datatorrent.com> wrote:
> > > > > >>>
> > > > > >>>> Hi All,
> > > > > >>>>
> > > > > >>>> While working on the integration of Apex with Apache Samoa, I am coming across some scenarios where I have to add default constructors to some external classes to make them Kryo serializable. Although this should be okay, we would like to avoid modifying external classes as far as possible.
> > > > > >>>> Some other streaming engines have taken different approaches towards serialization.
> > > > > >>>>
> > > > > >>>> I looked at the Flink and Storm serialization mechanisms.
> > > > > >>>>
> > > > > >>>> Storm has a fallback mechanism based on Java serialization. It uses Kryo for serialization because of its performance. But if the class is not serializable using Kryo, it will try to serialize it using Java serialization. If even that fails, it throws an error. [1]
> > > > > >>>>
> > > > > >>>> Flink has its own serialization stack, where it uses a serializer based on the type information known about the data. [2]
> > > > > >>>>
> > > > > >>>> What does the community think about the current state of serialization in Apex? Is there a need to explore some approaches which could avoid serialization issues such as the one described above? Are there any other approaches one could use?
> > > > > >>>>
> > > > > >>>> 1. http://storm.apache.org/releases/current/Serialization.html#java-serialization
> > > > > >>>> 2. https://cwiki.apache.org/confluence/display/FLINK/Type+System,+Type+Extraction,+Serialization
> > > > > >>>>
> > > > > >>>> ~Bhupesh

--001a114751ae4dd1b50533219dd4--
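
For illustration, the positions in this thread (automatic fallback, explicit user control, and a log of tuning decisions) can be combined in one small sketch. It uses only JDK serialization. Codec, JavaCodec, FallbackCodec, the allowFallback flag, the decisions set, and Point are all hypothetical names made up for this example; none of them are Apex, Kryo, or Storm APIs. The no-arg-constructor check is only a stand-in for "the fast serializer cannot handle this class", and the primary codec is whatever fast codec the caller supplies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical codec abstraction for this sketch; not an Apex or Kryo interface.
interface Codec {
    byte[] toBytes(Object value);
    Object fromBytes(byte[] bytes);
}

// Plain JDK serialization: general, but typically slower and larger on the wire.
final class JavaCodec implements Codec {
    @Override
    public byte[] toBytes(Object value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    @Override
    public Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Uses the primary codec when it can, otherwise Java serialization (if allowed).
final class FallbackCodec implements Codec {
    private static final byte PRIMARY = 0;
    private static final byte FALLBACK = 1;

    private final Codec primary;
    private final Codec fallback = new JavaCodec();
    private final boolean allowFallback;
    // Class names for which the fallback was chosen, so a UI or CLI could report them.
    final Set<String> decisions = new LinkedHashSet<>();

    FallbackCodec(Codec primary, boolean allowFallback) {
        this.primary = primary;
        this.allowFallback = allowFallback;
    }

    @Override
    public byte[] toBytes(Object value) {
        Class<?> type = value.getClass();
        if (hasNoArgConstructor(type)) {
            return tag(PRIMARY, primary.toBytes(value));
        }
        if (!allowFallback) {
            // Fail fast with an actionable message instead of a bare stack trace.
            throw new IllegalStateException(type.getName()
                + " has no default constructor; configure a serializer for it explicitly");
        }
        decisions.add(type.getName());
        return tag(FALLBACK, fallback.toBytes(value));
    }

    @Override
    public Object fromBytes(byte[] bytes) {
        // The first byte records which codec produced the payload.
        byte[] payload = Arrays.copyOfRange(bytes, 1, bytes.length);
        return bytes[0] == PRIMARY ? primary.fromBytes(payload) : fallback.fromBytes(payload);
    }

    private static boolean hasNoArgConstructor(Class<?> type) {
        try {
            type.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    private static byte[] tag(byte marker, byte[] payload) {
        byte[] tagged = new byte[payload.length + 1];
        tagged[0] = marker;
        System.arraycopy(payload, 0, tagged, 1, payload.length);
        return tagged;
    }
}

// Sample tuple class without a default constructor.
final class Point implements Serializable {
    final int x;
    final int y;
    Point(int x, int y) { this.x = x; this.y = y; }
}
```

With allowFallback set to false, the sketch fails fast with a message that tells the user what to configure, which matches the explicit-control position. With it set to true, decisions holds the class names that a REST endpoint or CLI command could surface to someone investigating poor throughput.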