From: Maciej Szymkiewicz
To: "Mendelson, Assaf", dev@spark.apache.org
Date: Tue, 23 May 2017 15:45:49 +0200
Subject: Re: [PYTHON] PySpark typing hints



On 05/23/2017 02:45 PM, Mendelson, Assaf wrote:

> You are correct,
>
> I actually did not look too deeply into it until now, as I noticed you mentioned it is compatible with Python 3 only, and I saw on GitHub that mypy or pytype is required.
>
> Because of that, I made my suggestions with Python 2 in mind.
>
> Looking into it more deeply, I am wondering what is not supported. Are you talking about limitations for testing?


Since type checkers (unlike annotations) are not standardized, this varies between projects and versions. In mypy, quite a lot has changed since I started annotating Spark.

A few months ago I wouldn't even have bothered looking at the list of issues; today (as mentioned in the other message) we could remove the metaclasses and pass both the Python 2 and the Python 3 checks.

The other part is the typing module itself, as well as function annotations (outside docstrings): annotation syntax is Python 3 only, and typing is not part of the Python 2 standard library. But this is not a problem with stub files.
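For illustration, here is a minimal sketch contrasting the two styles. The function names are hypothetical, not PySpark API. The first function carries a PEP 484 type comment, which the Python 2 parser treats as an ordinary comment. The second uses Python 3 annotation syntax, which the interpreter stores on the function object. A `.pyi` stub keeps the annotated form in a separate file, so the runtime module never needs the new syntax. The sketch itself assumes Python 3, since it contains both forms.

```python
from typing import List


def split_words(line):
    # type: (str) -> List[str]
    """Type comment: parsed as a plain comment, visible only to type checkers."""
    return line.split()


def split_words_annotated(line: str) -> List[str]:
    """Function annotation: Python 3 syntax, stored on the function object."""
    return line.split()


# The type comment leaves no runtime trace; the annotation does.
print(split_words.__annotations__)            # {}
print(split_words_annotated.__annotations__)  # a dict with 'line' and 'return' entries
```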

> If I understand correctly, then one can use this without any issues in PyCharm (and other IDEs supporting type hinting) even when developing for Python 2.


This strictly depends on the type checker. I haven't followed the development closely, but I got the impression that a lot changed, for example, between PyCharm 2016.3 and 2017.1. I think the important point is that lack of support doesn't break anything.
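Missing support is harmless partly because the interpreter never imports a stub file. The sketch below builds a throwaway package in a temporary directory, containing an untyped module and a `.pyi` stub beside it. The package name `mini_spark` and its `upper` function are hypothetical. Importing the package loads the `.py` file and ignores the stub, which only type checkers and IDEs read.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def build_package(root: Path) -> None:
    """Create a tiny hypothetical package with a runtime module and a stub."""
    pkg = root / "mini_spark"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    # Untyped runtime code that is valid in both Python 2 and Python 3.
    (pkg / "functions.py").write_text(textwrap.dedent('''\
        def upper(col):
            return "upper(%s)" % col
    '''))
    # Stub using Python 3 annotation syntax; read only by type checkers and IDEs.
    (pkg / "functions.pyi").write_text(textwrap.dedent('''\
        def upper(col: str) -> str: ...
    '''))


root = Path(tempfile.mkdtemp())
build_package(root)
sys.path.insert(0, str(root))
importlib.invalidate_caches()

functions = importlib.import_module("mini_spark.functions")
print(Path(functions.__file__).name)  # functions.py (the stub is ignored)
print(functions.upper("name"))        # upper(name)
```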

> In addition, the tests can test the existing PySpark; they just have to be run with a compatible package (e.g. mypy).

> This means that porting to Python 2 would provide a very small advantage over the immediate benefits (IDE usage and testing for most cases).
>
> Am I missing something?
>
> Thanks,
> Assaf.
>
> From: Maciej Szymkiewicz [mailto:mszymkiewicz@gmail.com]
> Sent: Tuesday, May 23, 2017 3:27 PM
> To: Mendelson, Assaf
> Subject: Re: [PYTHON] PySpark typing hints

> On 05/23/2017 01:12 PM, assaf.mendelson wrote:
>
>> That said, if we make a decision on the way to handle it, then I believe it would be a good idea to start even with the bare minimum and continue to add to it (thereby making it possible for many people to contribute). The code I added on GitHub was basically the things I needed.
>
> I already have almost full coverage of the API, excluding some exotic parts of the legacy streaming API, so starting with the bare minimum is not really required.
>
>> The advantage of the first is that it is part of the code, which means it is easier to keep up to date. The main issue with this is that supporting auto-generated code (as is the case for most functions) can be a little awkward, and this actually relates to a separate issue, as it means PyCharm marks most of the functions as an error (i.e. pyspark.sql.functions.XXX is marked as not there…)
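The IDE problem with generated functions can be reproduced in a few lines. The sketch below creates module-level functions from a table at import time with a helper and `globals()`. The `_functions` table and the `_create_function` helper are illustrative only, not copied from PySpark. The names exist only after the loop runs, so a static analyzer finds no definition for them. A stub that declares each name explicitly is one way around this.

```python
from typing import Callable, Dict

# Hypothetical table of generated function names and their docstrings.
_functions: Dict[str, str] = {
    "upper": "Converts a string expression to upper case.",
    "lower": "Converts a string expression to lower case.",
}


def _create_function(name: str, doc: str = "") -> Callable[[str], str]:
    """Build a function that renders a call expression such as 'upper(name)'."""
    def _(col: str) -> str:
        return "%s(%s)" % (name, col)
    _.__name__ = name
    _.__doc__ = doc
    return _


# The names are injected at import time, so static analysis cannot see them.
for _name, _doc in _functions.items():
    globals()[_name] = _create_function(_name, _doc)

# A hypothetical stub could declare them explicitly for IDEs and type checkers:
#     def upper(col: str) -> str: ...
#     def lower(col: str) -> str: ...

print(upper("name"))  # upper(name)  (runs fine, though an IDE may flag `upper`)
```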


Comment-based annotations are not suitable for complex signatures with multi-version support.

Also, there is no support for overloading, so it is not possible to capture relationships between arguments, or between arguments and the return type.
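Here is a minimal sketch of what overloading captures. The `combine` function is hypothetical, not PySpark API. With `typing.overload`, a checker learns that two `int` arguments produce an `int` and two `str` arguments produce a `str`. It can therefore reject a mixed call such as `combine(1, "a")`. A single signature using `Union[int, str]` for both arguments would accept the mixed call and could only promise `Union[int, str]` as the result.

```python
from typing import Any, overload


@overload
def combine(left: int, right: int) -> int: ...
@overload
def combine(left: str, right: str) -> str: ...


def combine(left: Any, right: Any) -> Any:
    # Runtime implementation; type checkers use only the @overload
    # signatures above to relate the argument types to the return type.
    return left + right


print(combine(1, 2))      # 3
print(combine("a", "b"))  # ab
```

In a `.pyi` stub, the two `@overload` declarations would appear on their own, without the runtime implementation.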


--
Maciej Szymkiewicz