spark-dev mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: [VOTE] Spark 2.3.0 (RC4)
Date Wed, 21 Feb 2018 20:25:41 GMT
I'm -1 on any changes that aren't fixing major regressions from 2.2 at this
point. Also, in any cases where it's possible, we should be flipping new
features off if they are still regressing, rather than continuing to
attempt to fix them.

Since it's experimental, I would support backporting the DataSourceV2
patches into 2.3.1 so that there is more opportunity for feedback as the
API matures.

On Wed, Feb 21, 2018 at 11:32 AM, Shixiong(Ryan) Zhu <
shixiong@databricks.com> wrote:

> FYI. I found two more blockers:
>
> https://issues.apache.org/jira/browse/SPARK-23475
> https://issues.apache.org/jira/browse/SPARK-23481
>
> On Wed, Feb 21, 2018 at 9:45 AM, Xiao Li <gatorsmile@gmail.com> wrote:
>
>> Hi, Ryan,
>>
>> In this release, Data Source V2 is experimental. We are still collecting
>> feedback from the community and will improve the related APIs and
>> implementation in the next 2.4 release.
>>
>> Thanks,
>>
>> Xiao
>>
>> 2018-02-21 9:43 GMT-08:00 Xiao Li <gatorsmile@gmail.com>:
>>
>>> Hi, Justin,
>>>
>>> Based on my understanding, SPARK-17147 is also not a regression. Thus,
>>> Spark 2.3.0 is unable to contain it. We have to wait for the committers who
>>> are familiar with Spark Streaming to make a decision whether we can fix the
>>> issue in Spark 2.3.1.
>>>
>>> Since this is open source, feel free to add the patch in your local
>>> build.
>>>
>>> Thanks for using Spark!
>>>
>>> Xiao
>>>
>>>
>>> 2018-02-21 9:36 GMT-08:00 Ryan Blue <rblue@netflix.com.invalid>:
>>>
>>>> No problem if we can't add them, this is experimental anyway so this
>>>> release should be more about validating the API and the start of our
>>>> implementation. I just don't think we can recommend that anyone actually
>>>> use DataSourceV2 without these patches.
>>>>
>>>> On Wed, Feb 21, 2018 at 9:21 AM, Wenchen Fan <cloud0fan@gmail.com>
>>>> wrote:
>>>>
>>>>> SPARK-23323 adds a new API, I'm not sure we can still do it at this
>>>>> stage of the release... Besides users can work around it by calling the
>>>>> spark output coordinator themselves in their data source.
>>>>>
>>>>> SPARK-23203 is non-trivial and didn't fix any known bugs, so it's hard
>>>>> to convince other people that it's safe to add it to the release during
>>>>> the RC phase.
>>>>>
>>>>> SPARK-23418 depends on the above one.
>>>>>
>>>>> Generally they are good to have in Spark 2.3, if they were merged
>>>>> before the RC. I think this is a lesson we should learn from, that we
>>>>> should work on stuff we want in the release before the RC, instead of
>>>>> after.
>>>>>
>>>>> On Thu, Feb 22, 2018 at 1:01 AM, Ryan Blue <rblue@netflix.com.invalid>
>>>>> wrote:
>>>>>
>>>>>> What does everyone think about getting some of the newer DataSourceV2
>>>>>> improvements in? It should be low risk because it is a new code path,
>>>>>> and v2 isn't very usable without things like support for using the output
>>>>>> commit coordinator to deconflict writes.
>>>>>>
>>>>>> The ones I'd like to get in are:
>>>>>> * Use the output commit coordinator:
>>>>>> https://issues.apache.org/jira/browse/SPARK-23323
>>>>>> * Use immutable trees and the same push-down logic as other read
>>>>>> paths: https://issues.apache.org/jira/browse/SPARK-23203
>>>>>> * Don't allow users to supply schemas when they aren't supported:
>>>>>> https://issues.apache.org/jira/browse/SPARK-23418
>>>>>>
>>>>>> I think it would make the 2.3.0 release more usable for anyone
>>>>>> interested in the v2 read and write paths.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> On Tue, Feb 20, 2018 at 7:07 PM, Weichen Xu <
>>>>>> weichen.xu@databricks.com> wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> On Wed, Feb 21, 2018 at 10:07 AM, Marcelo Vanzin <
>>>>>>> vanzin@cloudera.com> wrote:
>>>>>>>
>>>>>>>> Done, thanks!
>>>>>>>>
>>>>>>>> On Tue, Feb 20, 2018 at 6:05 PM, Sameer Agarwal <
>>>>>>>> sameerag@apache.org> wrote:
>>>>>>>> > Sure, please feel free to backport.
>>>>>>>> >
>>>>>>>> > On 20 February 2018 at 18:02, Marcelo Vanzin <vanzin@cloudera.com> wrote:
>>>>>>>> >>
>>>>>>>> >> Hey Sameer,
>>>>>>>> >>
>>>>>>>> >> Mind including https://github.com/apache/spark/pull/20643
>>>>>>>> >> (SPARK-23468) in the new RC? It's a minor bug since I've only hit it
>>>>>>>> >> with older shuffle services, but it's pretty safe.
>>>>>>>> >>
>>>>>>>> >> On Tue, Feb 20, 2018 at 5:58 PM, Sameer Agarwal <sameerag@apache.org>
>>>>>>>> >> wrote:
>>>>>>>> >> > This RC has failed due to
>>>>>>>> >> > https://issues.apache.org/jira/browse/SPARK-23470.
>>>>>>>> >> > Now that the fix has been merged in 2.3 (thanks Marcelo!), I'll follow
>>>>>>>> >> > up with an RC5 soon.
>>>>>>>> >> >
>>>>>>>> >> > On 20 February 2018 at 16:49, Ryan Blue <rblue@netflix.com> wrote:
>>>>>>>> >> >>
>>>>>>>> >> >> +1
>>>>>>>> >> >>
>>>>>>>> >> >> Build & tests look fine, checked signature and checksums for src
>>>>>>>> >> >> tarball.
>>>>>>>> >> >>
>>>>>>>> >> >> On Tue, Feb 20, 2018 at 12:54 PM, Shixiong(Ryan) Zhu
>>>>>>>> >> >> <shixiong@databricks.com> wrote:
>>>>>>>> >> >>>
>>>>>>>> >> >>> I'm -1 because of the UI regression
>>>>>>>> >> >>> https://issues.apache.org/jira/browse/SPARK-23470 : the All Jobs page
>>>>>>>> >> >>> may be too slow and cause "read timeout" when there are lots of jobs
>>>>>>>> >> >>> and stages. This is one of the most important pages because when it's
>>>>>>>> >> >>> broken, it's pretty hard to use Spark Web UI.
>>>>>>>> >> >>>
>>>>>>>> >> >>>
>>>>>>>> >> >>> On Tue, Feb 20, 2018 at 4:37 AM, Marco Gaido <marcogaido91@gmail.com>
>>>>>>>> >> >>> wrote:
>>>>>>>> >> >>>>
>>>>>>>> >> >>>> +1
>>>>>>>> >> >>>>
>>>>>>>> >> >>>> 2018-02-20 12:30 GMT+01:00 Hyukjin Kwon <gurwls223@gmail.com>:
>>>>>>>> >> >>>>>
>>>>>>>> >> >>>>> +1 too
>>>>>>>> >> >>>>>
>>>>>>>> >> >>>>> 2018-02-20 14:41 GMT+09:00 Takuya UESHIN <ueshin@happy-camper.st>:
>>>>>>>> >> >>>>>>
>>>>>>>> >> >>>>>> +1
>>>>>>>> >> >>>>>>
>>>>>>>> >> >>>>>>
>>>>>>>> >> >>>>>> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang
>>>>>>>> >> >>>>>> <jiangxb1987@gmail.com> wrote:
>>>>>>>> >> >>>>>>>
>>>>>>>> >> >>>>>>> +1
>>>>>>>> >> >>>>>>>
>>>>>>>> >> >>>>>>>
>>>>>>>> >> >>>>>>> On Tue, Feb 20, 2018 at 1:09 PM, Wenchen Fan <cloud0fan@gmail.com> wrote:
>>>>>>>> >> >>>>>>>>
>>>>>>>> >> >>>>>>>> +1
>>>>>>>> >> >>>>>>>>
>>>>>>>> >> >>>>>>>> On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin
>>>>>>>> >> >>>>>>>> <rxin@databricks.com> wrote:
>>>>>>>> >> >>>>>>>>>
>>>>>>>> >> >>>>>>>>> +1
>>>>>>>> >> >>>>>>>>>
>>>>>>>> >> >>>>>>>>> On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal
>>>>>>>> >> >>>>>>>>> <sameer.agrw@gmail.com>, wrote:
>>>>>>>> >> >>>>>>>>>>
>>>>>>>> >> >>>>>>>>>> this file shouldn't be included?
>>>>>>>> >> >>>>>>>>>>
>>>>>>>> >> >>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
>>>>>>>> >> >>>>>>>>>
>>>>>>>> >> >>>>>>>>>
>>>>>>>> >> >>>>>>>>> I've now deleted this file
>>>>>>>> >> >>>>>>>>>
>>>>>>>> >> >>>>>>>>>> From: Sameer Agarwal <sameer.agrw@gmail.com>
>>>>>>>> >> >>>>>>>>>> Sent: Saturday, February 17, 2018 1:43:39 PM
>>>>>>>> >> >>>>>>>>>> To: Sameer Agarwal
>>>>>>>> >> >>>>>>>>>> Cc: dev
>>>>>>>> >> >>>>>>>>>> Subject: Re: [VOTE] Spark 2.3.0 (RC4)
>>>>>>>> >> >>>>>>>>>>
>>>>>>>> >> >>>>>>>>>> I'll start with a +1 once again.
>>>>>>>> >> >>>>>>>>>>
>>>>>>>> >> >>>>>>>>>> All blockers reported against RC3 have been resolved and the
>>>>>>>> >> >>>>>>>>>> builds are healthy.
>>>>>>>> >> >>>>>>>>>>
>>>>>>>> >> >>>>>>>>>> On 17 February 2018 at 13:41, Sameer Agarwal
>>>>>>>> >> >>>>>>>>>> <sameerag@apache.org> wrote:
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>>> >> >>>>>>>>>>> version 2.3.0. The vote is open until Thursday February 22, 2018 at
>>>>>>>> >> >>>>>>>>>>> 8:00:00 am UTC and passes if a majority of at least 3 PMC +1 votes are
>>>>>>>> >> >>>>>>>>>>> cast.
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>> [ ] +1 Release this package as Apache Spark 2.3.0
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>> To learn more about Apache Spark, please see
>>>>>>>> >> >>>>>>>>>>> https://spark.apache.org/
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>> The tag to be voted on is v2.3.0-rc4:
>>>>>>>> >> >>>>>>>>>>> https://github.com/apache/spark/tree/v2.3.0-rc4
>>>>>>>> >> >>>>>>>>>>> (44095cb65500739695b0324c177c19dfa1471472)
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>> List of JIRA tickets resolved in this release can be found here:
>>>>>>>> >> >>>>>>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>>>>> >> >>>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>> Release artifacts are signed with the following key:
>>>>>>>> >> >>>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>> The staging repository for this release can be found at:
>>>>>>>> >> >>>>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1265/
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>>> >> >>>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-docs/_site/index.html
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>> FAQ
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>> =======================================
>>>>>>>> >> >>>>>>>>>>> What are the unresolved issues targeted for 2.3.0?
>>>>>>>> >> >>>>>>>>>>> =======================================
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>> Please see https://s.apache.org/oXKi. At the time of writing, there are
>>>>>>>> >> >>>>>>>>>>> currently no known release blockers.
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>> =========================
>>>>>>>> >> >>>>>>>>>>> How can I help test this release?
>>>>>>>> >> >>>>>>>>>>> =========================
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>> If you are a Spark user, you can help us test this release by taking an
>>>>>>>> >> >>>>>>>>>>> existing Spark workload and running on this release candidate, then
>>>>>>>> >> >>>>>>>>>>> reporting any regressions.
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>> If you're working in PySpark you can set up a virtual env and install
>>>>>>>> >> >>>>>>>>>>> the current RC and see if anything important breaks; in Java/Scala
>>>>>>>> >> >>>>>>>>>>> you can add the staging repository to your project's resolvers and test
>>>>>>>> >> >>>>>>>>>>> with the RC (make sure to clean up the artifact cache before/after so
>>>>>>>> >> >>>>>>>>>>> you don't end up building with an out-of-date RC going forward).
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>> ===========================================
>>>>>>>> >> >>>>>>>>>>> What should happen to JIRA tickets still targeting 2.3.0?
>>>>>>>> >> >>>>>>>>>>> ===========================================
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>> Committers should look at those and triage. Extremely important bug
>>>>>>>> >> >>>>>>>>>>> fixes, documentation, and API tweaks that impact compatibility should be
>>>>>>>> >> >>>>>>>>>>> worked on immediately. Everything else please retarget to 2.3.1 or 2.4.0
>>>>>>>> >> >>>>>>>>>>> as appropriate.
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>> ===================
>>>>>>>> >> >>>>>>>>>>> Why is my bug not fixed?
>>>>>>>> >> >>>>>>>>>>> ===================
>>>>>>>> >> >>>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>> In order to make timely releases, we will typically not hold the
>>>>>>>> >> >>>>>>>>>>> release unless the bug in question is a regression from 2.2.0. That being
>>>>>>>> >> >>>>>>>>>>> said, if there is something which is a regression from 2.2.0 and has not
>>>>>>>> >> >>>>>>>>>>> been correctly targeted please ping me or a committer to help target the
>>>>>>>> >> >>>>>>>>>>> issue (you can see the open issues listed as impacting Spark 2.3.0 at
>>>>>>>> >> >>>>>>>>>>> https://s.apache.org/WmoI).
>>>>>>>> >> >>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>
>>>>>>>> >> >>>>>>>>>>
>>>>>>>> >> >>>>>>>>>> --
>>>>>>>> >> >>>>>>>>>> Sameer Agarwal
>>>>>>>> >> >>>>>>>>>> Computer Science | UC Berkeley
>>>>>>>> >> >>>>>>>>>> http://cs.berkeley.edu/~sameerag
>>>>>>>> >> >>>>>>>>>
>>>>>>>> >> >>>>>>>>>
>>>>>>>> >> >>>>>>>>>
>>>>>>>> >> >>>>>>>>>
>>>>>>>> >> >>>>>>>>> --
>>>>>>>> >> >>>>>>>>> Sameer Agarwal
>>>>>>>> >> >>>>>>>>> Computer Science | UC Berkeley
>>>>>>>> >> >>>>>>>>> http://cs.berkeley.edu/~sameerag
>>>>>>>> >> >>>>>>>>
>>>>>>>> >> >>>>>>>>
>>>>>>>> >> >>>>>>
>>>>>>>> >> >>>>>>
>>>>>>>> >> >>>>>>
>>>>>>>> >> >>>>>> --
>>>>>>>> >> >>>>>> Takuya UESHIN
>>>>>>>> >> >>>>>> Tokyo, Japan
>>>>>>>> >> >>>>>>
>>>>>>>> >> >>>>>> http://twitter.com/ueshin
>>>>>>>> >> >>>>>
>>>>>>>> >> >>>>>
>>>>>>>> >> >>>>
>>>>>>>> >> >>>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >>
>>>>>>>> >> >> --
>>>>>>>> >> >> Ryan Blue
>>>>>>>> >> >> Software Engineer
>>>>>>>> >> >> Netflix
>>>>>>>> >> >
>>>>>>>> >> >
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> --
>>>>>>>> >> Marcelo
>>>>>>>> >
>>>>>>>> >
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Marcelo
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ryan Blue
>>>>>> Software Engineer
>>>>>> Netflix
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>>>
>>>
>>>
>>
>
