spark-dev mailing list archives

From Xiao Li <gatorsm...@gmail.com>
Subject Re: [VOTE] Spark 2.3.0 (RC4)
Date Wed, 21 Feb 2018 17:45:40 GMT
Hi, Ryan,

In this release, Data Source V2 is experimental. We are still collecting
feedback from the community and will improve the related APIs and
implementation in the next release, 2.4.

Thanks,

Xiao

2018-02-21 9:43 GMT-08:00 Xiao Li <gatorsmile@gmail.com>:

> Hi, Justin,
>
> Based on my understanding, SPARK-17147 is also not a regression, so
> Spark 2.3.0 cannot include it. We have to wait for the committers who
> are familiar with Spark Streaming to decide whether we can fix the
> issue in Spark 2.3.1.
>
> Since this is open source, feel free to add the patch in your local build.
>
> Thanks for using Spark!
>
> Xiao
>
>
> 2018-02-21 9:36 GMT-08:00 Ryan Blue <rblue@netflix.com.invalid>:
>
>> No problem if we can't add them; this is experimental anyway, so this
>> release should be more about validating the API and starting our
>> implementation. I just don't think we can recommend that anyone actually
>> use DataSourceV2 without these patches.
>>
>> On Wed, Feb 21, 2018 at 9:21 AM, Wenchen Fan <cloud0fan@gmail.com> wrote:
>>
>>> SPARK-23323 adds a new API; I'm not sure we can still add it at this
>>> stage of the release. Besides, users can work around it by calling the
>>> Spark output commit coordinator themselves in their data source.
>>>
>>> SPARK-23203 is non-trivial and didn't fix any known bugs, so it's hard
>>> to convince other people that it's safe to add it to the release during the
>>> RC phase.
>>>
>>> SPARK-23418 depends on the above one.
>>>
>>> Generally, they would have been good to have in Spark 2.3 if they had
>>> been merged before the RC. I think the lesson here is that we should work
>>> on the things we want in the release before the RC, instead of after.
>>>
>>> On Thu, Feb 22, 2018 at 1:01 AM, Ryan Blue <rblue@netflix.com.invalid>
>>> wrote:
>>>
>>>> What does everyone think about getting some of the newer DataSourceV2
>>>> improvements in? It should be low risk because it is a new code path, and
>>>> v2 isn't very usable without things like support for using the output
>>>> commit coordinator to deconflict writes.
>>>>
>>>> The ones I'd like to get in are:
>>>> * Use the output commit coordinator:
>>>> https://issues.apache.org/jira/browse/SPARK-23323
>>>> * Use immutable trees and the same push-down logic as other read paths:
>>>> https://issues.apache.org/jira/browse/SPARK-23203
>>>> * Don't allow users to supply schemas when they aren't supported:
>>>> https://issues.apache.org/jira/browse/SPARK-23418
>>>>
>>>> I think it would make the 2.3.0 release more usable for anyone
>>>> interested in the v2 read and write paths.
>>>>
>>>> Thanks!
>>>>
>>>> On Tue, Feb 20, 2018 at 7:07 PM, Weichen Xu <weichen.xu@databricks.com>
>>>> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> On Wed, Feb 21, 2018 at 10:07 AM, Marcelo Vanzin <vanzin@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> Done, thanks!
>>>>>>
>>>>>> On Tue, Feb 20, 2018 at 6:05 PM, Sameer Agarwal <sameerag@apache.org>
>>>>>> wrote:
>>>>>> > Sure, please feel free to backport.
>>>>>> >
>>>>>> > On 20 February 2018 at 18:02, Marcelo Vanzin <vanzin@cloudera.com>
>>>>>> wrote:
>>>>>> >>
>>>>>> >> Hey Sameer,
>>>>>> >>
>>>>>> >> Mind including https://github.com/apache/spark/pull/20643
>>>>>> >> (SPARK-23468) in the new RC? It's a minor bug since I've only hit it
>>>>>> >> with older shuffle services, but it's pretty safe.
>>>>>> >>
>>>>>> >> On Tue, Feb 20, 2018 at 5:58 PM, Sameer Agarwal <
>>>>>> sameerag@apache.org>
>>>>>> >> wrote:
>>>>>> >> > This RC has failed due to
>>>>>> >> > https://issues.apache.org/jira/browse/SPARK-23470.
>>>>>> >> > Now that the fix has been merged in 2.3 (thanks Marcelo!), I'll follow
>>>>>> >> > up with an RC5 soon.
>>>>>> >> >
>>>>>> >> > On 20 February 2018 at 16:49, Ryan Blue <rblue@netflix.com>
>>>>>> wrote:
>>>>>> >> >>
>>>>>> >> >> +1
>>>>>> >> >>
>>>>>> >> >> Build and tests look fine; I checked the signature and checksums for
>>>>>> >> >> the src tarball.
>>>>>> >> >>
>>>>>> >> >> On Tue, Feb 20, 2018 at 12:54 PM, Shixiong(Ryan) Zhu
>>>>>> >> >> <shixiong@databricks.com> wrote:
>>>>>> >> >>>
>>>>>> >> >>> I'm -1 because of the UI regression
>>>>>> >> >>> https://issues.apache.org/jira/browse/SPARK-23470: the All Jobs page
>>>>>> >> >>> may be too slow and cause a "read timeout" when there are lots of jobs
>>>>>> >> >>> and stages. This is one of the most important pages, because when it's
>>>>>> >> >>> broken, it's pretty hard to use the Spark Web UI.
>>>>>> >> >>>
>>>>>> >> >>>
>>>>>> >> >>> On Tue, Feb 20, 2018 at 4:37 AM, Marco Gaido <marcogaido91@gmail.com>
>>>>>> >> >>> wrote:
>>>>>> >> >>>>
>>>>>> >> >>>> +1
>>>>>> >> >>>>
>>>>>> >> >>>> 2018-02-20 12:30 GMT+01:00 Hyukjin Kwon <gurwls223@gmail.com>:
>>>>>> >> >>>>>
>>>>>> >> >>>>> +1 too
>>>>>> >> >>>>>
>>>>>> >> >>>>> 2018-02-20 14:41 GMT+09:00 Takuya UESHIN <ueshin@happy-camper.st>:
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> +1
>>>>>> >> >>>>>>
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> On Tue, Feb 20, 2018 at 2:14 PM, Xingbo Jiang <jiangxb1987@gmail.com>
>>>>>> >> >>>>>> wrote:
>>>>>> >> >>>>>>>
>>>>>> >> >>>>>>> +1
>>>>>> >> >>>>>>>
>>>>>> >> >>>>>>>
>>>>>> >> >>>>>>> On Tue, Feb 20, 2018 at 1:09 PM, Wenchen Fan <cloud0fan@gmail.com> wrote:
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> +1
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>> On Tue, Feb 20, 2018 at 12:53 PM, Reynold Xin <rxin@databricks.com>
>>>>>> >> >>>>>>>> wrote:
>>>>>> >> >>>>>>>>>
>>>>>> >> >>>>>>>>> +1
>>>>>> >> >>>>>>>>>
>>>>>> >> >>>>>>>>> On Feb 20, 2018, 5:51 PM +1300, Sameer Agarwal <sameer.agrw@gmail.com>,
>>>>>> >> >>>>>>>>> wrote:
>>>>>> >> >>>>>>>>>>
>>>>>> >> >>>>>>>>>> This file shouldn't be included?
>>>>>> >> >>>>>>>>>>
>>>>>> >> >>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/spark-parent_2.11.iml
>>>>>> >> >>>>>>>>>
>>>>>> >> >>>>>>>>>
>>>>>> >> >>>>>>>>> I've now deleted this file.
>>>>>> >> >>>>>>>>>
>>>>>> >> >>>>>>>>>> From: Sameer Agarwal <sameer.agrw@gmail.com>
>>>>>> >> >>>>>>>>>> Sent: Saturday, February 17, 2018 1:43:39 PM
>>>>>> >> >>>>>>>>>> To: Sameer Agarwal
>>>>>> >> >>>>>>>>>> Cc: dev
>>>>>> >> >>>>>>>>>> Subject: Re: [VOTE] Spark 2.3.0 (RC4)
>>>>>> >> >>>>>>>>>>
>>>>>> >> >>>>>>>>>> I'll start with a +1 once again.
>>>>>> >> >>>>>>>>>>
>>>>>> >> >>>>>>>>>> All blockers reported against RC3 have been resolved and the
>>>>>> >> >>>>>>>>>> builds are healthy.
>>>>>> >> >>>>>>>>>>
>>>>>> >> >>>>>>>>>> On 17 February 2018 at 13:41, Sameer Agarwal <sameerag@apache.org>
>>>>>> >> >>>>>>>>>> wrote:
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>> >> >>>>>>>>>>> version 2.3.0. The vote is open until Thursday, February 22, 2018 at
>>>>>> >> >>>>>>>>>>> 8:00:00 am UTC and passes if a majority of at least 3 PMC +1 votes
>>>>>> >> >>>>>>>>>>> are cast.
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>> [ ] +1 Release this package as Apache Spark 2.3.0
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>> To learn more about Apache Spark, please see
>>>>>> >> >>>>>>>>>>> https://spark.apache.org/
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>> The tag to be voted on is v2.3.0-rc4:
>>>>>> >> >>>>>>>>>>> https://github.com/apache/spark/tree/v2.3.0-rc4
>>>>>> >> >>>>>>>>>>> (44095cb65500739695b0324c177c19dfa1471472)
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>> The list of JIRA tickets resolved in this release can be found here:
>>>>>> >> >>>>>>>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>> The release files, including signatures, digests, etc., can be found at:
>>>>>> >> >>>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-bin/
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>> Release artifacts are signed with the following key:
>>>>>> >> >>>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>> The staging repository for this release can be found at:
>>>>>> >> >>>>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1265/
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>> The documentation corresponding to this release can be found at:
>>>>>> >> >>>>>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc4-docs/_site/index.html
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>> FAQ
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>> =======================================
>>>>>> >> >>>>>>>>>>> What are the unresolved issues targeted for 2.3.0?
>>>>>> >> >>>>>>>>>>> =======================================
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>> Please see https://s.apache.org/oXKi. At the time of writing, there
>>>>>> >> >>>>>>>>>>> are no known release blockers.
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>> =========================
>>>>>> >> >>>>>>>>>>> How can I help test this release?
>>>>>> >> >>>>>>>>>>> =========================
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>> >> >>>>>>>>>>> an existing Spark workload, running it on this release candidate, and
>>>>>> >> >>>>>>>>>>> reporting any regressions.
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>> If you're working in PySpark, you can set up a virtual env, install
>>>>>> >> >>>>>>>>>>> the current RC, and see if anything important breaks. In Java/Scala,
>>>>>> >> >>>>>>>>>>> you can add the staging repository to your project's resolvers and
>>>>>> >> >>>>>>>>>>> test with the RC (make sure to clean up the artifact cache before and
>>>>>> >> >>>>>>>>>>> after, so you don't end up building with an out-of-date RC going
>>>>>> >> >>>>>>>>>>> forward).
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>> ===========================================
>>>>>> >> >>>>>>>>>>> What should happen to JIRA tickets still targeting 2.3.0?
>>>>>> >> >>>>>>>>>>> ===========================================
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>> Committers should look at those and triage. Extremely important bug
>>>>>> >> >>>>>>>>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>>>> >> >>>>>>>>>>> be worked on immediately. Please retarget everything else to 2.3.1 or
>>>>>> >> >>>>>>>>>>> 2.4.0 as appropriate.
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>> ===================
>>>>>> >> >>>>>>>>>>> Why is my bug not fixed?
>>>>>> >> >>>>>>>>>>> ===================
>>>>>> >> >>>>>>>>>>>
>>>>>> >> >>>>>>>>>>> In order to make timely releases, we will typically not hold the
>>>>>> >> >>>>>>>>>>> release unless the bug in question is a regression from 2.2.0. That
>>>>>> >> >>>>>>>>>>> being said, if there is something that is a regression from 2.2.0 and
>>>>>> >> >>>>>>>>>>> has not been correctly targeted, please ping me or a committer to help
>>>>>> >> >>>>>>>>>>> target the issue (you can see the open issues listed as impacting
>>>>>> >> >>>>>>>>>>> Spark 2.3.0 at https://s.apache.org/WmoI).
>>>>>> >> >>>>>>>>>>
>>>>>> >> >>>>>>>>>>
>>>>>> >> >>>>>>>>>>
>>>>>> >> >>>>>>>>>>
>>>>>> >> >>>>>>>>>> --
>>>>>> >> >>>>>>>>>> Sameer Agarwal
>>>>>> >> >>>>>>>>>> Computer Science | UC Berkeley
>>>>>> >> >>>>>>>>>> http://cs.berkeley.edu/~sameerag
>>>>>> >> >>>>>>>>>
>>>>>> >> >>>>>>>>>
>>>>>> >> >>>>>>>>>
>>>>>> >> >>>>>>>>>
>>>>>> >> >>>>>>>>> --
>>>>>> >> >>>>>>>>> Sameer Agarwal
>>>>>> >> >>>>>>>>> Computer Science | UC Berkeley
>>>>>> >> >>>>>>>>> http://cs.berkeley.edu/~sameerag
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>>>
>>>>>> >> >>>>>>
>>>>>> >> >>>>>>
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> --
>>>>>> >> >>>>>> Takuya UESHIN
>>>>>> >> >>>>>> Tokyo, Japan
>>>>>> >> >>>>>>
>>>>>> >> >>>>>> http://twitter.com/ueshin
>>>>>> >> >>>>>
>>>>>> >> >>>>>
>>>>>> >> >>>>
>>>>>> >> >>>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >>
>>>>>> >> >> --
>>>>>> >> >> Ryan Blue
>>>>>> >> >> Software Engineer
>>>>>> >> >> Netflix
>>>>>> >> >
>>>>>> >> >
>>>>>> >>
>>>>>> >>
>>>>>> >>
>>>>>> >> --
>>>>>> >> Marcelo
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Marcelo
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>>>
>>>
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>
>
