spark-dev mailing list archives

From: Nick Pentreath <nick.pentre...@gmail.com>
Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
Date: Wed, 21 Jun 2017 13:50:29 GMT
Thanks, I added the details of my environment to the JIRA (for what it's
worth now, as the issue is identified)

On Wed, 14 Jun 2017 at 11:28 Hyukjin Kwon <gurwls223@gmail.com> wrote:

> Actually, I opened https://issues.apache.org/jira/browse/SPARK-21093.
>
> 2017-06-14 17:08 GMT+09:00 Hyukjin Kwon <gurwls223@gmail.com>:
>
>> For a shorter reproducer ...
>>
>>
>> df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))
>> collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>>
>> And running the below multiple times (5~7):
>>
>> collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>>
>> seems to occasionally throw an error.
>>
>>
>> I will leave this here and probably add more information if a JIRA is
>> opened. This does not look like a regression anyway.
>>
>>
>>
>> 2017-06-14 16:22 GMT+09:00 Hyukjin Kwon <gurwls223@gmail.com>:
>>
>>>
>>> Per https://github.com/apache/spark/tree/v2.1.1,
>>>
>>> 1. CentOS 7.2.1511 / R 3.3.3 - this test hangs.
>>>
>>> I messed it up a bit while downgrading R to 3.3.3 (it was an actual
>>> machine, not a VM), so it took me a while to retry this.
>>> I rebuilt it and at least confirmed that the R version is 3.3.3. I
>>> hope someone can double-check this one.
>>>
>>> Here is the self-reproducer:
>>>
>>> irisDF <- suppressWarnings(createDataFrame (iris))
>>> schema <-  structType(structField("Sepal_Length", "double"),
>>> structField("Avg", "double"))
>>> df4 <- gapply(
>>>   cols = "Sepal_Length",
>>>   irisDF,
>>>   function(key, x) {
>>>     y <- data.frame(key, mean(x$Sepal_Width), stringsAsFactors = FALSE)
>>>   },
>>>   schema)
>>> collect(df4)
>>>
>>>
>>>
>>> 2017-06-14 16:07 GMT+09:00 Felix Cheung <felixcheung_m@hotmail.com>:
>>>
>>>> Thanks! Will try to set up RHEL/CentOS to test it out
>>>>
>>>> _____________________________
>>>> From: Nick Pentreath <nick.pentreath@gmail.com>
>>>> Sent: Tuesday, June 13, 2017 11:38 PM
>>>> Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
>>>> To: Felix Cheung <felixcheung_m@hotmail.com>, Hyukjin Kwon <
>>>> gurwls223@gmail.com>, dev <dev@spark.apache.org>
>>>>
>>>> Cc: Sean Owen <sowen@cloudera.com>
>>>>
>>>>
>>>> Hi, yeah, sorry for the slow response - I was on RHEL and OpenJDK but will
>>>> have to report back later with the versions as I am AFK.
>>>>
>>>> Not totally sure of the R version, but again I will get back to you ASAP.
>>>> On Wed, 14 Jun 2017 at 05:09, Felix Cheung <felixcheung_m@hotmail.com>
>>>> wrote:
>>>>
>>>>> Thanks
>>>>> This was with an external package and unrelated
>>>>>
>>>>>   >> macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning (
>>>>> https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)
>>>>>
>>>>> As for CentOS - would it be possible to test against R older than
>>>>> 3.4.0? This is the same error reported by Nick below.
>>>>>
>>>>> _____________________________
>>>>> From: Hyukjin Kwon <gurwls223@gmail.com>
>>>>> Sent: Tuesday, June 13, 2017 8:02 PM
>>>>>
>>>>> Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
>>>>> To: dev <dev@spark.apache.org>
>>>>> Cc: Sean Owen <sowen@cloudera.com>, Nick Pentreath <
>>>>> nick.pentreath@gmail.com>, Felix Cheung <felixcheung_m@hotmail.com>
>>>>>
>>>>>
>>>>>
>>>>> For the test failure on R, I checked:
>>>>>
>>>>>
>>>>> Per https://github.com/apache/spark/tree/v2.2.0-rc4,
>>>>>
>>>>> 1. Windows Server 2012 R2 / R 3.3.1 - passed (
>>>>> https://ci.appveyor.com/project/spark-test/spark/build/755-r-test-v2.2.0-rc4
>>>>> )
>>>>> 2. macOS Sierra 10.12.3 / R 3.4.0 - passed
>>>>> 3. macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning (
>>>>> https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)
>>>>> 4. CentOS 7.2.1511 / R 3.4.0 - reproduced (
>>>>> https://gist.github.com/HyukjinKwon/2a736b9f80318618cc147ac2bb1a987d)
>>>>>
>>>>>
>>>>> Per https://github.com/apache/spark/tree/v2.1.1,
>>>>>
>>>>> 1. CentOS 7.2.1511 / R 3.4.0 - reproduced (
>>>>> https://gist.github.com/HyukjinKwon/6064b0d10bab8fc1dc6212452d83b301)
>>>>>
>>>>>
>>>>> This appears to fail only on CentOS 7.2.1511 / R 3.4.0, given my
>>>>> tests and observations.
>>>>>
>>>>> It also fails in Spark 2.1.1, so it does not sound like a regression,
>>>>> although it is a bug that should be fixed (whether in Spark or R).
>>>>>
>>>>>
>>>>> 2017-06-14 8:28 GMT+09:00 Xiao Li <gatorsmile@gmail.com>:
>>>>>
>>>>>> -1
>>>>>>
>>>>>> Spark 2.2 is unable to read the partitioned table created by Spark
>>>>>> 2.1 or earlier.
>>>>>>
>>>>>> Opened a JIRA https://issues.apache.org/jira/browse/SPARK-21085
>>>>>>
>>>>>> Will fix it soon.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Xiao Li
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2017-06-13 9:39 GMT-07:00 Joseph Bradley <joseph@databricks.com>:
>>>>>>
>>>>>>> Re: the QA JIRAs:
>>>>>>> Thanks for discussing them.  I still feel they are very helpful; I
>>>>>>> particularly notice not having to spend a solid 2-3 weeks of time QAing
>>>>>>> (unlike in earlier Spark releases).  One other point not mentioned above: I
>>>>>>> think they serve as a very helpful reminder/training for the community for
>>>>>>> rigor in development.  Since we instituted QA JIRAs, contributors have been
>>>>>>> a lot better about adding in docs early, rather than waiting until the end
>>>>>>> of the cycle (though I know this is drawing conclusions from correlations).
>>>>>>>
>>>>>>> I would vote in favor of the RC...but I'll wait to see about the
>>>>>>> reported failures.
>>>>>>>
>>>>>>> On Fri, Jun 9, 2017 at 3:30 PM, Sean Owen <sowen@cloudera.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Different errors as in
>>>>>>>> https://issues.apache.org/jira/browse/SPARK-20520 but that's also
>>>>>>>> reporting R test failures.
>>>>>>>>
>>>>>>>> I went back and tried to run the R tests and they passed, at least
>>>>>>>> on Ubuntu 17 / R 3.3.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jun 9, 2017 at 9:12 AM Nick Pentreath <
>>>>>>>> nick.pentreath@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> All Scala, Python tests pass. ML QA and doc issues are resolved
>>>>>>>>> (as well as R it seems).
>>>>>>>>>
>>>>>>>>> However, I'm seeing the following test failure on R consistently:
>>>>>>>>> https://gist.github.com/MLnick/5f26152f97ae8473f807c6895817cf72
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, 8 Jun 2017 at 08:48 Denny Lee <denny.g.lee@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> +1 non-binding
>>>>>>>>>>
>>>>>>>>>> Tested on macOS Sierra, Ubuntu 16.04
>>>>>>>>>> test suite includes various test cases including Spark SQL, ML,
>>>>>>>>>> GraphFrames, Structured Streaming
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 7, 2017 at 9:40 PM vaquar khan <vaquar.khan@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> +1 non-binding
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> vaquar khan
>>>>>>>>>>>
>>>>>>>>>>> On Jun 7, 2017 4:32 PM, "Ricardo Almeida" <
>>>>>>>>>>> ricardo.almeida@actnowib.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> +1 (non-binding)
>>>>>>>>>>>
>>>>>>>>>>> Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3
>>>>>>>>>>> -Pyarn -Phive -Phive-thriftserver -Pscala-2.11 on
>>>>>>>>>>>
>>>>>>>>>>>    - Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
>>>>>>>>>>>    - macOS 10.12.5 Java 8 (build 1.8.0_131)
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 5 June 2017 at 21:14, Michael Armbrust <
>>>>>>>>>>> michael@databricks.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Please vote on releasing the following candidate as Apache
>>>>>>>>>>>> Spark version 2.2.0. The vote is open until Thurs, June 8th,
>>>>>>>>>>>> 2017 at 12:00 PST and passes if a majority of at least 3 +1 PMC
>>>>>>>>>>>> votes are cast.
>>>>>>>>>>>>
>>>>>>>>>>>> [ ] +1 Release this package as Apache Spark 2.2.0
>>>>>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> To learn more about Apache Spark, please see
>>>>>>>>>>>> http://spark.apache.org/
>>>>>>>>>>>>
>>>>>>>>>>>> The tag to be voted on is v2.2.0-rc4
>>>>>>>>>>>> <https://github.com/apache/spark/tree/v2.2.0-rc4>
>>>>>>>>>>>> (377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)
>>>>>>>>>>>>
>>>>>>>>>>>> List of JIRA tickets resolved can be found with this filter
>>>>>>>>>>>> <https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.2.0>.
>>>>>>>>>>>>
>>>>>>>>>>>> The release files, including signatures, digests, etc. can be
>>>>>>>>>>>> found at:
>>>>>>>>>>>>
>>>>>>>>>>>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/
>>>>>>>>>>>>
>>>>>>>>>>>> Release artifacts are signed with the following key:
>>>>>>>>>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>>>>>>>>>
>>>>>>>>>>>> The staging repository for this release can be found at:
>>>>>>>>>>>>
>>>>>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1241/
>>>>>>>>>>>>
>>>>>>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>>>>>>>
>>>>>>>>>>>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> *FAQ*
>>>>>>>>>>>>
>>>>>>>>>>>> *How can I help test this release?*
>>>>>>>>>>>>
>>>>>>>>>>>> If you are a Spark user, you can help us test this release by
>>>>>>>>>>>> taking an existing Spark workload and running on this release candidate,
>>>>>>>>>>>> then reporting any regressions.
>>>>>>>>>>>>
>>>>>>>>>>>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>>>>>>>>>>>
>>>>>>>>>>>> Committers should look at those and triage. Extremely important
>>>>>>>>>>>> bug fixes, documentation, and API tweaks that impact compatibility should
>>>>>>>>>>>> be worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>>>>>>>>>>>>
>>>>>>>>>>>> *But my bug isn't fixed!??!*
>>>>>>>>>>>>
>>>>>>>>>>>> In order to make timely releases, we will typically not hold
>>>>>>>>>>>> the release unless the bug in question is a regression from 2.1.1.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Joseph Bradley
>>>>>>>
>>>>>>> Software Engineer - Machine Learning
>>>>>>>
>>>>>>> Databricks, Inc.
>>>>>>>
>>>>>>> [image: http://databricks.com] <http://databricks.com/>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
