spark-dev mailing list archives

From Hyukjin Kwon <gurwls...@gmail.com>
Subject Re: [VOTE] Apache Spark 2.2.0 (RC4)
Date Wed, 14 Jun 2017 08:08:27 GMT
For a shorter reproducer ...


df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))
collect(gapply(df, "a", function(key, x) { x }, schema(df)))

Running the line below multiple times (5 to 7):

collect(gapply(df, "a", function(key, x) { x }, schema(df)))

occasionally throws an error.
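
A minimal sketch of automating that repetition, assuming a local SparkR session. The attempt count of 10 and the names `attempts` and `failed` are illustrative and not part of the original report. Note that `tryCatch` only records errors; a hang, as reported for R 3.3.3 in the quoted message below, would not be caught.

```r
library(SparkR)
sparkR.session(master = "local[2]")  # assumption: local session for testing

df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))

# Repeat the same gapply/collect call and record which attempts raise an error.
attempts <- 10L  # illustrative; the report mentions 5 to 7 runs
failed <- vapply(seq_len(attempts), function(i) {
  tryCatch({
    collect(gapply(df, "a", function(key, x) { x }, schema(df)))
    FALSE
  }, error = function(e) {
    message("attempt ", i, " failed: ", conditionMessage(e))
    TRUE
  })
}, logical(1))

sum(failed)  # number of failed attempts

sparkR.session.stop()
```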


I will leave it here and can provide more details if a JIRA is opened.
This does not look like a regression anyway.



2017-06-14 16:22 GMT+09:00 Hyukjin Kwon <gurwls223@gmail.com>:

>
> Per https://github.com/apache/spark/tree/v2.1.1,
>
> 1. CentOS 7.2.1511 / R 3.3.3 - this test hangs.
>
> I messed things up a bit while downgrading R to 3.3.3 (it was an actual
> machine, not a VM), so it took me a while to retry this.
> I rebuilt and at least confirmed that the R version is 3.3.3. I hope
> someone can double-check this one.
>
> Here is the self-reproducer:
>
> irisDF <- suppressWarnings(createDataFrame(iris))
> schema <- structType(structField("Sepal_Length", "double"),
>                      structField("Avg", "double"))
> df4 <- gapply(
>   cols = "Sepal_Length",
>   irisDF,
>   function(key, x) {
>     y <- data.frame(key, mean(x$Sepal_Width), stringsAsFactors = FALSE)
>   },
>   schema)
> collect(df4)
>
>
>
> 2017-06-14 16:07 GMT+09:00 Felix Cheung <felixcheung_m@hotmail.com>:
>
>> Thanks! Will try to set up RHEL/CentOS to test it out
>>
>> _____________________________
>> From: Nick Pentreath <nick.pentreath@gmail.com>
>> Sent: Tuesday, June 13, 2017 11:38 PM
>> Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
>> To: Felix Cheung <felixcheung_m@hotmail.com>, Hyukjin Kwon <
>> gurwls223@gmail.com>, dev <dev@spark.apache.org>
>>
>> Cc: Sean Owen <sowen@cloudera.com>
>>
>>
>> Hi, yeah, sorry for the slow response - I was on RHEL and OpenJDK but will
>> have to report back later with the versions as I am AFK.
>>
>> Not totally sure about the R version, but again I will report back ASAP.
>> On Wed, 14 Jun 2017 at 05:09, Felix Cheung <felixcheung_m@hotmail.com>
>> wrote:
>>
>>> Thanks
>>> This was with an external package and unrelated
>>>
>>>   >> macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning (
>>> https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)
>>>
>>> As for CentOS - would it be possible to test against R older than 3.4.0?
>>> This is the same error reported by Nick below.
>>>
>>> _____________________________
>>> From: Hyukjin Kwon <gurwls223@gmail.com>
>>> Sent: Tuesday, June 13, 2017 8:02 PM
>>>
>>> Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
>>> To: dev <dev@spark.apache.org>
>>> Cc: Sean Owen <sowen@cloudera.com>, Nick Pentreath <
>>> nick.pentreath@gmail.com>, Felix Cheung <felixcheung_m@hotmail.com>
>>>
>>>
>>>
>>> For the test failure on R, I checked:
>>>
>>>
>>> Per https://github.com/apache/spark/tree/v2.2.0-rc4,
>>>
>>> 1. Windows Server 2012 R2 / R 3.3.1 - passed (
>>> https://ci.appveyor.com/project/spark-test/spark/build/755-r-test-v2.2.0-rc4)
>>> 2. macOS Sierra 10.12.3 / R 3.4.0 - passed
>>> 3. macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning (
>>> https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)
>>> 4. CentOS 7.2.1511 / R 3.4.0 - reproduced (
>>> https://gist.github.com/HyukjinKwon/2a736b9f80318618cc147ac2bb1a987d)
>>>
>>>
>>> Per https://github.com/apache/spark/tree/v2.1.1,
>>>
>>> 1. CentOS 7.2.1511 / R 3.4.0 - reproduced (
>>> https://gist.github.com/HyukjinKwon/6064b0d10bab8fc1dc6212452d83b301)
>>>
>>>
>>> Given my tests and observations, this appears to fail only on CentOS
>>> 7.2.1511 / R 3.4.0.
>>>
>>> It also fails in Spark 2.1.1, so it does not look like a regression,
>>> although it is a bug that should be fixed (whether in Spark or R).
>>>
>>>
>>> 2017-06-14 8:28 GMT+09:00 Xiao Li <gatorsmile@gmail.com>:
>>>
>>>> -1
>>>>
>>>> Spark 2.2 is unable to read the partitioned table created by Spark 2.1
>>>> or earlier.
>>>>
>>>> Opened a JIRA https://issues.apache.org/jira/browse/SPARK-21085
>>>>
>>>> Will fix it soon.
>>>>
>>>> Thanks,
>>>>
>>>> Xiao Li
>>>>
>>>>
>>>>
>>>> 2017-06-13 9:39 GMT-07:00 Joseph Bradley <joseph@databricks.com>:
>>>>
>>>>> Re: the QA JIRAs:
>>>>> Thanks for discussing them.  I still feel they are very helpful; I
>>>>> particularly notice not having to spend a solid 2-3 weeks of time QAing
>>>>> (unlike in earlier Spark releases).  One other point not mentioned above: I
>>>>> think they serve as a very helpful reminder/training for the community for
>>>>> rigor in development.  Since we instituted QA JIRAs, contributors have been
>>>>> a lot better about adding in docs early, rather than waiting until the end
>>>>> of the cycle (though I know this is drawing conclusions from correlations).
>>>>>
>>>>> I would vote in favor of the RC...but I'll wait to see about the
>>>>> reported failures.
>>>>>
>>>>> On Fri, Jun 9, 2017 at 3:30 PM, Sean Owen <sowen@cloudera.com> wrote:
>>>>>
>>>>>> Different errors as in https://issues.apache.org/jira/browse/SPARK-20520
>>>>>> but that's also reporting R test failures.
>>>>>>
>>>>>> I went back and tried to run the R tests and they passed, at least on
>>>>>> Ubuntu 17 / R 3.3.
>>>>>>
>>>>>>
>>>>>> On Fri, Jun 9, 2017 at 9:12 AM Nick Pentreath <
>>>>>> nick.pentreath@gmail.com> wrote:
>>>>>>
>>>>>>> All Scala, Python tests pass. ML QA and doc issues are resolved (as
>>>>>>> well as R it seems).
>>>>>>>
>>>>>>> However, I'm seeing the following test failure on R consistently:
>>>>>>> https://gist.github.com/MLnick/5f26152f97ae8473f807c6895817cf72
>>>>>>>
>>>>>>>
>>>>>>> On Thu, 8 Jun 2017 at 08:48 Denny Lee <denny.g.lee@gmail.com> wrote:
>>>>>>>
>>>>>>>> +1 non-binding
>>>>>>>>
>>>>>>>> Tested on macOS Sierra, Ubuntu 16.04
>>>>>>>> test suite includes various test cases including Spark SQL, ML,
>>>>>>>> GraphFrames, Structured Streaming
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jun 7, 2017 at 9:40 PM vaquar khan <vaquar.khan@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> +1 non-binding
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> vaquar khan
>>>>>>>>>
>>>>>>>>> On Jun 7, 2017 4:32 PM, "Ricardo Almeida" <
>>>>>>>>> ricardo.almeida@actnowib.com> wrote:
>>>>>>>>>
>>>>>>>>> +1 (non-binding)
>>>>>>>>>
>>>>>>>>> Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn
>>>>>>>>> -Phive -Phive-thriftserver -Pscala-2.11 on
>>>>>>>>>
>>>>>>>>>    - Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
>>>>>>>>>    - macOS 10.12.5 Java 8 (build 1.8.0_131)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 5 June 2017 at 21:14, Michael Armbrust <michael@databricks.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>>>>> version 2.2.0. The vote is open until Thurs, June 8th, 2017 at
>>>>>>>>>> 12:00 PST and passes if a majority of at least 3 +1 PMC votes are
>>>>>>>>>> cast.
>>>>>>>>>>
>>>>>>>>>> [ ] +1 Release this package as Apache Spark 2.2.0
>>>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> To learn more about Apache Spark, please see
>>>>>>>>>> http://spark.apache.org/
>>>>>>>>>>
>>>>>>>>>> The tag to be voted on is v2.2.0-rc4
>>>>>>>>>> <https://github.com/apache/spark/tree/v2.2.0-rc4>
>>>>>>>>>> (377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)
>>>>>>>>>>
>>>>>>>>>> List of JIRA tickets resolved can be found with this filter
>>>>>>>>>> <https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.2.0>.
>>>>>>>>>>
>>>>>>>>>> The release files, including signatures, digests, etc. can be
>>>>>>>>>> found at:
>>>>>>>>>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/
>>>>>>>>>>
>>>>>>>>>> Release artifacts are signed with the following key:
>>>>>>>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>>>>>>>
>>>>>>>>>> The staging repository for this release can be found at:
>>>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1241/
>>>>>>>>>>
>>>>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>>>>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *FAQ*
>>>>>>>>>>
>>>>>>>>>> *How can I help test this release?*
>>>>>>>>>>
>>>>>>>>>> If you are a Spark user, you can help us test this release by
>>>>>>>>>> taking an existing Spark workload and running on this release
>>>>>>>>>> candidate, then reporting any regressions.
>>>>>>>>>>
>>>>>>>>>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>>>>>>>>>
>>>>>>>>>> Committers should look at those and triage. Extremely important
>>>>>>>>>> bug fixes, documentation, and API tweaks that impact compatibility
>>>>>>>>>> should be worked on immediately. Everything else please retarget to
>>>>>>>>>> 2.3.0 or 2.2.1.
>>>>>>>>>>
>>>>>>>>>> *But my bug isn't fixed!??!*
>>>>>>>>>>
>>>>>>>>>> In order to make timely releases, we will typically not hold the
>>>>>>>>>> release unless the bug in question is a regression from 2.1.1.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Joseph Bradley
>>>>>
>>>>> Software Engineer - Machine Learning
>>>>>
>>>>> Databricks, Inc.
>>>>>
>>>>> [image: http://databricks.com] <http://databricks.com/>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>>
>
