spark-dev mailing list archives

From Nick Pentreath <nick.pentre...@gmail.com>
Subject Re: [VOTE] Release Apache Spark 2.0.0 (RC1)
Date Tue, 28 Jun 2016 09:28:52 GMT
I take it there will be another RC, given the blockers and the absence of
any +1 votes.

FWIW, I cannot run python tests using "./python/run-tests".

I'd be -1 for this reason (see https://github.com/apache/spark/pull/13737 /
http://issues.apache.org/jira/browse/SPARK-15954) - does anyone else
encounter this?

./python/run-tests --python-executables=python2.7
Running PySpark tests. Output is in /Users/nick/workspace/scala/spark-rcs/spark-2.0.0/python/unit-tests.log
Will test against the following Python executables: ['python2.7']
Will test the following Python modules: ['pyspark-core', 'pyspark-ml', 'pyspark-mllib', 'pyspark-sql', 'pyspark-streaming']
....Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
======================================================================
ERROR: setUpClass (pyspark.sql.tests.HiveContextSQLTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/nick/workspace/scala/spark-rcs/spark-2.0.0/python/pyspark/sql/tests.py", line 1620, in setUpClass
    cls.spark = HiveContext._createForTesting(cls.sc)
  File "/Users/nick/workspace/scala/spark-rcs/spark-2.0.0/python/pyspark/sql/context.py", line 490, in _createForTesting
    jtestHive = sparkContext._jvm.org.apache.spark.sql.hive.test.TestHiveContext(jsc)
  File "/Users/nick/workspace/scala/spark-rcs/spark-2.0.0/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 1183, in __call__
    answer, self._gateway_client, None, self._fqn)
  File "/Users/nick/workspace/scala/spark-rcs/spark-2.0.0/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
    format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.test.TestHiveContext.
: java.lang.NullPointerException
at org.apache.spark.sql.hive.test.TestHiveSparkSession.getHiveFile(TestHive.scala:183)
at org.apache.spark.sql.hive.test.TestHiveSparkSession.<init>(TestHive.scala:214)
at org.apache.spark.sql.hive.test.TestHiveSparkSession.<init>(TestHive.scala:122)
at org.apache.spark.sql.hive.test.TestHiveContext.<init>(TestHive.scala:77)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:240)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:236)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:211)
at java.lang.Thread.run(Thread.java:745)


======================================================================
ERROR: setUpClass (pyspark.sql.tests.SQLTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/nick/workspace/scala/spark-rcs/spark-2.0.0/python/pyspark/sql/tests.py", line 189, in setUpClass
    ReusedPySparkTestCase.setUpClass()
  File "/Users/nick/workspace/scala/spark-rcs/spark-2.0.0/python/pyspark/tests.py", line 344, in setUpClass
    cls.sc = SparkContext('local[4]', cls.__name__)
  File "/Users/nick/workspace/scala/spark-rcs/spark-2.0.0/python/pyspark/context.py", line 112, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway)
  File "/Users/nick/workspace/scala/spark-rcs/spark-2.0.0/python/pyspark/context.py", line 261, in _ensure_initialized
    callsite.function, callsite.file, callsite.linenum))
ValueError: Cannot run multiple SparkContexts at once; existing SparkContext(app=ReusedPySparkTestCase, master=local[4]) created by <module> at /Users/nick/miniconda2/lib/python2.7/runpy.py:72

----------------------------------------------------------------------
Ran 4 tests in 4.800s

FAILED (errors=2)

Had test failures in pyspark.sql.tests with python2.7; see logs.


On Mon, 27 Jun 2016 at 20:13 Egor Pahomov <pahomov.egor@gmail.com> wrote:

> -1 : SPARK-16228 [SQL]  - "Percentile" needs an explicit cast to double,
> otherwise it throws an error. I cannot move my existing 100500 queries to
> 2.0 transparently.
>
> 2016-06-24 11:52 GMT-07:00 Matt Cheah <mcheah@palantir.com>:
>
>> -1 because of SPARK-16181 which is a correctness regression from 1.6.
>> Looks like the patch is ready though:
>> https://github.com/apache/spark/pull/13884 – it would be ideal for this
>> patch to make it into the release.
>>
>> -Matt Cheah
>>
>> From: Nick Pentreath <nick.pentreath@gmail.com>
>> Date: Friday, June 24, 2016 at 4:37 AM
>> To: "dev@spark.apache.org" <dev@spark.apache.org>
>> Subject: Re: [VOTE] Release Apache Spark 2.0.0 (RC1)
>>
>> I'm getting the following when trying to run ./dev/run-tests (not
>> happening on master) from the extracted source tar. Anyone else seeing
>> this?
>>
>> error: Could not access 'fc0a1475ef'
>> **********************************************************************
>> File "./dev/run-tests.py", line 69, in
>> __main__.identify_changed_files_from_git_commits
>> Failed example:
>>     [x.name for x in determine_modules_for_files(
>> identify_changed_files_from_git_commits("fc0a1475ef",
>> target_ref="5da21f07"))]
>> Exception raised:
>>     Traceback (most recent call last):
>>       File "/Users/nick/miniconda2/lib/python2.7/doctest.py", line 1315,
>> in __run
>>         compileflags, 1) in test.globs
>>       File "<doctest
>> __main__.identify_changed_files_from_git_commits[0]>", line 1, in <module>
>>         [x.name for x in determine_modules_for_files(
>> identify_changed_files_from_git_commits("fc0a1475ef",
>> target_ref="5da21f07"))]
>>       File "./dev/run-tests.py", line 86, in
>> identify_changed_files_from_git_commits
>>         universal_newlines=True)
>>       File "/Users/nick/miniconda2/lib/python2.7/subprocess.py", line
>> 573, in check_output
>>         raise CalledProcessError(retcode, cmd, output=output)
>>     CalledProcessError: Command '['git', 'diff', '--name-only',
>> 'fc0a1475ef', '5da21f07']' returned non-zero exit status 1
>> error: Could not access '50a0496a43'
>> **********************************************************************
>> File "./dev/run-tests.py", line 71, in
>> __main__.identify_changed_files_from_git_commits
>> Failed example:
>>     'root' in [x.name for x in determine_modules_for_files(
>>  identify_changed_files_from_git_commits("50a0496a43",
>> target_ref="6765ef9"))]
>> Exception raised:
>>     Traceback (most recent call last):
>>       File "/Users/nick/miniconda2/lib/python2.7/doctest.py", line 1315,
>> in __run
>>         compileflags, 1) in test.globs
>>       File "<doctest
>> __main__.identify_changed_files_from_git_commits[1]>", line 1, in <module>
>>         'root' in [x.name for x in determine_modules_for_files(
>>  identify_changed_files_from_git_commits("50a0496a43",
>> target_ref="6765ef9"))]
>>       File "./dev/run-tests.py", line 86, in
>> identify_changed_files_from_git_commits
>>         universal_newlines=True)
>>       File "/Users/nick/miniconda2/lib/python2.7/subprocess.py", line
>> 573, in check_output
>>         raise CalledProcessError(retcode, cmd, output=output)
>>     CalledProcessError: Command '['git', 'diff', '--name-only',
>> '50a0496a43', '6765ef9']' returned non-zero exit status 1
>> **********************************************************************
>> 1 items had failures:
>>    2 of   2 in __main__.identify_changed_files_from_git_commits
>> ***Test Failed*** 2 failures.
>>
>>
>>
>> On Fri, 24 Jun 2016 at 06:59 Yin Huai <yhuai@databricks.com> wrote:
>>
>>> -1 because of https://issues.apache.org/jira/browse/SPARK-16121.
>>>
>>>
>>> This jira was resolved after 2.0.0-RC1 was cut. Without the fix, Spark
>>> SQL effectively only uses the driver to list files when loading datasets
>>> and the driver-side file listing is very slow for datasets having many
>>> files and partitions. Since this bug causes a serious performance
>>> regression, I am giving -1.
>>>
>>> On Thu, Jun 23, 2016 at 1:25 AM, Pete Robbins <robbinspg@gmail.com>
>>> wrote:
>>>
>>>> I'm also seeing some of these same failures:
>>>>
>>>> - spilling with compression *** FAILED ***
>>>> I have seen this occasionally
>>>>
>>>> - to UTC timestamp *** FAILED ***
>>>> This was fixed yesterday in branch-2.0 (
>>>> https://issues.apache.org/jira/browse/SPARK-16078)
>>>>
>>>>
>>>> - offset recovery *** FAILED ***
>>>> Haven't seen this for a while and thought the flaky test was fixed but
>>>> it popped up again in one of our builds.
>>>>
>>>> StateStoreSuite:
>>>> - maintenance *** FAILED ***
>>>> I've just seen that this has been failing for the last 2 days on one
>>>> build machine (linux amd64)
>>>>
>>>>
>>>> On 23 June 2016 at 08:51, Sean Owen <sowen@cloudera.com> wrote:
>>>>
>>>>> First pass of feedback on the RC: all the sigs, hashes, etc are fine.
>>>>> Licensing is up to date to the best of my knowledge.
>>>>>
>>>>> I'm hitting test failures, some of which may be spurious. Just putting
>>>>> them out there to see if they ring bells. This is Java 8 on Ubuntu 16.
>>>>>
>>>>>
>>>>> - spilling with compression *** FAILED ***
>>>>>   java.lang.Exception: Test failed with compression using codec
>>>>> org.apache.spark.io.SnappyCompressionCodec:
>>>>> assertion failed: expected cogroup to spill, but did not
>>>>>   at scala.Predef$.assert(Predef.scala:170)
>>>>>   at org.apache.spark.TestUtils$.assertSpilled(TestUtils.scala:170)
>>>>>   at org.apache.spark.util.collection.ExternalAppendOnlyMapSuite.org$apache$spark$util$collection$ExternalAppendOnlyMapSuite$$testSimpleSpilling(ExternalAppendOnlyMapSuite.scala:263)
>>>>> ...
>>>>>
>>>>> I feel like I've seen this before, and see some possibly relevant
>>>>> fixes, but they're in 2.0.0 already:
>>>>> https://github.com/apache/spark/pull/10990
>>>>> Is this something where a native library needs to be installed or
>>>>> something?
>>>>>
>>>>>
>>>>> - to UTC timestamp *** FAILED ***
>>>>>   "2016-03-13 [02]:00:00.0" did not equal "2016-03-13 [10]:00:00.0"
>>>>> (DateTimeUtilsSuite.scala:506)
>>>>>
>>>>> I know, we talked about this for the 1.6.2 RC, but I reproduced this
>>>>> locally too. I will investigate, could still be spurious.
>>>>>
>>>>>
>>>>> StateStoreSuite:
>>>>> - maintenance *** FAILED ***
>>>>>   The code passed to eventually never returned normally. Attempted 627
>>>>> times over 10.000180116 seconds. Last failure message:
>>>>> StateStoreSuite.this.fileExists(provider, 1L, false) was true earliest
>>>>> file not deleted. (StateStoreSuite.scala:395)
>>>>>
>>>>> No idea.
>>>>>
>>>>>
>>>>> - offset recovery *** FAILED ***
>>>>>   The code passed to eventually never returned normally. Attempted 197
>>>>> times over 10.040864806 seconds. Last failure message:
>>>>> strings.forall({
>>>>>     ((x$1: Any) => DirectKafkaStreamSuite.collectedData.contains(x$1))
>>>>>   }) was false. (DirectKafkaStreamSuite.scala:250)
>>>>>
>>>>> Also something that was possibly fixed already for 2.0.0 and that I
>>>>> just back-ported into 1.6. Could be just a very similar failure.
>>>>>
>>>>> On Wed, Jun 22, 2016 at 2:26 AM, Reynold Xin <rxin@databricks.com>
>>>>> wrote:
>>>>> > Please vote on releasing the following candidate as Apache Spark version
>>>>> > 2.0.0. The vote is open until Friday, June 24, 2016 at 19:00 PDT and passes
>>>>> > if a majority of at least 3 +1 PMC votes are cast.
>>>>> >
>>>>> > [ ] +1 Release this package as Apache Spark 2.0.0
>>>>> > [ ] -1 Do not release this package because ...
>>>>> >
>>>>> >
>>>>> > The tag to be voted on is v2.0.0-rc1
>>>>> > (0c66ca41afade6db73c9aeddd5aed6e5dcea90df).
>>>>> >
>>>>> > This release candidate resolves ~2400 issues:
>>>>> > https://s.apache.org/spark-2.0.0-rc1-jira
>>>>> >
>>>>> > The release files, including signatures, digests, etc. can be found at:
>>>>> > http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-bin/
>>>>> >
>>>>> > Release artifacts are signed with the following key:
>>>>> > https://people.apache.org/keys/committer/pwendell.asc
>>>>> >
>>>>> > The staging repository for this release can be found at:
>>>>> > https://repository.apache.org/content/repositories/orgapachespark-1187/
>>>>> >
>>>>> > The documentation corresponding to this release can be found at:
>>>>> > http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-docs/
>>>>> >
>>>>> >
>>>>> > =======================================
>>>>> > == How can I help test this release? ==
>>>>> > =======================================
>>>>> > If you are a Spark user, you can help us test this release by taking an
>>>>> > existing Spark workload and running on this release candidate, then
>>>>> > reporting any regressions from 1.x.
>>>>> >
>>>>> > ================================================
>>>>> > == What justifies a -1 vote for this release? ==
>>>>> > ================================================
>>>>> > Critical bugs impacting major functionalities.
>>>>> >
>>>>> > Bugs already present in 1.x, missing features, or bugs related to new
>>>>> > features will not necessarily block this release. Note that historically
>>>>> > Spark documentation has been published on the website separately from the
>>>>> > main release so we do not need to block the release due to documentation
>>>>> > errors either.
>>>>> >
>>>>> >
>
>
> --
>
>
> *Sincerely yours, Egor Pakhomov*
>
