spark-dev mailing list archives

From Hyukjin Kwon <gurwls...@gmail.com>
Subject Re: [VOTE] Apache Spark 2.2.0 (RC1)
Date Sat, 29 Apr 2017 06:14:32 GMT
SPARK-20364 <https://issues.apache.org/jira/browse/SPARK-20364> describes a
bug, but I am not sure we should call it a regression that blocks the
release.

The current master produces a wrong result in some cases when there are
dots in Parquet column names, but the same case did not even work in past
releases: it threw an exception instead of returning anything.


So this does not look like a regression to me, although it is a bug we
should definitely fix.


In more detail, I tested this case as below:


Spark 1.6.3

val path = "/tmp/foo"
Seq(Tuple1(Some(1)), Tuple1(None)).toDF("col.dots").write.parquet(path)
sqlContext.read.parquet(path).where("`col.dots` IS NOT NULL").show()

java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
    ...

sqlContext.read.parquet(path).where("`col.dots` IS NULL").show()

java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
    ...


Spark 2.0.2

val path = "/tmp/foo"
Seq(Some(1), None).toDF("col.dots").write.parquet(path)
spark.read.parquet(path).where("`col.dots` IS NOT NULL").show()

java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
    ...

spark.read.parquet(path).where("`col.dots` IS NULL").show()

java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
    ...


Spark 2.1.0

val path = "/tmp/foo"
Seq(Some(1), None).toDF("col.dots").write.parquet(path)
spark.read.parquet(path).where("`col.dots` IS NOT NULL").show()

java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
    ...

spark.read.parquet(path).where("`col.dots` IS NULL").show()

java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
    ...


Spark 2.1.1 RC4

val path = "/tmp/foo"
Seq(Some(1), None).toDF("col.dots").write.parquet(path)
spark.read.parquet(path).where("`col.dots` IS NOT NULL").show()

java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
    ...

spark.read.parquet(path).where("`col.dots` IS NULL").show()

java.lang.IllegalArgumentException: Column [col, dots] was not found in schema!
    at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:55)
    ...


Current master

val path = "/tmp/foo"
Seq(Some(1), None).toDF("col.dots").write.parquet(path)
spark.read.parquet(path).where("`col.dots` IS NOT NULL").show()

+--------+
|col.dots|
+--------+
+--------+

spark.read.parquet(path).where("`col.dots` IS NULL").show()

+--------+
|col.dots|
+--------+
|    null|
+--------+
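For what it is worth, one possible workaround (my assumption, not something I have verified against this RC) would be to disable Parquet filter pushdown, so the dotted column name is never handed to parquet-mr's filter API and the predicate is evaluated by Spark itself:

```scala
// Sketch of a possible workaround, assuming the same spark-shell session
// and the same `path` as above. With spark.sql.parquet.filterPushdown
// disabled, the IS NULL / IS NOT NULL predicates are evaluated by Spark
// after the scan instead of being pushed down to parquet-mr.
spark.conf.set("spark.sql.parquet.filterPushdown", "false")
spark.read.parquet(path).where("`col.dots` IS NOT NULL").show()
spark.read.parquet(path).where("`col.dots` IS NULL").show()
```

This trades pushdown performance for correctness, so it is only a stopgap until the bug itself is fixed.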



2017-04-29 2:57 GMT+09:00 Koert Kuipers <koert@tresata.com>:

> we have been testing the 2.2.0 snapshots in the last few weeks for inhouse
> unit tests, integration tests and real workloads and we are very happy with
> it. the only issue i had so far (some encoders not being serializable anymore)
> has already been dealt with by wenchen.
>
> On Thu, Apr 27, 2017 at 6:49 PM, Sean Owen <sowen@cloudera.com> wrote:
>
>> By the way the RC looks good. Sigs and license are OK, tests pass with
>> -Phive -Pyarn -Phadoop-2.7. +1 from me.
>>
>> On Thu, Apr 27, 2017 at 7:31 PM Michael Armbrust <michael@databricks.com>
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark
>>> version 2.2.0. The vote is open until Tues, May 2nd, 2017 at 12:00 PST
>>> and passes if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.2.0
>>> [ ] -1 Do not release this package because ...
>>>
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v2.2.0-rc1
>>> <https://github.com/apache/spark/tree/v2.2.0-rc1> (8ccb4a57c82146c
>>> 1a8f8966c7e64010cf5632cb6)
>>>
>>> List of JIRA tickets resolved can be found with this filter
>>> <https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.1>
>>> .
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1235/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-docs/
>>>
>>>
>>> *FAQ*
>>>
>>> *How can I help test this release?*
>>>
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should be
>>> worked on immediately. Everything else please retarget to 2.3.0 or 2.2.1.
>>>
>>> *But my bug isn't fixed!??!*
>>>
>>> In order to make timely releases, we will typically not hold the release
>>> unless the bug in question is a regression from 2.1.1.
>>>
>>
>
