From: Mark Hamstra
To: Michael Armbrust
Cc: "dev@spark.apache.org"
Date: Tue, 22 Dec 2015 15:36:50 -0800
Subject: Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

+1

On Tue, Dec 22, 2015 at 12:10 PM, Michael Armbrust wrote:

> Please vote on releasing the following candidate as Apache Spark
> version 1.6.0!
>
> The vote is open until Friday, December 25, 2015 at 18:00 UTC and
> passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.6.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v1.6.0-rc4
> (4062cda3087ae42c6c3cb24508fc1d3a931accdf)
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1176/
>
> The test repository (versioned as v1.6.0-rc4) for this release can be
> found at:
> https://repository.apache.org/content/repositories/orgapachespark-1175/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-docs/
>
> =======================================
> == How can I help test this release? ==
> =======================================
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running it on this release candidate,
> then reporting any regressions.
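For anyone testing an existing workload, a minimal sketch of pointing
an sbt build at the staging repository linked above; that the staged
artifacts resolve under the plain "1.6.0" version string is my
assumption, inferred from the repository description:

    // build.sbt -- resolve the RC artifacts from the ASF staging repo
    resolvers += "Spark 1.6.0 RC4 staging" at
      "https://repository.apache.org/content/repositories/orgapachespark-1176/"

    // assumed version string for the staged artifacts
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"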
>
> ================================================
> == What justifies a -1 vote for this release? ==
> ================================================
> This vote is happening towards the end of the 1.6 QA period, so -1
> votes should only occur for significant regressions from 1.5. Bugs
> already present in 1.5, minor regressions, or bugs related to new
> features will not block this release.
>
> ===============================================================
> == What should happen to JIRA tickets still targeting 1.6.0? ==
> ===============================================================
> 1. It is OK for documentation patches to target 1.6.0 and still go
> into branch-1.6, since documentation will be published separately
> from the release.
> 2. New features for non-alpha-modules should target 1.7+.
> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
> target version.
>
> ==================================================
> == Major changes to help you focus your testing ==
> ==================================================
>
> Notable changes since 1.6 RC3
>
> - SPARK-12404 - Fix serialization error for Datasets with
> Timestamps/Arrays/Decimal
> - SPARK-12218 - Fix incorrect pushdown of filters to Parquet
> - SPARK-12395 - Fix join columns of outer join for
> DataFrame.join(usingColumns)
> - SPARK-12413 - Fix Mesos HA
>
> Notable changes since 1.6 RC2
>
> - SPARK_VERSION has been set correctly
> - SPARK-12199 - ML docs are publishing correctly
> - SPARK-12345 - Mesos cluster mode has been fixed
>
> Notable changes since 1.6 RC1
>
> Spark Streaming
>
> - SPARK-2629 - trackStateByKey has been renamed to mapWithState
>
> Spark SQL
>
> - SPARK-12165, SPARK-12189 - Fix bugs in eviction of storage memory
> by execution.
> - SPARK-12258 - Correct passing null into ScalaUDF
>
> Notable Features Since 1.5
>
> Spark SQL
>
> - SPARK-11787 Parquet Performance - Improve Parquet scan performance
> when using flat schemas.
> - SPARK-10810 Session Management - Isolated default database (i.e.
> USE mydb) even on shared clusters.
> - SPARK-9999 Dataset API - A type-safe API (similar to RDDs) that
> performs many operations on serialized binary data and uses code
> generation (i.e. Project Tungsten). See the sketch after this list.
> - SPARK-10000 Unified Memory Management - Shared memory for
> execution and caching instead of exclusive division of the regions.
> - SPARK-11197 SQL Queries on Files - Concise syntax for running SQL
> queries over files of any supported format without registering a
> table.
> - SPARK-11745 Reading non-standard JSON files - Added options to
> read non-standard JSON files (e.g. single quotes, unquoted
> attributes).
> - SPARK-10412 Per-operator Metrics for SQL Execution - Display
> statistics on a per-operator basis for memory usage and spilled data
> size.
> - SPARK-11329 Star (*) expansion for StructTypes - Makes it easier
> to nest and unnest arbitrary numbers of columns.
> - SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance -
> Significant (up to 14x) speedup when caching data that contains
> complex types in DataFrames or SQL.
> - SPARK-11111 Fast null-safe joins - Joins using null-safe equality
> (<=>) will now execute using SortMergeJoin instead of computing a
> cartesian product.
> - SPARK-11389 SQL Execution Using Off-Heap Memory - Support for
> configuring query execution to occur using off-heap memory to avoid
> GC overhead.
> - SPARK-10978 Datasource API Avoid Double Filter - When implementing
> a datasource with filter pushdown, developers can now tell Spark SQL
> to avoid double-evaluating a pushed-down filter.
> - SPARK-4849 Advanced Layout of Cached Data - Storing partitioning
> and ordering schemes in in-memory table scan, and adding
> distributeBy and localSort to the DataFrame API.
> - SPARK-9858 Adaptive query execution - Initial support for
> automatically selecting the number of reducers for joins and
> aggregations.
> - SPARK-9241 Improved query planner for queries having distinct
> aggregations - Query plans of distinct aggregations are more robust
> when distinct columns have high cardinality.
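A minimal sketch of the kind of smoke test one might run against the
new Dataset API (SPARK-9999); the object name, case class, and sample
data are illustrative, not from the release notes:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    case class Person(name: String, age: Long)

    object DatasetSmokeTest {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("dataset-smoke-test").setMaster("local[*]"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Operations on the Dataset are checked against Person at compile
        // time, while the rows themselves stay in Tungsten's binary format.
        val people = Seq(Person("alice", 29), Person("bob", 35)).toDS()
        val names = people.filter(_.age >= 30).map(_.name)
        names.collect().foreach(println)  // expected output: bob

        sc.stop()
      }
    }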
>
> Spark Streaming
>
> - API Updates
>   - SPARK-2629 New improved state management - mapWithState - a
> DStream transformation for stateful stream processing, supersedes
> updateStateByKey in functionality and performance (see the sketch
> after this section).
>   - SPARK-11198 Kinesis record deaggregation - Kinesis streams have
> been upgraded to use KCL 1.4.0 and support transparent deaggregation
> of KPL-aggregated records.
>   - SPARK-10891 Kinesis message handler function - Allows an
> arbitrary function to be applied to a Kinesis record in the Kinesis
> receiver to customize what data is stored in memory.
>   - SPARK-6328 Python Streaming Listener API - Get streaming
> statistics (scheduling delays, batch processing times, etc.) from
> Python.
> - UI Improvements
>   - Made failures visible in the streaming tab, in the timelines,
> batch list, and batch details page.
>   - Made output operations visible in the streaming tab as progress
> bars.
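A minimal runnable sketch of the new mapWithState transformation
(SPARK-2629); the socket source on port 9999, the checkpoint
directory, and the app name are illustrative assumptions:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

    object RunningWordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("mapWithState-test").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(1))
        ssc.checkpoint("/tmp/mapwithstate-checkpoint")  // state needs checkpointing

        // For each key, combine the new value with the previous state
        // and emit the updated running count.
        val spec = StateSpec.function(
          (word: String, one: Option[Int], state: State[Int]) => {
            val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
            state.update(sum)
            (word, sum)
          })

        ssc.socketTextStream("localhost", 9999)
          .flatMap(_.split(" "))
          .map(word => (word, 1))
          .mapWithState(spec)
          .print()

        ssc.start()
        ssc.awaitTermination()
      }
    }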
>
> MLlib
>
> New algorithms/models
>
> - SPARK-8518 Survival analysis - Log-linear model for survival
> analysis
> - SPARK-9834 Normal equation for least squares - Normal equation
> solver, providing R-like model summary statistics
> - SPARK-3147 Online hypothesis testing - A/B testing in the Spark
> Streaming framework
> - SPARK-9930 New feature transformers - ChiSqSelector,
> QuantileDiscretizer, SQL transformer
> - SPARK-6517 Bisecting K-Means clustering - Fast top-down clustering
> variant of K-Means
>
> API improvements
>
> - ML Pipelines
>   - SPARK-6725 Pipeline persistence - Save/load for ML Pipelines,
> with partial coverage of spark.ml algorithms
>   - SPARK-5565 LDA in ML Pipelines - API for Latent Dirichlet
> Allocation in ML Pipelines
> - R API
>   - SPARK-9836 R-like statistics for GLMs - (Partial) R-like stats
> for ordinary least squares via summary(model)
>   - SPARK-9681 Feature interactions in R formula - Interaction
> operator ":" in R formula
> - Python API - Many improvements to Python API to approach feature
> parity
>
> Misc improvements
>
> - SPARK-7685, SPARK-9642 Instance weights for GLMs - Logistic and
> Linear Regression can take instance weights
> - SPARK-10384, SPARK-10385 Univariate and bivariate statistics in
> DataFrames - Variance, stddev, correlations, etc.
> - SPARK-10117 LIBSVM data source - LIBSVM as a SQL data source
>
> Documentation improvements
>
> - SPARK-7751 @since versions - Documentation includes initial
> version when classes and methods were added
> - SPARK-11337 Testable example code - Automated testing for code in
> user guide examples
>
> Deprecations
>
> - In spark.mllib.clustering.KMeans, the "runs" parameter has been
> deprecated.
> - In spark.ml.classification.LogisticRegressionModel and
> spark.ml.regression.LinearRegressionModel, the "weights" field has
> been deprecated in favor of the new name "coefficients." This helps
> disambiguate from instance (row) weights given to algorithms.
>
> Changes of behavior
>
> - spark.mllib.tree.GradientBoostedTrees validationTol has changed
> semantics in 1.6. Previously, it was a threshold for absolute change
> in error. Now, it resembles the behavior of GradientDescent
> convergenceTol: for large errors, it uses relative error (relative
> to the previous error); for small errors (< 0.01), it uses absolute
> error.
> - spark.ml.feature.RegexTokenizer: Previously, it did not convert
> strings to lowercase before tokenizing. Now, it converts to
> lowercase by default, with an option not to. This matches the
> behavior of the simpler Tokenizer transformer.
> - Spark SQL's partition discovery has been changed to only discover
> partition directories that are children of the given path (i.e. if
> path="/my/data/x=1" then x=1 will no longer be considered a
> partition, but only children of x=1 will be). This behavior can be
> overridden by manually specifying the basePath that partition
> discovery should start with (SPARK-11678); see the sketch after this
> list.
> - When casting a value of an integral type to timestamp (e.g.
> casting a long value to timestamp), the value is treated as being in
> seconds instead of milliseconds (SPARK-11724).
> - With the improved query planner for queries having distinct
> aggregations (SPARK-9241), the plan of a query having a single
> distinct aggregation has been changed to a more robust version. To
> switch back to the plan generated by Spark 1.5's planner, please set
> spark.sql.specializeSingleDistinctAggPlanning to true (SPARK-12077).
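A short sketch of the partition discovery change (SPARK-11678),
reusing the /my/data/x=1 example from the list above; it assumes a
SQLContext named sqlContext is in scope and that the directories hold
Parquet files:

    // Reading a partition directory directly: x=1 is no longer inferred
    // as a partition column, because discovery starts at the given path.
    val withoutX = sqlContext.read.parquet("/my/data/x=1")

    // Supplying basePath restores the old behavior: discovery starts at
    // /my/data, so the result includes a column `x` with value 1.
    val withX = sqlContext.read
      .option("basePath", "/my/data")
      .parquet("/my/data/x=1")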