From: Mark Hamstra
To: Michael Armbrust
Cc: "dev@spark.apache.org"
Date: Tue, 22 Dec 2015 15:36:50 -0800
Subject: Re: [VOTE] Release Apache Spark 1.6.0 (RC4)

+1

On Tue, Dec 22, 2015 at 12:10 PM, Michael Armbrust wrote:

> Please vote on releasing the following candidate as Apache Spark
> version 1.6.0!
>
> The vote is open until Friday, December 25, 2015 at 18:00 UTC and
> passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.6.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v1.6.0-rc4
> (4062cda3087ae42c6c3cb24508fc1d3a931accdf)
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1176/
>
> The test repository (versioned as v1.6.0-rc4) for this release can be
> found at:
> https://repository.apache.org/content/repositories/orgapachespark-1175/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc4-docs/
>
> =======================================
> == How can I help test this release? ==
> =======================================
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running it on this release candidate,
> then reporting any regressions.
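For anyone testing an existing workload, a minimal sketch of pointing
an sbt build at the staging repository linked above; that the staged
artifacts resolve under the plain "1.6.0" version string is my
assumption, inferred from the repository description:

    // build.sbt -- resolve the RC artifacts from the ASF staging repo
    resolvers += "Spark 1.6.0 RC4 staging" at
      "https://repository.apache.org/content/repositories/orgapachespark-1176/"

    // assumed version string for the staged artifacts
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"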
>
> ================================================
> == What justifies a -1 vote for this release? ==
> ================================================
> This vote is happening towards the end of the 1.6 QA period, so -1
> votes should only occur for significant regressions from 1.5. Bugs
> already present in 1.5, minor regressions, or bugs related to new
> features will not block this release.
>
> ===============================================================
> == What should happen to JIRA tickets still targeting 1.6.0? ==
> ===============================================================
> 1. It is OK for documentation patches to target 1.6.0 and still go
> into branch-1.6, since documentation will be published separately
> from the release.
> 2. New features for non-alpha-modules should target 1.7+.
> 3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the
> target version.
>
> ==================================================
> == Major changes to help you focus your testing ==
> ==================================================
>
> Notable changes since 1.6 RC3
>
> - SPARK-12404 - Fix serialization error for Datasets with
> Timestamps/Arrays/Decimal
> - SPARK-12218 - Fix incorrect pushdown of filters to Parquet
> - SPARK-12395 - Fix join columns of outer join for
> DataFrame.join(usingColumns)
> - SPARK-12413 - Fix Mesos HA
>
> Notable changes since 1.6 RC2
>
> - SPARK_VERSION has been set correctly
> - SPARK-12199 - ML docs are publishing correctly
> - SPARK-12345 - Mesos cluster mode has been fixed
>
> Notable changes since 1.6 RC1
>
> Spark Streaming
>
> - SPARK-2629 - trackStateByKey has been renamed to mapWithState
>
> Spark SQL
>
> - SPARK-12165, SPARK-12189 - Fix bugs in eviction of storage memory
> by execution.
> - SPARK-12258 - Correct passing null into ScalaUDF
>
> Notable Features Since 1.5
>
> Spark SQL
>
> - SPARK-11787 Parquet Performance - Improve Parquet scan performance
> when using flat schemas.
> - SPARK-10810 Session Management - Isolated default database (i.e.
> USE mydb) even on shared clusters.
> - SPARK-9999 Dataset API - A type-safe API (similar to RDDs) that
> performs many operations on serialized binary data and uses code
> generation (i.e. Project Tungsten). See the sketch after this list.
> - SPARK-10000 Unified Memory Management - Shared memory for
> execution and caching instead of exclusive division of the regions.
> - SPARK-11197 SQL Queries on Files - Concise syntax for running SQL
> queries over files of any supported format without registering a
> table.
> - SPARK-11745 Reading non-standard JSON files - Added options to
> read non-standard JSON files (e.g. single quotes, unquoted
> attributes).
> - SPARK-10412 Per-operator Metrics for SQL Execution - Display
> statistics on a per-operator basis for memory usage and spilled data
> size.
> - SPARK-11329 Star (*) expansion for StructTypes - Makes it easier
> to nest and unnest arbitrary numbers of columns.
> - SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance -
> Significant (up to 14x) speedup when caching data that contains
> complex types in DataFrames or SQL.
> - SPARK-11111 Fast null-safe joins - Joins using null-safe equality
> (<=>) will now execute using SortMergeJoin instead of computing a
> cartesian product.
> - SPARK-11389 SQL Execution Using Off-Heap Memory - Support for
> configuring query execution to occur using off-heap memory to avoid
> GC overhead.
> - SPARK-10978 Datasource API Avoid Double Filter - When implementing
> a datasource with filter pushdown, developers can now tell Spark SQL
> to avoid double-evaluating a pushed-down filter.
> - SPARK-4849 Advanced Layout of Cached Data - Storing partitioning
> and ordering schemes in in-memory table scan, and adding
> distributeBy and localSort to the DataFrame API.
> - SPARK-9858 Adaptive query execution - Initial support for
> automatically selecting the number of reducers for joins and
> aggregations.
> - SPARK-9241 Improved query planner for queries having distinct
> aggregations - Query plans of distinct aggregations are more robust
> when distinct columns have high cardinality.
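A minimal sketch of the kind of smoke test one might run against the
new Dataset API (SPARK-9999); the object name, case class, and sample
data are illustrative, not from the release notes:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    case class Person(name: String, age: Long)

    object DatasetSmokeTest {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("dataset-smoke-test").setMaster("local[*]"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Operations on the Dataset are checked against Person at compile
        // time, while the rows themselves stay in Tungsten's binary format.
        val people = Seq(Person("alice", 29), Person("bob", 35)).toDS()
        val names = people.filter(_.age >= 30).map(_.name)
        names.collect().foreach(println)  // expected output: bob

        sc.stop()
      }
    }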
>
> Spark Streaming
>
> - API Updates
>   - SPARK-2629 New improved state management - mapWithState - a
> DStream transformation for stateful stream processing, supersedes
> updateStateByKey in functionality and performance (see the sketch
> after this section).
>   - SPARK-11198 Kinesis record deaggregation - Kinesis streams have
> been upgraded to use KCL 1.4.0 and support transparent deaggregation
> of KPL-aggregated records.
>   - SPARK-10891 Kinesis message handler function - Allows an
> arbitrary function to be applied to a Kinesis record in the Kinesis
> receiver to customize what data is stored in memory.
>   - SPARK-6328 Python Streaming Listener API - Get streaming
> statistics (scheduling delays, batch processing times, etc.) from
> Python.
> - UI Improvements
>   - Made failures visible in the streaming tab, in the timelines,
> batch list, and batch details page.
>   - Made output operations visible in the streaming tab as progress
> bars.
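A minimal runnable sketch of the new mapWithState transformation
(SPARK-2629); the socket source on port 9999, the checkpoint
directory, and the app name are illustrative assumptions:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

    object RunningWordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("mapWithState-test").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(1))
        ssc.checkpoint("/tmp/mapwithstate-checkpoint")  // state needs checkpointing

        // For each key, combine the new value with the previous state
        // and emit the updated running count.
        val spec = StateSpec.function(
          (word: String, one: Option[Int], state: State[Int]) => {
            val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
            state.update(sum)
            (word, sum)
          })

        ssc.socketTextStream("localhost", 9999)
          .flatMap(_.split(" "))
          .map(word => (word, 1))
          .mapWithState(spec)
          .print()

        ssc.start()
        ssc.awaitTermination()
      }
    }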
>
> MLlib
>
> New algorithms/models
>
> - SPARK-8518 Survival analysis - Log-linear model for survival
> analysis
> - SPARK-9834 Normal equation for least squares - Normal equation
> solver, providing R-like model summary statistics
> - SPARK-3147 Online hypothesis testing - A/B testing in the Spark
> Streaming framework
> - SPARK-9930 New feature transformers - ChiSqSelector,
> QuantileDiscretizer, SQL transformer
> - SPARK-6517 Bisecting K-Means clustering - Fast top-down clustering
> variant of K-Means
>
> API improvements
>
> - ML Pipelines
>   - SPARK-6725 Pipeline persistence - Save/load for ML Pipelines,
> with partial coverage of spark.ml algorithms
>   - SPARK-5565 LDA in ML Pipelines - API for Latent Dirichlet
> Allocation in ML Pipelines
> - R API
>   - SPARK-9836 R-like statistics for GLMs - (Partial) R-like stats
> for ordinary least squares via summary(model)
>   - SPARK-9681 Feature interactions in R formula - Interaction
> operator ":" in R formula
> - Python API - Many improvements to Python API to approach feature
> parity
>
> Misc improvements
>
> - SPARK-7685, SPARK-9642 Instance weights for GLMs - Logistic and
> Linear Regression can take instance weights
> - SPARK-10384, SPARK-10385 Univariate and bivariate statistics in
> DataFrames - Variance, stddev, correlations, etc.
> - SPARK-10117 LIBSVM data source - LIBSVM as a SQL data source
>
> Documentation improvements
>
> - SPARK-7751 @since versions - Documentation includes initial
> version when classes and methods were added
> - SPARK-11337 Testable example code - Automated testing for code in
> user guide examples
>
> Deprecations
>
> - In spark.mllib.clustering.KMeans, the "runs" parameter has been
> deprecated.
> - In spark.ml.classification.LogisticRegressionModel and
> spark.ml.regression.LinearRegressionModel, the "weights" field has
> been deprecated in favor of the new name "coefficients." This helps
> disambiguate from instance (row) weights given to algorithms.
>
> Changes of behavior
>
> - spark.mllib.tree.GradientBoostedTrees validationTol has changed
> semantics in 1.6. Previously, it was a threshold for absolute change
> in error. Now, it resembles the behavior of GradientDescent
> convergenceTol: for large errors, it uses relative error (relative
> to the previous error); for small errors (< 0.01), it uses absolute
> error.
> - spark.ml.feature.RegexTokenizer: Previously, it did not convert
> strings to lowercase before tokenizing. Now, it converts to
> lowercase by default, with an option not to. This matches the
> behavior of the simpler Tokenizer transformer.
> - Spark SQL's partition discovery has been changed to only discover
> partition directories that are children of the given path (i.e. if
> path="/my/data/x=1" then x=1 will no longer be considered a
> partition, but only children of x=1 will be). This behavior can be
> overridden by manually specifying the basePath that partition
> discovery should start with (SPARK-11678); see the sketch after this
> list.
> - When casting a value of an integral type to timestamp (e.g.
> casting a long value to timestamp), the value is treated as being in
> seconds instead of milliseconds (SPARK-11724).
> - With the improved query planner for queries having distinct
> aggregations (SPARK-9241), the plan of a query having a single
> distinct aggregation has been changed to a more robust version. To
> switch back to the plan generated by Spark 1.5's planner, please set
> spark.sql.specializeSingleDistinctAggPlanning to true (SPARK-12077).
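A short sketch of the partition discovery change (SPARK-11678),
reusing the /my/data/x=1 example from the list above; it assumes a
SQLContext named sqlContext is in scope and that the directories hold
Parquet files:

    // Reading a partition directory directly: x=1 is no longer inferred
    // as a partition column, because discovery starts at the given path.
    val withoutX = sqlContext.read.parquet("/my/data/x=1")

    // Supplying basePath restores the old behavior: discovery starts at
    // /my/data, so the result includes a column `x` with value 1.
    val withX = sqlContext.read
      .option("basePath", "/my/data")
      .parquet("/my/data/x=1")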