From: Michael Armbrust <michael@databricks.com>
Date: Mon, 21 Dec 2015 17:48:37 -0800
Subject: Re: [VOTE] Release Apache Spark 1.6.0 (RC3)
To: dev@spark.apache.org

It's come to my attention that there have been several bug fixes merged since RC3:

- SPARK-12404 - Fix serialization error for Datasets with Timestamps/Arrays/Decimal
- SPARK-12218 - Fix incorrect pushdown of filters to parquet
- SPARK-12395 - Fix join columns of outer join for DataFrame using
- SPARK-12413 - Fix Mesos HA

Normally, these would probably not be sufficient to hold the release; however, with the holidays going on in the US this week, we don't have the resources to finalize 1.6 until next Monday. Given this delay anyway, I propose that we cut one final RC with the above fixes and plan for the actual release first thing next week.

I'll post RC4 shortly and cancel this vote if there are no objections. Since this vote nearly passed with no major issues, I don't anticipate any problems with RC4.

Michael

On Sat, Dec 19, 2015 at 11:44 PM, Jeff Zhang <zjffdu@gmail.com> wrote:

+1 (non-binding)

All the tests passed, and I ran it on an HDP 2.3.2 sandbox successfully.

On Sun, Dec 20, 2015 at 10:43 AM, Luciano Resende <luckbr1975@gmail.com> wrote:

+1 (non-binding)

Tested Standalone mode, SparkR, and a couple of streaming apps; all seem OK.

On Wed, Dec 16, 2015 at 1:32 PM, Michael Armbrust <michael@databricks.com> wrote:

Please vote on releasing the following candidate as Apache Spark version 1.6.0!

The vote is open until Saturday, December 19, 2015 at 18:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.6.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v1.6.0-rc3 (168c89e07c51fa24b0bb88582c739cec0acb44d7)

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1174/

The test repository (versioned as v1.6.0-rc3) for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1173/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.6.0-rc3-docs/

=======================================
== How can I help test this release? ==
=======================================
If you are a Spark user, you can help us test this release by taking an existing Spark workload, running it on this release candidate, and reporting any regressions. One way to pick up the candidate is to resolve it from the staging repository, as in the sketch below.
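
For example, a minimal sbt build that pulls the staged artifacts might look like the following (a sketch only: the project name is hypothetical, and it assumes the staging repository publishes the artifacts under version 1.6.0):

```scala
// build.sbt -- minimal sketch for smoke-testing an existing workload against the RC.
name := "spark-rc3-smoke-test"

scalaVersion := "2.10.5" // Spark 1.6 is built against Scala 2.10 by default

// Staging repository from this vote thread (orgapachespark-1174).
resolvers += "Spark 1.6.0 RC staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1174/"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.0",
  "org.apache.spark" %% "spark-sql"  % "1.6.0"
)
```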

================================================
== What justifies a -1 vote for this release? ==
================================================
This vote is happening towards the end of the 1.6 QA period, so -1 votes should only occur for significant regressions from 1.5. Bugs already present in 1.5, minor regressions, or bugs related to new features will not block this release.

===============================================================
== What should happen to JIRA tickets still targeting 1.6.0? ==
===============================================================
1. It is OK for documentation patches to target 1.6.0 and still go into branch-1.6, since documentation will be published separately from the release.
2. New features for non-alpha modules should target 1.7+.
3. Non-blocker bug fixes should target 1.6.1 or 1.7.0, or drop the target version.


==================================================
== Major changes to help you focus your testing ==
==================================================

Notable changes since 1.6 RC2

- SPARK_VERSION has been set correctly
- SPARK-12199 ML docs are publishing correctly
- SPARK-12345 Mesos cluster mode has been fixed

Notable changes since 1.6 RC1

Spark Streaming

- SPARK-2629 trackStateByKey has been renamed to mapWithState

Spark SQL

- SPARK-12165, SPARK-12189 Fix bugs in eviction of storage memory by execution.
- SPARK-12258 Correct passing null into ScalaUDF

Notable Features Since 1.5

Spark SQL

- SPARK-11787 Parquet Performance - Improve Parquet scan performance when using flat schemas.
- SPARK-10810 Session Management - Isolated default database (i.e. USE mydb) even on shared clusters.
- SPARK-9999 Dataset API - A type-safe API (similar to RDDs) that performs many operations on serialized binary data and code generation (i.e. Project Tungsten). See the sketch after this list.
- SPARK-10000 Unified Memory Management - Shared memory for execution and caching instead of exclusive division of the regions.
- SPARK-11197 SQL Queries on Files - Concise syntax for running SQL queries over files of any supported format without registering a table.
- SPARK-11745 Reading non-standard JSON files - Added options to read non-standard JSON files (e.g. single quotes, unquoted attributes).
- SPARK-10412 Per-operator Metrics for SQL Execution - Display statistics on a per-operator basis for memory usage and spilled data size.
- SPARK-11329 Star (*) expansion for StructTypes - Makes it easier to nest and unnest arbitrary numbers of columns.
- SPARK-10917, SPARK-11149 In-memory Columnar Cache Performance - Significant (up to 14x) speedup when caching data that contains complex types in DataFrames or SQL.
- SPARK-11111 Fast null-safe joins - Joins using null-safe equality (<=>) will now execute using SortMergeJoin instead of computing a cartesian product.
- SPARK-11389 SQL Execution Using Off-Heap Memory - Support for configuring query execution to occur using off-heap memory to avoid GC overhead.
- SPARK-10978 Datasource API Avoid Double Filter - When implementing a data source with filter pushdown, developers can now tell Spark SQL to avoid double-evaluating a pushed-down filter.
- SPARK-4849 Advanced Layout of Cached Data - Storing partitioning and ordering schemes in the in-memory table scan, and adding distributeBy and localSort to the DataFrame API.
- SPARK-9858 Adaptive query execution - Initial support for automatically selecting the number of reducers for joins and aggregations.
- SPARK-9241 Improved query planner for queries having distinct aggregations - Query plans of distinct aggregations are more robust when distinct columns have high cardinality.
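
To give a flavor of the Dataset API (SPARK-9999) and SQL directly over files (SPARK-11197), here is a minimal self-contained sketch; the class, column, and path names are made up:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object DatasetSketch {
  case class Person(name: String, age: Long)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rc3-sketch").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // SPARK-9999: a type-safe Dataset created from a strongly typed collection.
    val people = Seq(Person("Alice", 29), Person("Bob", 31)).toDS()
    val adults = people.filter(_.age >= 30) // lambdas over objects, executed on Tungsten binary rows
    adults.show()

    // SPARK-11197: SQL over a file without registering a table
    // (hypothetical path; any supported format works, e.g. parquet).
    sqlContext.sql("SELECT * FROM parquet.`/tmp/people.parquet`").show()
  }
}
```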

Spark Streaming

- API Updates
  - SPARK-2629 New improved state management - mapWithState - a DStream transformation for stateful stream processing; supersedes updateStateByKey in functionality and performance. See the sketch after this section.
  - SPARK-11198 Kinesis record deaggregation - Kinesis streams have been upgraded to use KCL 1.4.0 and support transparent deaggregation of KPL-aggregated records.
  - SPARK-10891 Kinesis message handler function - Allows an arbitrary function to be applied to a Kinesis record in the Kinesis receiver to customize what data is to be stored in memory.
  - SPARK-6328 Python Streaming Listener API - Get streaming statistics (scheduling delays, batch processing times, etc.) in streaming.
- UI Improvements
  - Made failures visible in the streaming tab, in the timelines, batch list, and batch details page.
  - Made output operations visible in the streaming tab as progress bars.
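
A rough sketch of the new mapWithState transformation, maintaining a running count per word; the socket source, host, port, and checkpoint path are hypothetical:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, State, StateSpec, StreamingContext}

object MapWithStateSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("mapWithState-sketch").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("/tmp/checkpoint") // stateful operations require a checkpoint directory

    val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))

    // Mapping function: fold the new count into the running state and emit the total.
    val mappingFunc = (word: String, one: Option[Int], state: State[Int]) => {
      val sum = one.getOrElse(0) + state.getOption.getOrElse(0)
      state.update(sum)
      (word, sum)
    }

    val wordCounts = words.map((_, 1)).mapWithState(StateSpec.function(mappingFunc))
    wordCounts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```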

MLlib

New algorithms/models

- SPARK-8518 Survival analysis - Log-linear model for survival analysis
- SPARK-9834 Normal equation for least squares - Normal equation solver, providing R-like model summary statistics
- SPARK-3147 Online hypothesis testing - A/B testing in the Spark Streaming framework
- SPARK-9930 New feature transformers - ChiSqSelector, QuantileDiscretizer, SQL transformer; see the sketch after this list.
- SPARK-6517 Bisecting K-Means clustering - Fast top-down clustering variant of K-Means
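
A small sketch of the new transformers, assuming a SQLContext available as sqlContext (as in spark-shell); the column names are made up:

```scala
import org.apache.spark.ml.feature.{QuantileDiscretizer, SQLTransformer}
import sqlContext.implicits._

val df = (1 to 100).map(i => (i, i.toDouble)).toDF("id", "hour")

// SPARK-9930: bucketize a continuous column into quantile-based bins.
val discretizer = new QuantileDiscretizer()
  .setInputCol("hour")
  .setOutputCol("hourBucket")
  .setNumBuckets(4)
val bucketed = discretizer.fit(df).transform(df)

// The SQL transformer rewrites __THIS__ to refer to the input DataFrame.
new SQLTransformer()
  .setStatement("SELECT *, hourBucket * 2 AS doubled FROM __THIS__")
  .transform(bucketed)
  .show(5)
```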

API improvements

- ML Pipelines
  - SPARK-6725 Pipeline persistence - Save/load for ML Pipelines, with partial coverage of spark.ml algorithms; see the sketch after this list.
  - SPARK-5565 LDA in ML Pipelines - API for Latent Dirichlet Allocation in ML Pipelines
- R API
  - SPARK-9836 R-like statistics for GLMs - (Partial) R-like stats for ordinary least squares via summary(model)
  - SPARK-9681 Feature interactions in R formula - Interaction operator ":" in R formula
- Python API - Many improvements to Python API to approach feature parity
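
Pipeline persistence in a nutshell; a sketch only, since coverage of spark.ml algorithms is partial in 1.6, and the stage and paths here are hypothetical:

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression

// SPARK-6725: save and reload an (unfitted) pipeline definition.
val lr = new LogisticRegression().setMaxIter(10)
val pipeline = new Pipeline().setStages(Array(lr))

pipeline.save("/tmp/unfit-pipeline")
val samePipeline = Pipeline.load("/tmp/unfit-pipeline")
// Fitted models persist the same way: model.save(...) / PipelineModel.load(...).
```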

Misc improvements

- SPARK-7685, SPARK-9642 Instance weights for GLMs - Logistic and Linear Regression can take instance weights
- SPARK-10384, SPARK-10385 Univariate and bivariate statistics in DataFrames - Variance, stddev, correlations, etc.; see the sketch after this list.
- SPARK-10117 LIBSVM data source - LIBSVM as a SQL data source

Documentation improvements

- SPARK-7751 @since versions - Documentation includes the initial version when classes and methods were added
- SPARK-11337 Testable example code - Automated testing for code in user guide examples
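
The DataFrame statistics and the LIBSVM source in a short sketch, again assuming a SQLContext available as sqlContext; the LIBSVM path is hypothetical:

```scala
import org.apache.spark.sql.functions.{col, corr, stddev, variance}

// SPARK-10384/10385: univariate and bivariate statistics directly on DataFrames.
val nums = sqlContext.range(0, 100).withColumn("x", col("id") * 2)
nums.agg(variance(col("id")), stddev(col("id")), corr(col("id"), col("x"))).show()

// SPARK-10117: LIBSVM as a SQL data source.
val training = sqlContext.read.format("libsvm").load("/tmp/sample_libsvm_data.txt")
training.printSchema() // columns: label (double), features (vector)
```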

Deprecations

- In spark.mllib.clustering.KMeans, the "runs" parameter has been deprecated.
- In spark.ml.classification.LogisticRegressionModel and spark.ml.regression.LinearRegressionModel, the "weights" field has been deprecated in favor of the new name "coefficients." This helps disambiguate it from the instance (row) weights given to algorithms.

Changes of behavior

- spark.mllib.tree.GradientBoostedTrees validationTol has changed semantics in 1.6. Previously, it was a threshold for absolute change in error. Now, it resembles the behavior of GradientDescent convergenceTol: for large errors, it uses relative error (relative to the previous error); for small errors (< 0.01), it uses absolute error.
- spark.ml.feature.RegexTokenizer: Previously, it did not convert strings to lowercase before tokenizing. Now, it converts to lowercase by default, with an option not to. This matches the behavior of the simpler Tokenizer transformer.
- Spark SQL's partition discovery has been changed to only discover partition directories that are children of the given path. (I.e., if path="/my/data/x=1", then x=1 will no longer be considered a partition; only children of x=1 will be.) This behavior can be overridden by manually specifying the basePath that partition discovery should start with (SPARK-11678); see the sketch below.
- When casting a value of an integral type to timestamp (e.g. casting a long value to timestamp), the value is treated as being in seconds instead of milliseconds (SPARK-11724).
- With the improved query planner for queries having distinct aggregations (SPARK-9241), the plan of a query having a single distinct aggregation has been changed to a more robust version. To switch back to the plan generated by Spark 1.5's planner, please set spark.sql.specializeSingleDistinctAggPlanning to true (SPARK-12077).
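
The partition discovery change, sketched; the paths are hypothetical and assume a layout like /my/data/x=1/y=a/...:

```scala
// Reading a child directory directly: x=1 is now the root of the scan,
// so x is no longer discovered as a partition column (only y is).
val df1 = sqlContext.read.parquet("/my/data/x=1")

// SPARK-11678: restore the old behavior by pointing basePath at the
// top of the partitioned layout; x comes back as a partition column.
val df2 = sqlContext.read
  .option("basePath", "/my/data")
  .parquet("/my/data/x=1")
```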



--
Luciano Resende
http://people.apache.org/~lresende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

--
Best Regards

Jeff Zhang