+ Spark Release 2.0.1 | Apache Spark + +

From: rxin@apache.org
To: commits@spark.apache.org
Date: Tue, 04 Oct 2016 17:02:45 -0000
Message-Id: <0938a8c4bf494c8fa7f685f57a9fdfc8@git.apache.org>
Subject: [1/3] spark-website git commit: Add Spark 2.0.1 release.
archived-at: Tue, 04 Oct 2016 17:02:48 -0000 Repository: spark-website Updated Branches: refs/heads/asf-site 7c96b646e -> a8dce9912 http://git-wip-us.apache.org/repos/asf/spark-website/blob/a8dce991/site/releases/spark-release-1-1-1.html ---------------------------------------------------------------------- diff --git a/site/releases/spark-release-1-1-1.html b/site/releases/spark-release-1-1-1.html index b0ddad8..ff4ea4f 100644 --- a/site/releases/spark-release-1-1-1.html +++ b/site/releases/spark-release-1-1-1.html @@ -150,6 +150,9 @@

Latest News

Spark 2.0.1 released + (Oct 03, 2016)
Spark 2.0.0 released (Jul 26, 2016)
Call for Presentations for Spark Summit EU is Open (Jun 16, 2016)
Preview release of Spark 2.0 - (May 26, 2016)

Archive

http://git-wip-us.apache.org/repos/asf/spark-website/blob/a8dce991/site/releases/spark-release-1-2-0.html ---------------------------------------------------------------------- diff --git a/site/releases/spark-release-1-2-0.html b/site/releases/spark-release-1-2-0.html index 3eea59b..3c74756 100644 --- a/site/releases/spark-release-1-2-0.html +++ b/site/releases/spark-release-1-2-0.html @@ -150,6 +150,9 @@

Latest News

Spark 2.0.1 released + (Oct 03, 2016)
Spark 2.0.0 released (Jul 26, 2016)
Call for Presentations for Spark Summit EU is Open (Jun 16, 2016)
Preview release of Spark 2.0 - (May 26, 2016)

Archive

@@ -194,7 +194,7 @@

In 1.2, Spark core upgrades two major subsystems to improve the performance and stability of very large scale shuffles. The first is Spark’s communication manager used during bulk transfers, which moves to a netty-based implementation. The second is Spark’s shuffle mechanism, which moves to the “sort based” shuffle initially released in Spark 1.1. Spark also adds an elastic scaling mechanism designed to improve cluster utilization during long running ETL-style jobs. This is currently supported on YARN and will make its way to other cluster managers in future versions. Finally, Spark 1.2 adds support for Scala 2.11. For instructions on building for Scala 2.11 see the build documentation.
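To make the "sort based" shuffle idea concrete, here is a minimal sketch in plain Python. It is not Spark's implementation: the function names, the modulo partitioner, and the in-memory lists standing in for files are all assumptions made for illustration. The idea shown is that each map task sorts its records by target partition and writes one contiguous output plus an index of offsets, so each reducer can fetch its slice directly.

```python
def write_map_output(records, num_partitions):
    """Sort records by target partition and build an offset index.

    Hypothetical helper: `key % num_partitions` stands in for a partitioner,
    and the returned list stands in for a single map output file.
    """
    ordered = sorted(records, key=lambda kv: kv[0] % num_partitions)
    # index[p] is the start of partition p; index[p + 1] is its end.
    index = [0] * (num_partitions + 1)
    for key, _ in ordered:
        index[key % num_partitions + 1] += 1
    for p in range(num_partitions):
        index[p + 1] += index[p]
    return ordered, index


def read_partition(data, index, p):
    """Return the contiguous slice of the map output that belongs to partition p."""
    return data[index[p]:index[p + 1]]


records = [(5, "e"), (2, "b"), (4, "d"), (1, "a"), (3, "c")]
data, index = write_map_output(records, num_partitions=2)
even_keys = read_partition(data, index, 0)
odd_keys = read_partition(data, index, 1)
```

With one sorted output per map task, the number of shuffle files no longer grows with the number of reduce partitions, which is the property that matters for very large shuffles.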

Spark Streaming

This release includes two major feature additions to Spark’s streaming library: a Python API and a write ahead log for full driver H/A. First, the Python API covers almost all the DStream transformations and output operations. Input sources based on text files and text over sockets are currently supported. Support for Kafka and Flume input streams in Python will be added in the next release. Second, Spark Streaming now features H/A driver support through a write ahead log (WAL). In Spark 1.1 and earlier, some buffered (received but not yet processed) data can be lost during driver restarts. To prevent this, Spark 1.2 adds an optional WAL, which buffers received data into a fault-tolerant file system (e.g. HDFS). See the streaming programming guide for more details.
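The write ahead log idea can be sketched in a few lines of plain Python. This is an illustration of the general technique only, not Spark Streaming code: the `WriteAheadLog` and `Receiver` classes are hypothetical, and a local temporary file stands in for a fault-tolerant file system such as HDFS. Each received record is made durable before it is buffered, so a restarted receiver can rebuild its buffer from the log.

```python
import json
import os
import tempfile


class WriteAheadLog:
    """Hypothetical append-only log; one JSON record per line."""

    def __init__(self, path):
        self.path = path

    def append(self, record):
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())  # force the record to disk before returning

    def replay(self):
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


class Receiver:
    """Hypothetical receiver that logs data before buffering it."""

    def __init__(self, wal):
        self.wal = wal
        self.buffer = wal.replay()  # recover received-but-unprocessed data

    def receive(self, record):
        self.wal.append(record)     # durable first
        self.buffer.append(record)  # then available for processing


log_path = os.path.join(tempfile.mkdtemp(), "received.log")
first = Receiver(WriteAheadLog(log_path))
first.receive({"id": 1})
first.receive({"id": 2})

# Simulate a driver restart: a new receiver starts from the same log.
recovered = Receiver(WriteAheadLog(log_path))
```

The extra write on every record is the cost of the guarantee, which is presumably why the release notes describe the WAL as optional.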

MLlib

Spark 1.2 previews a new set of machine learning APIs in a package called spark.ml that supports learning pipelines, where multiple algorithms are run in sequence with varying parameters. This type of pipeline is common in practical machine learning deployments. The new ML package uses Spark’s SchemaRDD to represent ML datasets, providing direct interoperability with Spark SQL. In addition to the new API, Spark 1.2 extends decision trees with two tree ensemble methods: random forests and gradient-boosted trees, among the most successful tree-based models for classification and regression. Finally, MLlib’s Python implementation receives a major update in 1.2 to simplify the process of adding Python APIs, along with better Python API coverage.
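As a rough picture of what "multiple algorithms run in sequence with varying parameters" means, the following plain-Python sketch chains stages into a pipeline. It does not use the spark.ml package: the `Lowercase`, `Tokenize`, and `Pipeline` classes and the `sep` parameter are hypothetical names chosen for this example, and plain lists stand in for a SchemaRDD.

```python
class Lowercase:
    """Hypothetical stage: normalizes case."""

    def transform(self, rows):
        return [row.lower() for row in rows]


class Tokenize:
    """Hypothetical stage with one tunable parameter, the separator."""

    def __init__(self, sep=" "):
        self.sep = sep

    def transform(self, rows):
        return [row.split(self.sep) for row in rows]


class Pipeline:
    """Runs each stage in order, feeding one stage's output to the next."""

    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


# The same pipeline shape, run with two different parameter settings.
by_space = Pipeline([Lowercase(), Tokenize(sep=" ")]).transform(["Hello Spark"])
by_comma = Pipeline([Lowercase(), Tokenize(sep=",")]).transform(["A,B"])
```

Treating the whole chain as one object is what makes it practical to rerun it under different parameters and compare the results.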

http://git-wip-us.apache.org/repos/asf/spark-website/blob/a8dce991/site/releases/spark-release-1-2-1.html ---------------------------------------------------------------------- diff --git a/site/releases/spark-release-1-2-1.html b/site/releases/spark-release-1-2-1.html index a4a1a67..d220fa2 100644 --- a/site/releases/spark-release-1-2-1.html +++ b/site/releases/spark-release-1-2-1.html @@ -150,6 +150,9 @@

Latest News

Spark 2.0.1 released + (Oct 03, 2016)
Spark 2.0.0 released (Jul 26, 2016)
Call for Presentations for Spark Summit EU is Open (Jun 16, 2016)
Preview release of Spark 2.0 - (May 26, 2016)

Archive

http://git-wip-us.apache.org/repos/asf/spark-website/blob/a8dce991/site/releases/spark-release-1-2-2.html ---------------------------------------------------------------------- diff --git a/site/releases/spark-release-1-2-2.html b/site/releases/spark-release-1-2-2.html index 58f7b87..7b9f3d7 100644 --- a/site/releases/spark-release-1-2-2.html +++ b/site/releases/spark-release-1-2-2.html @@ -150,6 +150,9 @@

Latest News

Spark 2.0.1 released + (Oct 03, 2016)
Spark 2.0.0 released (Jul 26, 2016)
Call for Presentations for Spark Summit EU is Open (Jun 16, 2016)
Preview release of Spark 2.0 - (May 26, 2016)

Archive

http://git-wip-us.apache.org/repos/asf/spark-website/blob/a8dce991/site/releases/spark-release-1-3-0.html ---------------------------------------------------------------------- diff --git a/site/releases/spark-release-1-3-0.html b/site/releases/spark-release-1-3-0.html index 1e673ff..978d0fe 100644 --- a/site/releases/spark-release-1-3-0.html +++ b/site/releases/spark-release-1-3-0.html @@ -150,6 +150,9 @@

Latest News

Spark 2.0.1 released + (Oct 03, 2016)
Spark 2.0.0 released (Jul 26, 2016)
Call for Presentations for Spark Summit EU is Open (Jun 16, 2016)
Preview release of Spark 2.0 - (May 26, 2016)

Archive

@@ -191,7 +191,7 @@

To download Spark 1.3 visit the downloads page.

Spark Core

Spark 1.3 sees a handful of usability improvements in the core engine. The core API now supports multi level aggregation trees to help speed up expensive reduce operations. Improved error reporting has been added for certain gotcha operations. Spark’s Jetty dependency is now shaded to help avoid conflicts with user programs. Spark now supports SSL encryption for some communication endpoints. Finally, realtime GC metrics and record counts have been added to the UI.
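The multi level aggregation tree can be illustrated with a short plain-Python sketch. This is not Spark code (Spark's own entry points, RDD methods such as treeReduce and treeAggregate, are not used here); the `tree_reduce` function and its `fan_in` parameter are assumptions for illustration. Instead of sending every partition's partial result to a single node, partial results are combined in groups, level by level.

```python
from functools import reduce


def tree_reduce(partials, combine, fan_in=2):
    """Combine partial results level by level, `fan_in` at a time.

    Returns the final value and the number of levels used.
    """
    if not partials:
        raise ValueError("nothing to reduce")
    level = list(partials)
    depth = 0
    while len(level) > 1:
        # Each group could be combined on a different worker in parallel.
        level = [reduce(combine, level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
        depth += 1
    return level[0], depth


# Eight "partitions", each already reduced to a partial sum.
partition_sums = [sum(range(i * 10, (i + 1) * 10)) for i in range(8)]
total, depth = tree_reduce(partition_sums, lambda a, b: a + b)
```

With eight partials and a fan-in of two, no single combine step sees more than two inputs, at the price of three levels instead of one.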

DataFrame API

Spark 1.3 adds a new DataFrames API that provides powerful and convenient operators when working with structured datasets. The DataFrame is an evolution of the base RDD API that includes named fields along with schema information. It’s easy to construct a DataFrame from sources such as Hive tables, JSON data, a JDBC database, or any implementation of Spark’s new data source API. Data frames will become a common interchange format between Spark components and when importing and exporting data to other systems. Data frames are supported in Python, Scala, and Java.

@@ -203,7 +203,7 @@

In this release Spark MLlib introduces several new algorithms: latent Dirichlet allocation (LDA) for topic modeling, multinomial logistic regression for multiclass classification, Gaussian mixture model (GMM) and power iteration clustering for clustering, FP-growth for frequent pattern mining, and block matrix abstraction for distributed linear algebra. Initial support has been added for model import/export in exchangeable format, which will be expanded in future versions to cover more model types in Java/Python/Scala. The implementations of k-means and ALS receive updates that lead to significant performance gains. PySpark now supports the ML pipeline API added in Spark 1.2, as well as gradient boosted trees and Gaussian mixture models. Finally, the ML pipeline API has been ported to support the new DataFrames abstraction.

Spark Streaming

Spark 1.3 introduces a new direct Kafka API (docs) which enables exactly-once delivery without the use of write ahead logs. It also adds a Python Kafka API along with infrastructure for additional Python APIs in future releases. An online version of logistic regression and the ability to read binary records have also been added. For stateful operations, support has been added for loading of an initial state RDD. Finally, the streaming programming guide has been updated to include information about SQL and DataFrame operations within streaming applications, and important clarifications to the fault-tolerance semantics.
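One way to see how offset-based consumption can give exactly-once results without a write ahead log is the plain-Python sketch below. It is a conceptual illustration under stated assumptions, not the direct Kafka API: the `log` list stands in for a single Kafka partition, and the `Sink` class and `run_batch` function are hypothetical. Each batch is defined by an offset range, so it can be re-read deterministically after a failure, and the sink stores results together with the last committed offset so a replayed batch is not applied twice.

```python
log = [f"event-{i}" for i in range(10)]  # stands in for one Kafka partition


class Sink:
    """Hypothetical output store that keeps results and offset together."""

    def __init__(self):
        self.results = []
        self.committed = 0  # next offset the sink expects

    def commit(self, from_offset, until_offset, values):
        if from_offset != self.committed:
            return False  # batch was already applied (or is out of order): skip
        self.results.extend(values)
        self.committed = until_offset
        return True


def run_batch(sink, from_offset, until_offset):
    """Re-read the exact offset range and try to commit its output."""
    values = [message.upper() for message in log[from_offset:until_offset]]
    return sink.commit(from_offset, until_offset, values)


sink = Sink()
first_try = run_batch(sink, 0, 5)
replayed = run_batch(sink, 0, 5)   # same batch again, as after a failure
second = run_batch(sink, 5, 10)
```

The replayed batch recomputes the same values from the same offsets, and the sink rejects it, so each event contributes to the output exactly once.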

GraphX

GraphX adds a handful of utility functions in this release, including conversion into a canonical edge graph.

@@ -219,7 +219,7 @@

SPARK-6194: A memory leak in PySpark’s collect().
SPARK-6222: An issue with failure recovery in Spark Streaming.
SPARK-6315: Spark SQL can’t read parquet data generated with Spark 1.1.
SPARK-6247: Errors analyzing certain join types in Spark SQL.

http://git-wip-us.apache.org/repos/asf/spark-website/blob/a8dce991/site/releases/spark-release-1-3-1.html ---------------------------------------------------------------------- diff --git a/site/releases/spark-release-1-3-1.html b/site/releases/spark-release-1-3-1.html index 027490b..d24675a 100644 --- a/site/releases/spark-release-1-3-1.html +++ b/site/releases/spark-release-1-3-1.html @@ -150,6 +150,9 @@

Latest News

Spark 2.0.1 released + (Oct 03, 2016)
Spark 2.0.0 released (Jul 26, 2016)
Call for Presentations for Spark Summit EU is Open (Jun 16, 2016)
Preview release of Spark 2.0 - (May 26, 2016)

Archive

@@ -196,10 +196,10 @@

Spark SQL

Unable to use reserved words in DDL (SPARK-6250)
Parquet no longer caches metadata (SPARK-6575)
Bug when joining two Parquet tables (SPARK-6851)
Unable to read parquet data generated by Spark 1.1.1 (SPARK-6315)
Parquet data source may use wrong Hadoop FileSystem (SPARK-6330)

Spark Streaming

http://git-wip-us.apache.org/repos/asf/spark-website/blob/a8dce991/site/releases/spark-release-1-4-0.html ---------------------------------------------------------------------- diff --git a/site/releases/spark-release-1-4-0.html b/site/releases/spark-release-1-4-0.html index 8d60c0f..db4c88c 100644 --- a/site/releases/spark-release-1-4-0.html +++ b/site/releases/spark-release-1-4-0.html @@ -150,6 +150,9 @@

Latest News

Spark 2.0.1 released + (Oct 03, 2016)
Spark 2.0.0 released (Jul 26, 2016)
Call for Presentations for Spark Summit EU is Open (Jun 16, 2016)
Preview release of Spark 2.0 - (May 26, 2016)

Archive

@@ -250,7 +250,7 @@ Python coverage. MLlib also adds several new algorithms.

Spark Streaming

Spark Streaming adds visual instrumentation graphs and significantly improved debugging information in the UI. It also enhances support for both Kafka and Kinesis.

SPARK-7602: Visualization and monitoring in the streaming UI including batch drill down (SPARK-6796, SPARK-6862)

Test Partners

Thanks to the following organizations, who helped benchmark or integration test release candidates:
Intel, Palantir, Cloudera, Mesosphere, Huawei, Shopify, Netflix, Yahoo, UC Berkeley and Databricks.

Contributors

Latest News

Spark 2.0.1 released + (Oct 03, 2016)
Spark 2.0.0 released (Jul 26, 2016)
Call for Presentations for Spark Summit EU is Open (Jun 16, 2016)
Preview release of Spark 2.0 - (May 26, 2016)

Archive

You can consult JIRA for the detailed changes. We have curated a list of high level changes here:

- APIs: RDD, DataFrame and SQL
- Backend Execution: DataFrame and SQL
- Integrations: Data Sources, Hive, Hadoop, Mesos and Cluster Management
- R Language
- Machine Learning and Advanced Analytics
- Spark Streaming
- Deprecations, Removals, Configs, and Behavior Changes
  - Spark Core
  - Spark SQL & DataFrames
  - Spark Streaming
  - MLlib
- Known Issues
  - SQL/DataFrame
  - Streaming
- Credits
  APIs: RDD, DataFrame and SQL
  http://git-wip-us.apache.org/repos/asf/spark-website/blob/a8dce991/site/releases/spark-release-1-5-1.html ---------------------------------------------------------------------- diff --git a/site/releases/spark-release-1-5-1.html b/site/releases/spark-release-1-5-1.html index 5cef05b..ba95eb5 100644 --- a/site/releases/spark-release-1-5-1.html +++ b/site/releases/spark-release-1-5-1.html @@ -150,6 +150,9 @@
  Latest News
  - Spark 2.0.1 released + (Oct 03, 2016)
  - Spark 2.0.0 released (Jul 26, 2016)
  - Call for Presentations for Spark Summit EU is Open (Jun 16, 2016)
  - Preview release of Spark 2.0 - (May 26, 2016)
  Archive
  http://git-wip-us.apache.org/repos/asf/spark-website/blob/a8dce991/site/releases/spark-release-1-5-2.html ---------------------------------------------------------------------- diff --git a/site/releases/spark-release-1-5-2.html b/site/releases/spark-release-1-5-2.html index c460f70..d101422 100644 --- a/site/releases/spark-release-1-5-2.html +++ b/site/releases/spark-release-1-5-2.html @@ -150,6 +150,9 @@
  Latest News
  - Spark 2.0.1 released + (Oct 03, 2016)
  - Spark 2.0.0 released (Jul 26, 2016)
  - Call for Presentations for Spark Summit EU is Open (Jun 16, 2016)
  - Preview release of Spark 2.0 - (May 26, 2016)
  Archive
  http://git-wip-us.apache.org/repos/asf/spark-website/blob/a8dce991/site/releases/spark-release-1-6-0.html ---------------------------------------------------------------------- diff --git a/site/releases/spark-release-1-6-0.html b/site/releases/spark-release-1-6-0.html index 4be55fe..793f847 100644 --- a/site/releases/spark-release-1-6-0.html +++ b/site/releases/spark-release-1-6-0.html @@ -150,6 +150,9 @@
  Latest News
  - Spark 2.0.1 released + (Oct 03, 2016)
  - Spark 2.0.0 released (Jul 26, 2016)
  - Call for Presentations for Spark Summit EU is Open (Jun 16, 2016)
  - Preview release of Spark 2.0 - (May 26, 2016)
  Archive
  @@ -191,13 +191,13 @@
  You can consult JIRA for the detailed changes. We have curated a list of high level changes here:
  - Spark Core/SQL
  - Spark Streaming
  - MLlib
  - Deprecations
  - Changes of behavior
  - Known issues
  - Credits
  Spark Core/SQL
  @@ -220,7 +220,7 @@
  - SPARK-10000 Unified Memory Management - Shared memory for execution and caching instead of exclusive division of the regions.
  - SPARK-11787 Parquet Performance - Improve Parquet scan performance when using flat schemas.
  - SPARK-9241 Improved query planner for queries having distinct aggregations - Query plans of distinct aggregations are more robust when distinct columns have high cardinality.
  - SPARK-9858 Adaptive query execution - Initial support for automatically selecting the number of reducers for joins and aggregations.
  - SPARK-10978 Avoiding double filters in Data Source API - When implementing a data source with filter pushdown, developers can now tell Spark SQL to avoid double evaluating a pushed-down filter.
  - SPARK-11111 Fast null-safe joins - Joins using null-safe equality (<=>) will now execute using SortMergeJoin instead of computing a Cartesian product.
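
The null-safe join item in the list above can be illustrated with a small plain-Python sketch. It is not Spark's SortMergeJoin: the function names and the tuple-based rows are assumptions for this example, with `None` standing in for SQL NULL. The first two functions contrast ordinary equality (NULL compared with anything is unknown) with null-safe equality (two NULLs match); the third joins two inputs by sorting both on the key and merging, rather than comparing every pair of rows.

```python
def sql_equals(a, b):
    """Ordinary SQL `=`: unknown (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_equals(a, b):
    """SQL `<=>`: NULL matches NULL, and never yields unknown."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def sort_key(k):
    """Total order over keys that places NULL first (integer keys assumed)."""
    return (k is not None, k if k is not None else 0)


def null_safe_merge_join(left, right):
    """Inner join of (key, value) rows on null-safe key equality."""
    left = sorted(left, key=lambda row: sort_key(row[0]))
    right = sorted(right, key=lambda row: sort_key(row[0]))
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = sort_key(left[i][0]), sort_key(right[j][0])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Pair the current left row with every right row sharing its key.
            scan = j
            while scan < len(right) and sort_key(right[scan][0]) == lk:
                out.append((left[i][0], left[i][1], right[scan][1]))
                scan += 1
            i += 1
    return out


left_rows = [(1, "a"), (None, "b"), (2, "c")]
right_rows = [(None, "x"), (2, "y"), (3, "z")]
joined = null_safe_merge_join(left_rows, right_rows)
```

Because NULL is given a place in the sort order, rows with NULL keys can be matched by the same merge pass as every other key.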
