arrow-commits mailing list archives

From w...@apache.org
Subject arrow git commit: ARROW-1551: [Website] Website updates, blog post for 0.7.0
Date Tue, 19 Sep 2017 13:10:11 GMT
Repository: arrow
Updated Branches:
  refs/heads/master e1d9c7fc7 -> 0d5e699c2


ARROW-1551: [Website] Website updates, blog post for 0.7.0

I drafted a post to publish tomorrow. If anyone would like to make some changes or additions
please post a link to a git commit here for me to cherry pick

cc @kou @trxcllnt

@pcmoritz I think we should write a whole blog post about the object serialization functions.
The perf wins over pickle when working with large datasets are a pretty big deal

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #1111 from wesm/ARROW-1551 and squashes the following commits:

3e05047e [Wes McKinney] Update publication date to 19 September
a9f8770e [Wes McKinney] More edits, links
8c877d9c [Wes McKinney] Draft 0.7.0 release post


Project: http://git-wip-us.apache.org/repos/asf/arrow/repo
Commit: http://git-wip-us.apache.org/repos/asf/arrow/commit/0d5e699c
Tree: http://git-wip-us.apache.org/repos/asf/arrow/tree/0d5e699c
Diff: http://git-wip-us.apache.org/repos/asf/arrow/diff/0d5e699c

Branch: refs/heads/master
Commit: 0d5e699c2d8a52f38f1c061b785e5af39ee30f95
Parents: e1d9c7f
Author: Wes McKinney <wes.mckinney@twosigma.com>
Authored: Tue Sep 19 09:10:06 2017 -0400
Committer: Wes McKinney <wes.mckinney@twosigma.com>
Committed: Tue Sep 19 09:10:06 2017 -0400

----------------------------------------------------------------------
 site/_posts/2017-09-19-0.7.0-release.md | 190 +++++++++++++++++++++++++++
 site/_release/index.md                  |   2 +
 site/index.html                         |  27 ++--
 site/install.md                         |  32 ++---
 4 files changed, 227 insertions(+), 24 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/arrow/blob/0d5e699c/site/_posts/2017-09-19-0.7.0-release.md
----------------------------------------------------------------------
diff --git a/site/_posts/2017-09-19-0.7.0-release.md b/site/_posts/2017-09-19-0.7.0-release.md
new file mode 100644
index 0000000..dd253df
--- /dev/null
+++ b/site/_posts/2017-09-19-0.7.0-release.md
@@ -0,0 +1,190 @@
+---
+layout: post
+title: "Apache Arrow 0.7.0 Release"
+date: "2017-09-19 00:00:00 -0400"
+author: wesm
+categories: [release]
+---
+<!--
+{% comment %}
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to you under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+{% endcomment %}
+-->
+
+The Apache Arrow team is pleased to announce the 0.7.0 release. It includes
+[**133 resolved JIRAs**][1], with many new features and bug fixes to the various
+language implementations. The Arrow memory format has remained stable since the
+0.3.x release.
+
+See the [Install Page][2] to learn how to get the libraries for your
+platform. The [complete changelog][3] is also available.
+
+We include some highlights from the release in this post.
+
+## New PMC Member: Kouhei Sutou
+
+Since the last release we have added [Kou][4] to the Arrow Project Management
+Committee. He is also a PMC member for Apache Subversion, and a major contributor to
+many other open source projects.
+
+As an active member of the Ruby community in Japan, Kou has been developing the
+GLib-based C bindings for Arrow with associated Ruby wrappers, to enable Ruby
+users to benefit from the work that's happening in Apache Arrow.
+
+We are excited to be collaborating with the Ruby community on shared
+infrastructure for in-memory analytics and data science.
+
+## Expanded JavaScript (TypeScript) Implementation
+
+[Paul Taylor][5] from the [Falcor][7] and [ReactiveX][6] projects has worked to
+expand the JavaScript implementation (which is written in TypeScript), using
+the latest in modern JavaScript build and packaging technology. We are looking
+forward to building out the JS implementation and bringing it up to full
+functionality with the C++ and Java implementations.
+
+We are looking for more JavaScript developers to join the project and work
+together to make Arrow for JS work well with many kinds of front end use cases,
+like real time data visualization.
+
+## Type casting for C++ and Python
+
+As part of longer-term efforts to build an Arrow-native in-memory analytics
+library, we implemented a variety of type conversion functions. These functions
+are essential in ETL tasks when conforming one table schema to another. These
+are similar to the `astype` function in NumPy.
+
+```python
+In [17]: import pyarrow as pa
+
+In [18]: arr = pa.array([True, False, None, True])
+
+In [19]: arr
+Out[19]:
+<pyarrow.lib.BooleanArray object at 0x7ff6fb069b88>
+[
+  True,
+  False,
+  NA,
+  True
+]
+
+In [20]: arr.cast(pa.int32())
+Out[20]:
+<pyarrow.lib.Int32Array object at 0x7ff6fb0383b8>
+[
+  1,
+  0,
+  NA,
+  1
+]
+```
+
+Over time these will expand to support as many input-and-output type
+combinations as possible with optimized conversions.
+
+## New Arrow GPU (CUDA) Extension Library for C++
+
+To help with GPU-related projects using Arrow, like the [GPU Open Analytics
+Initiative][8], we have started a C++ add-on library to simplify Arrow memory
+management on CUDA-enabled graphics cards. We would like to expand this to
+include a library of reusable CUDA kernel functions for GPU analytics on Arrow
+columnar memory.
+
+For example, we could write a record batch from CPU memory to GPU device memory
+like so (some error checking omitted):
+
+```c++
+#include <arrow/api.h>
+#include <arrow/gpu/cuda_api.h>
+
+using namespace arrow;
+
+gpu::CudaDeviceManager* manager;
+std::shared_ptr<gpu::CudaContext> context;
+
+gpu::CudaDeviceManager::GetInstance(&manager);
+manager->GetContext(kGpuNumber, &context);
+
+std::shared_ptr<RecordBatch> batch = GetCpuData();
+
+std::shared_ptr<gpu::CudaBuffer> device_serialized;
+gpu::SerializeRecordBatch(*batch, context.get(), &device_serialized);
+```
+
+We can then "read" the GPU record batch, but the returned `arrow::RecordBatch`
+internally will contain GPU device pointers that you can use for CUDA kernel
+calls:
+
+```c++
+std::shared_ptr<RecordBatch> device_batch;
+gpu::ReadRecordBatch(batch->schema(), device_serialized,
+                     default_memory_pool(), &device_batch);
+
+// Now run some CUDA kernels on device_batch
+```
+
+## Decimal Integration Tests
+
+[Phillip Cloud][9] has been working on decimal support in C++ to enable Parquet
+read/write support in C++ and Python, and also end-to-end testing against the
+Arrow Java libraries.
+
+In the upcoming releases, we hope to complete the remaining data types that
+need end-to-end testing between Java and C++:
+
+* Fixed size lists (variable-size lists already implemented)
+* Fixed size binary
+* Unions
+* Maps
+* Time intervals
+
+## Other Notable Python Changes
+
+Some highlights of Python development outside of bug fixes and general API
+improvements include:
+
+* Simplified `put` and `get` for arbitrary Python objects in Plasma
+* [High-speed, memory efficient object serialization][10]. This is important
+  enough that we will likely write a dedicated blog post about it.
+* New `flavor='spark'` option to `pyarrow.parquet.write_table` to enable easy
+  writing of Parquet files maximized for Spark compatibility
+* `parquet.write_to_dataset` function with support for partitioned writes
+* Improved support for Dask filesystems
+* Improved Python usability for IPC: read and write schemas and record batches
+  more easily. See the [API docs][11] for more about these.
+
+## The Road Ahead
+
+Upcoming Arrow releases will continue to expand the project to cover more use
+cases. In addition to completing end-to-end testing for all the major data
+types, some of us will be shifting attention to building Arrow-native in-memory
+analytics libraries.
+
+We are looking for more JavaScript, R, and other programming language
+developers to join the project and expand the available implementations and
+bindings to more languages.
+
+[1]: https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.7.0
+[2]: http://arrow.apache.org/install
+[3]: http://arrow.apache.org/release/0.7.0.html
+[4]: https://github.com/kou
+[5]: https://github.com/trxcllnt
+[6]: http://reactivex.io
+[7]: https://github.com/netflix/falcor
+[8]: http://gpuopenanalytics.com/
+[9]: http://github.com/cpcloud
+[10]: http://arrow.apache.org/docs/python/ipc.html
+[11]: http://arrow.apache.org/docs/python/api.html
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/arrow/blob/0d5e699c/site/_release/index.md
----------------------------------------------------------------------
diff --git a/site/_release/index.md b/site/_release/index.md
index b373d8b..ad78de8 100644
--- a/site/_release/index.md
+++ b/site/_release/index.md
@@ -26,6 +26,7 @@ limitations under the License.
 
 Navigate to the release page for downloads and the changelog.
 
+* [0.7.0 (17 September 2017)][8]
 * [0.6.0 (14 August 2017)][7]
 * [0.5.0 (23 July 2017)][6]
 * [0.4.1 (9 June 2017)][5]
@@ -41,3 +42,4 @@ Navigate to the release page for downloads and the changelog.
 [5]: {{ site.baseurl }}/release/0.4.1.html
 [6]: {{ site.baseurl }}/release/0.5.0.html
 [7]: {{ site.baseurl }}/release/0.6.0.html
+[8]: {{ site.baseurl }}/release/0.7.0.html

http://git-wip-us.apache.org/repos/asf/arrow/blob/0d5e699c/site/index.html
----------------------------------------------------------------------
diff --git a/site/index.html b/site/index.html
index 95d9f4d..01836d7 100644
--- a/site/index.html
+++ b/site/index.html
@@ -7,26 +7,37 @@ layout: default
         <p class="lead">Powering Columnar In-Memory Analytics</p>
         <p>
           <a class="btn btn-lg btn-success" href="mailto:dev-subscribe@arrow.apache.org" role="button">Join Mailing List</a>
-          <a class="btn btn-lg btn-primary" href="{{ site.baseurl }}/install/" role="button">Install (0.6.0 Release - August 14, 2017)</a>
+          <a class="btn btn-lg btn-primary" href="{{ site.baseurl }}/install/" role="button">Install (0.7.0 Release - September 17, 2017)</a>
         </p>
       </div>
-      <h4><strong>Latest News</strong>: <a href="{{ site.baseurl }}/blog/">Apache Arrow 0.6.0 release</a></h4>
+      <h4><strong>Latest News</strong>: <a href="{{ site.baseurl }}/blog/">Apache Arrow 0.7.0 release</a></h4>
       <div class="row">
         <div class="col-lg-4">
           <h2>Fast</h2>
-          <p>Apache Arrow&#8482; enables execution engines to take advantage of the latest SIM
-D (Single input multiple data) operations included in modern processors, for native vectorized optimization of analytical data processing. Columnar layout of data also allows for a better use of CPU caches by placing all data relevant to a column operation in as compact of a format
- as possible.</p>
+          <p>Apache Arrow&#8482; enables execution engines to take advantage of
+ the latest SIMD (Single instruction, multiple data) operations included in modern
+ processors, for native vectorized optimization of analytical data
+ processing. Columnar layout is optimized for data locality for better
+ performance on modern hardware like CPUs and GPUs.</p>
+
           <p>The Arrow memory format supports <strong>zero-copy reads</strong>
           for lightning-fast data access without serialization overhead.</p>
+
         </div>
         <div class="col-lg-4">
           <h2>Flexible</h2>
-          <p>Arrow acts as a new high-performance interface between various systems. It is also focused on supporting a wide variety of industry-standard programming languages. Java, C, C++, Python, Ruby, and JavaScript implementations are in progress and more languages are welcome.</p>
+          <p>Arrow acts as a new high-performance interface between various
+          systems. It is also focused on supporting a wide variety of
+          industry-standard programming languages. Java, C, C++, Python, Ruby,
+          and JavaScript implementations are in progress and more languages are
+          welcome.</p>
         </div>
         <div class="col-lg-4">
           <h2>Standard</h2>
-          <p>Apache Arrow is backed by key developers of 13 major open source projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis, Impala, Kudu, Pandas, Parquet, Phoenix, Spark, and Storm making it the de-facto standard for columnar in-memory analytics.</p>
+          <p>Apache Arrow is backed by key developers of 13 major open source
+          projects, including Calcite, Cassandra, Drill, Hadoop, HBase, Ibis,
+          Impala, Kudu, Pandas, Parquet, Phoenix, Spark, and Storm making it
+          the de-facto standard for columnar in-memory analytics.</p>
         </div>
      </div> <!-- close "row" div -->
 
@@ -41,7 +52,7 @@ D (Single input multiple data) operations included in modern processors, for nat
 <img src="img/copy2.png" alt="common data layer" style="width:100%" />
 <ul>
     <li>Each system has its own internal memory format</li>
-    <li>70-80% CPU wasted on serialization and deserialization</li>
+    <li>70-80% computation wasted on serialization and deserialization</li>
     <li>Similar functionality implemented in multiple projects</li>
 </ul>
 </div>

http://git-wip-us.apache.org/repos/asf/arrow/blob/0d5e699c/site/install.md
----------------------------------------------------------------------
diff --git a/site/install.md b/site/install.md
index 6cb80c1..74d2986 100644
--- a/site/install.md
+++ b/site/install.md
@@ -20,17 +20,17 @@ limitations under the License.
 {% endcomment %}
 -->
 
-## Current Version: 0.6.0
+## Current Version: 0.7.0
 
-### Released: 14 August 2017
+### Released: 17 September 2017
 
 See the [release notes][10] for more about what's new.
 
 ### Source release
 
-* **Source Release**: [apache-arrow-0.6.0.tar.gz][6]
-* **Verification**: [md5][3], [asc][7]
-* [Git tag b173334][2]
+* **Source Release**: [apache-arrow-0.7.0.tar.gz][6]
+* **Verification**: [sha512][3], [asc][7]
+* [Git tag 97f9029][2]
 
 ### Java Packages
 
@@ -52,8 +52,8 @@ Install them with:
 
 
 ```shell
-conda install arrow-cpp=0.6.* -c conda-forge
-conda install pyarrow==0.6.* -c conda-forge
+conda install arrow-cpp=0.7.* -c conda-forge
+conda install pyarrow==0.7.* -c conda-forge
 ```
 
 ### Python Wheels on PyPI (Unofficial)
@@ -61,10 +61,10 @@ conda install pyarrow==0.6.* -c conda-forge
 We have provided binary wheels on PyPI for Linux, macOS, and Windows:
 
 ```shell
-pip install pyarrow==0.6.*
+pip install pyarrow==0.7.*
 ```
 
-We recommend pinning `0.6.*` in `requirements.txt` to install the latest patch
+We recommend pinning `0.7.*` in `requirements.txt` to install the latest patch
 release.
 
 These include the Apache Arrow and Apache Parquet C++ binary libraries bundled
@@ -149,13 +149,13 @@ conda install arrow-cpp -c twosigma
 conda install pyarrow -c twosigma
 ```
 
-[1]: https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.6.0/
-[2]: https://github.com/apache/arrow/releases/tag/apache-arrow-0.6.0
-[3]: https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.6.0/apache-arrow-0.6.0.tar.gz.md5
-[4]: http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.arrow%22%20AND%20v%3A%220.6.0%22
+[1]: https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.7.0/
+[2]: https://github.com/apache/arrow/releases/tag/apache-arrow-0.7.0
+[3]: https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.7.0/apache-arrow-0.7.0.tar.gz.sha512
+[4]: http://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.arrow%22%20AND%20v%3A%220.7.0%22
 [5]: http://conda-forge.github.io
-[6]: https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.6.0/apache-arrow-0.6.0.tar.gz
-[7]: https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.6.0/apache-arrow-0.6.0.tar.gz.asc
+[6]: https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.7.0/apache-arrow-0.7.0.tar.gz
+[7]: https://www.apache.org/dyn/closer.cgi/arrow/arrow-0.7.0/apache-arrow-0.7.0.tar.gz.asc
 [8]: https://github.com/red-data-tools/parquet-glib
 [9]: https://github.com/red-data-tools/arrow-packages
-[10]: http://arrow.apache.org/release/0.6.0.html
+[10]: http://arrow.apache.org/release/0.7.0.html
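
The install page change above switches source-release verification from md5 to sha512. As a rough sketch of how a downloaded tarball could be checked against a published digest, the snippet below uses only the Python standard library. The helper names `sha512_of` and `matches_published` are hypothetical, and the tolerant parsing assumes checksum files may differ in case and whitespace.

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_text):
    """Check a file against the text of a published .sha512 file.

    Whitespace is removed and case is folded first, so digests that are
    split into groups or printed in upper case still compare equal.
    """
    normalized = "".join(published_text.split()).lower()
    return sha512_of(path) in normalized
```

A caller would pass the path of the downloaded archive and the contents of the matching `.sha512` file. Checking the `asc` signature is a separate step that this sketch does not cover.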

