arrow-commits mailing list archives

From w...@apache.org
Subject [1/3] arrow-site git commit: Add Plasma blog post
Date Tue, 08 Aug 2017 14:25:54 GMT
Repository: arrow-site
Updated Branches:
  refs/heads/asf-site b286da84c -> 3b67853c5


http://git-wip-us.apache.org/repos/asf/arrow-site/blob/3b67853c/docs/ipc.html
----------------------------------------------------------------------
diff --git a/docs/ipc.html b/docs/ipc.html
index ffbe491..69bfa36 100644
--- a/docs/ipc.html
+++ b/docs/ipc.html
@@ -106,17 +106,22 @@
 -->
 
 <!---
-  Licensed under the Apache License, Version 2.0 (the "License");
-  you may not use this file except in compliance with the License.
-  You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License. See accompanying LICENSE file.
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
 -->
 
 <h1 id="interprocess-messaging--communication-ipc">Interprocess messaging / communication (IPC)</h1>

http://git-wip-us.apache.org/repos/asf/arrow-site/blob/3b67853c/docs/memory_layout.html
----------------------------------------------------------------------
diff --git a/docs/memory_layout.html b/docs/memory_layout.html
index 7703a15..98cb556 100644
--- a/docs/memory_layout.html
+++ b/docs/memory_layout.html
@@ -106,17 +106,22 @@
 -->
 
 <!---
-  Licensed under the Apache License, Version 2.0 (the "License");
-  you may not use this file except in compliance with the License.
-  You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License. See accompanying LICENSE file.
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
 -->
 
 <h1 id="arrow-physical-memory-layout">Arrow: Physical memory layout</h1>
@@ -167,7 +172,11 @@ proprietary systems that utilize the open source components.</li>
 linearly in the nesting level</li>
   <li>Capable of representing fully-materialized and decoded / decompressed <a href="https://parquet.apache.org/documentation/latest/">Parquet</a>
 data</li>
-  <li>All contiguous memory buffers are aligned at 64-byte boundaries and padded to a multiple of 64 bytes.</li>
+  <li>All contiguous memory buffers in an IPC payload must be aligned at
+8-byte boundaries. In other words, each buffer must start at an offset
+that is a multiple of 8 bytes.</li>
+  <li>The general recommendation is to align the buffers at 64-byte boundaries, but
+this is not absolutely necessary.</li>
   <li>Any relative type can have null slots</li>
   <li>Arrays are immutable once created. Implementations can provide APIs to mutate
 an array, but applying mutations will require a new array data structure to
@@ -218,9 +227,9 @@ via byte swapping.</p>
 
 <h2 id="alignment-and-padding">Alignment and Padding</h2>
 
-<p>As noted above, all buffers are intended to be aligned in memory at 64 byte
-boundaries and padded to a length that is a multiple of 64 bytes.  The alignment
-requirement follows best practices for optimized memory access:</p>
+<p>As noted above, all buffers must be aligned in memory at 8-byte boundaries and padded
+to a length that is a multiple of 8 bytes.  The alignment requirement follows best
+practices for optimized memory access:</p>
 
 <ul>
   <li>Elements in numeric arrays will be guaranteed to be retrieved via aligned access.</li>
@@ -229,12 +238,14 @@ requirement follows best practices for optimized memory access:</p>
 data-structures over 64 bytes (which will be a common case for Arrow Arrays).</li>
 </ul>
 
-<p>Requiring padding to a multiple of 64 bytes allows for using <a href="https://software.intel.com/en-us/node/600110">SIMD</a> instructions
+<p>Recommending padding to a multiple of 64 bytes allows for using <a href="https://software.intel.com/en-us/node/600110">SIMD</a> instructions
 consistently in loops without additional conditional checks.
-This should allow for simpler and more efficient code.
+This should allow for simpler, more efficient, and more CPU cache-friendly code.
 The specific padding length was chosen because it matches the largest known
-SIMD instruction registers available as of April 2016 (Intel AVX-512).
-Guaranteed padding can also allow certain compilers
+SIMD instruction registers available as of April 2016 (Intel AVX-512). In other
+words, we can load the entire 64-byte buffer into a 512-bit wide SIMD register
+and get data-level parallelism on all the columnar values packed into the 64-byte
+buffer. Guaranteed padding can also allow certain compilers
 to generate more optimized code directly (e.g. One can safely use Intel’s
 <code class="highlighter-rouge">-qopt-assume-safe-padding</code>).</p>
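
The alignment and padding rules in this hunk come down to rounding buffer sizes up to a multiple of 8 bytes (required) or 64 bytes (recommended). The following is a minimal sketch of that arithmetic, not code from any Arrow implementation; the helper names `padded_length` and `layout_buffers` are hypothetical and exist only for this illustration.

```python
def padded_length(nbytes: int, alignment: int = 8) -> int:
    """Round nbytes up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    # Adding alignment - 1 and masking off the low bits rounds upward.
    return (nbytes + alignment - 1) & ~(alignment - 1)


def layout_buffers(sizes, alignment=8):
    """Return (offset, padded_size) pairs for buffers laid out back to back.

    Hypothetical helper: because every padded size is a multiple of the
    alignment, every following buffer starts at an aligned offset.
    """
    layout, offset = [], 0
    for size in sizes:
        padded = padded_length(size, alignment)
        layout.append((offset, padded))
        offset += padded
    return layout


if __name__ == "__main__":
    print(layout_buffers([13, 100, 8]))                # required 8-byte alignment
    print(layout_buffers([13, 100, 8], alignment=64))  # recommended 64-byte alignment
```

With 64-byte padding every buffer length is a whole number of 512-bit registers, which is the property the paragraph above relies on for loops without tail checks.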
 

http://git-wip-us.apache.org/repos/asf/arrow-site/blob/3b67853c/docs/metadata.html
----------------------------------------------------------------------
diff --git a/docs/metadata.html b/docs/metadata.html
index 76da9eb..7382193 100644
--- a/docs/metadata.html
+++ b/docs/metadata.html
@@ -106,17 +106,22 @@
 -->
 
 <!---
-  Licensed under the Apache License, Version 2.0 (the "License");
-  you may not use this file except in compliance with the License.
-  You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License. See accompanying LICENSE file.
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
 -->
 
 <h1 id="metadata-logical-types-schemas-data-headers">Metadata: Logical types, schemas, data headers</h1>

http://git-wip-us.apache.org/repos/asf/arrow-site/blob/3b67853c/feed.xml
----------------------------------------------------------------------
diff --git a/feed.xml b/feed.xml
index f01301e..453eee8 100644
--- a/feed.xml
+++ b/feed.xml
@@ -1,4 +1,125 @@
-<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.4.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2017-07-27T11:28:36-04:00</updated><id>/</id><entry><title type="html">Speeding up PySpark with Apache Arrow</title><link href="/blog/2017/07/26/spark-arrow/" rel="alternate" type="text/html" title="Speeding up PySpark with Apache Arrow" /><published>2017-07-26T12:00:00-04:00</published><updated>2017-07-26T12:00:00-04:00</updated><id>/blog/2017/07/26/spark-arrow</id><content type="html" xml:base="/blog/2017/07/26/spark-arrow/">&lt;!--
+<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.4.3">Jekyll</generator><link href="/feed.xml" rel="self" type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" /><updated>2017-08-08T10:25:08-04:00</updated><id>/</id><entry><title type="html">Plasma In-Memory Object Store</title><link href="/blog/2017/08/08/plasma-in-memory-object-store/" rel="alternate" type="text/html" title="Plasma In-Memory Object Store" /><published>2017-08-08T00:00:00-04:00</published><updated>2017-08-08T00:00:00-04:00</updated><id>/blog/2017/08/08/plasma-in-memory-object-store</id><content type="html" xml:base="/blog/2017/08/08/plasma-in-memory-object-store/">&lt;!--
+
+--&gt;
+
+&lt;p&gt;&lt;em&gt;&lt;a href=&quot;https://people.eecs.berkeley.edu/~pcmoritz/&quot;&gt;Philipp Moritz&lt;/a&gt; and &lt;a href=&quot;http://www.robertnishihara.com&quot;&gt;Robert Nishihara&lt;/a&gt; are graduate students at UC
+ Berkeley.&lt;/em&gt;&lt;/p&gt;
+
+&lt;h2 id=&quot;plasma-a-high-performance-shared-memory-object-store&quot;&gt;Plasma: A High-Performance Shared-Memory Object Store&lt;/h2&gt;
+
+&lt;h3 id=&quot;motivating-plasma&quot;&gt;Motivating Plasma&lt;/h3&gt;
+
+&lt;p&gt;This blog post presents Plasma, an in-memory object store that is being
+developed as part of Apache Arrow. &lt;strong&gt;Plasma holds immutable objects in shared
+memory so that they can be accessed efficiently by many clients across process
+boundaries.&lt;/strong&gt; In light of the trend toward larger and larger multicore machines,
+Plasma enables critical performance optimizations in the big data regime.&lt;/p&gt;
+
+&lt;p&gt;Plasma was initially developed as part of &lt;a href=&quot;https://github.com/ray-project/ray&quot;&gt;Ray&lt;/a&gt;, and has recently been moved
+to Apache Arrow in the hopes that it will be broadly useful.&lt;/p&gt;
+
+&lt;p&gt;One of the goals of Apache Arrow is to serve as a common data layer enabling
+zero-copy data exchange between multiple frameworks. A key component of this
+vision is the use of off-heap memory management (via Plasma) for storing and
+sharing Arrow-serialized objects between applications.&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Expensive serialization and deserialization as well as data copying are a
+common performance bottleneck in distributed computing.&lt;/strong&gt; For example, a
+Python-based execution framework that wishes to distribute computation across
+multiple Python “worker” processes and then aggregate the results in a single
+“driver” process may choose to serialize data using the built-in &lt;code class=&quot;highlighter-rouge&quot;&gt;pickle&lt;/code&gt;
+library. Assuming one Python process per core, each worker process would have to
+copy and deserialize the data, resulting in excessive memory usage. The driver
+process would then have to deserialize results from each of the workers,
+resulting in a bottleneck.&lt;/p&gt;
+
+&lt;p&gt;Using Plasma plus Arrow, the data being operated on would be placed in the
+Plasma store once, and all of the workers would read the data without copying or
+deserializing it (the workers would map the relevant region of memory into their
+own address spaces). The workers would then put the results of their computation
+back into the Plasma store, which the driver could then read and aggregate
+without copying or deserializing the data.&lt;/p&gt;
+
+&lt;h3 id=&quot;the-plasma-api&quot;&gt;The Plasma API&lt;/h3&gt;
+
+&lt;p&gt;Below we illustrate a subset of the API. The C++ API is documented more fully
+&lt;a href=&quot;https://github.com/apache/arrow/blob/master/cpp/apidoc/tutorials/plasma.md&quot;&gt;here&lt;/a&gt;, and the Python API is documented &lt;a href=&quot;https://github.com/apache/arrow/blob/master/python/doc/source/plasma.rst&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Object IDs:&lt;/strong&gt; Each object is identified by an object ID, which is a string of bytes.&lt;/p&gt;
+
+&lt;p&gt;&lt;strong&gt;Creating an object:&lt;/strong&gt; Objects are stored in Plasma in two stages. First, the
+object store &lt;em&gt;creates&lt;/em&gt; the object by allocating a buffer for it. At this point,
+the client can write to the buffer and construct the object within the allocated
+buffer. When the client is done, the client &lt;em&gt;seals&lt;/em&gt; the buffer, making the object
+immutable and available to other Plasma clients.&lt;/p&gt;
+
+&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Create an object.&lt;/span&gt;
+&lt;span class=&quot;n&quot;&gt;object_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pyarrow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;plasma&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ObjectID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'a'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+&lt;span class=&quot;n&quot;&gt;object_size&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;
+&lt;span class=&quot;nb&quot;&gt;buffer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;memoryview&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;create&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;object_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;object_size&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;))&lt;/span&gt;
+
+&lt;span class=&quot;c&quot;&gt;# Write to the buffer.&lt;/span&gt;
+&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
+    &lt;span class=&quot;nb&quot;&gt;buffer&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;0&lt;/span&gt;
+
+&lt;span class=&quot;c&quot;&gt;# Seal the object making it immutable and available to other clients.&lt;/span&gt;
+&lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;seal&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;object_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;p&gt;&lt;strong&gt;Getting an object:&lt;/strong&gt; After an object has been sealed, any client who knows the
+object ID can get the object.&lt;/p&gt;
+
+&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Get the object from the store. This blocks until the object has been sealed.&lt;/span&gt;
+&lt;span class=&quot;n&quot;&gt;object_id&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pyarrow&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;plasma&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;ObjectID&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;20&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;*&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;b&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'a'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;buff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;client&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;([&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;object_id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;])&lt;/span&gt;
+&lt;span class=&quot;nb&quot;&gt;buffer&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;memoryview&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;buff&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
+&lt;/code&gt;&lt;/pre&gt;
+&lt;/div&gt;
+
+&lt;p&gt;If the object has not been sealed yet, then the call to &lt;code class=&quot;highlighter-rouge&quot;&gt;client.get&lt;/code&gt; will block
+until the object has been sealed.&lt;/p&gt;
+
+&lt;h3 id=&quot;a-sorting-application&quot;&gt;A sorting application&lt;/h3&gt;
+
+&lt;p&gt;To illustrate the benefits of Plasma, we demonstrate an &lt;strong&gt;11x speedup&lt;/strong&gt; (on a
+machine with 20 physical cores) for sorting a large pandas DataFrame (one
+billion entries). The baseline is the built-in pandas sort function, which sorts
+the DataFrame in 477 seconds. To leverage multiple cores, we implement the
+following standard distributed sorting scheme.&lt;/p&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;We assume that the data is partitioned across K pandas DataFrames and that
+each one already lives in the Plasma store.&lt;/li&gt;
+  &lt;li&gt;We subsample the data, sort the subsampled data, and use the result to define
+L non-overlapping buckets.&lt;/li&gt;
+  &lt;li&gt;For each of the K data partitions and each of the L buckets, we find the
+subset of the data partition that falls in the bucket, and we sort that
+subset.&lt;/li&gt;
+  &lt;li&gt;For each of the L buckets, we gather all of the K sorted subsets that fall in
+that bucket.&lt;/li&gt;
+  &lt;li&gt;For each of the L buckets, we merge the corresponding K sorted subsets.&lt;/li&gt;
+  &lt;li&gt;We turn each bucket into a pandas DataFrame and place it in the Plasma store.&lt;/li&gt;
+&lt;/ul&gt;
+
+&lt;p&gt;Using this scheme, we can sort the DataFrame in 44 seconds (the data starts and
+ends in the Plasma store), giving an 11x speedup over the baseline.&lt;/p&gt;
+
+&lt;h3 id=&quot;design&quot;&gt;Design&lt;/h3&gt;
+
+&lt;p&gt;The Plasma store runs as a separate process. It is written in C++ and is
+designed as a single-threaded event loop based on the &lt;a href=&quot;https://redis.io/&quot;&gt;Redis&lt;/a&gt; event loop library.
+The Plasma client library can be linked into applications. Clients communicate
+with the Plasma store via messages serialized using &lt;a href=&quot;https://google.github.io/flatbuffers/&quot;&gt;Google FlatBuffers&lt;/a&gt;.&lt;/p&gt;
+
+&lt;h3 id=&quot;call-for-contributions&quot;&gt;Call for contributions&lt;/h3&gt;
+
+&lt;p&gt;Plasma is a work in progress, and the API is currently unstable. Today Plasma is
+primarily used in &lt;a href=&quot;https://github.com/ray-project/ray&quot;&gt;Ray&lt;/a&gt; as an in-memory cache for Arrow-serialized objects.
+We are looking for a broader set of use cases to help refine Plasma’s API. In
+addition, we are looking for contributions in a variety of areas including
+improving performance and building other language bindings. Please let us know
+if you are interested in getting involved with the project.&lt;/p&gt;</content><author><name>Philipp Moritz and Robert Nishihara</name></author></entry><entry><title type="html">Speeding up PySpark with Apache Arrow</title><link href="/blog/2017/07/26/spark-arrow/" rel="alternate" type="text/html" title="Speeding up PySpark with Apache Arrow" /><published>2017-07-26T12:00:00-04:00</published><updated>2017-07-26T12:00:00-04:00</updated><id>/blog/2017/07/26/spark-arrow</id><content type="html" xml:base="/blog/2017/07/26/spark-arrow/">&lt;!--
 
 --&gt;
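
The distributed sorting scheme listed in the blog post (K partitions, L sample-derived buckets, per-bucket merge) can be sketched in a single process. This is an illustration only and makes several assumptions: plain Python lists stand in for pandas DataFrames, nothing is stored in Plasma, the work that the post spreads across workers runs sequentially here, and the function names (`choose_splitters`, `split_and_sort`, `bucket_sort`) are hypothetical.

```python
import bisect
import heapq
import random


def choose_splitters(partitions, num_buckets, per_partition=50, seed=0):
    """Subsample each partition, sort the sample, pick num_buckets - 1 boundaries."""
    rng = random.Random(seed)
    sample = []
    for part in partitions:
        sample.extend(rng.sample(part, min(per_partition, len(part))))
    sample.sort()
    if not sample:
        return []
    # Evenly spaced sample quantiles define the non-overlapping buckets.
    return [sample[(i * len(sample)) // num_buckets] for i in range(1, num_buckets)]


def split_and_sort(partition, splitters):
    """Route each value of one partition to its bucket, then sort each subset."""
    subsets = [[] for _ in range(len(splitters) + 1)]
    for x in partition:
        subsets[bisect.bisect_right(splitters, x)].append(x)
    return [sorted(s) for s in subsets]


def bucket_sort(partitions, num_buckets):
    """Return a list of sorted buckets; concatenating them gives the full sort."""
    splitters = choose_splitters(partitions, num_buckets)
    # K x L grid of sorted subsets (one row per partition, one column per bucket).
    grid = [split_and_sort(part, splitters) for part in partitions]
    # For each bucket, merge the K already-sorted subsets that fall in it.
    return [
        list(heapq.merge(*(row[b] for row in grid)))
        for b in range(len(splitters) + 1)
    ]


if __name__ == "__main__":
    rng = random.Random(1)
    parts = [[rng.randint(0, 10_000) for _ in range(1_000)] for _ in range(4)]
    buckets = bucket_sort(parts, num_buckets=8)
    flat = [x for bucket in buckets for x in bucket]
    print(flat == sorted(x for p in parts for x in p))
```

Because the buckets cover disjoint, ordered value ranges, each bucket can be merged independently of the others, which is what lets the scheme in the post use many cores.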
 

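The two-stage create/seal protocol and the blocking get described in the blog post can be modeled with a toy in-process store. This is not Plasma and does not use the `pyarrow.plasma` API: the class `ToyObjectStore` is hypothetical, it shares buffers between threads rather than processes, and it only illustrates the semantics (objects are invisible to readers until sealed, and readers receive a read-only view).

```python
import threading


class ToyObjectStore:
    """Hypothetical in-process model of create / seal / get (not Plasma itself)."""

    def __init__(self):
        self._buffers = {}   # object_id -> bytearray backing the object
        self._sealed = set()  # object IDs that readers are allowed to see
        self._cond = threading.Condition()

    def create(self, object_id, size):
        """Stage 1: allocate a writable buffer for a new object."""
        with self._cond:
            if object_id in self._buffers:
                raise ValueError("object already exists")
            buf = bytearray(size)
            self._buffers[object_id] = buf
        return memoryview(buf)

    def seal(self, object_id):
        """Stage 2: publish the object and wake up any blocked readers."""
        with self._cond:
            if object_id not in self._buffers:
                raise KeyError(object_id)
            self._sealed.add(object_id)
            self._cond.notify_all()

    def get(self, object_id, timeout=None):
        """Block until the object is sealed, then return a read-only view."""
        with self._cond:
            if not self._cond.wait_for(lambda: object_id in self._sealed, timeout):
                raise TimeoutError("object was not sealed in time")
            return memoryview(self._buffers[object_id]).toreadonly()


if __name__ == "__main__":
    store = ToyObjectStore()
    object_id = 20 * b"a"
    buffer = store.create(object_id, 4)
    buffer[0] = 7
    store.seal(object_id)
    print(bytes(store.get(object_id)))
```

One simplification to note: the writer's original view stays writable after `seal` in this toy, so immutability is only enforced on the reader side.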
