arrow-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From w...@apache.org
Subject [3/3] arrow-site git commit: Add Plasma blog post
Date Tue, 08 Aug 2017 14:25:56 GMT
Add Plasma blog post


Project: http://git-wip-us.apache.org/repos/asf/arrow-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/arrow-site/commit/3b67853c
Tree: http://git-wip-us.apache.org/repos/asf/arrow-site/tree/3b67853c
Diff: http://git-wip-us.apache.org/repos/asf/arrow-site/diff/3b67853c

Branch: refs/heads/asf-site
Commit: 3b67853c506b90b862ddd9f1aecf69b2e609fa7c
Parents: b286da8
Author: Wes McKinney <wes.mckinney@twosigma.com>
Authored: Tue Aug 8 10:25:46 2017 -0400
Committer: Wes McKinney <wes.mckinney@twosigma.com>
Committed: Tue Aug 8 10:25:46 2017 -0400

----------------------------------------------------------------------
 .../08/plasma-in-memory-object-store/index.html | 261 +++++++++++++++++++
 blog/index.html                                 | 153 +++++++++++
 css/main.css                                    |   2 +-
 docs/ipc.html                                   |  27 +-
 docs/memory_layout.html                         |  49 ++--
 docs/metadata.html                              |  27 +-
 feed.xml                                        | 123 ++++++++-
 7 files changed, 599 insertions(+), 43 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/arrow-site/blob/3b67853c/blog/2017/08/08/plasma-in-memory-object-store/index.html
----------------------------------------------------------------------
diff --git a/blog/2017/08/08/plasma-in-memory-object-store/index.html b/blog/2017/08/08/plasma-in-memory-object-store/index.html
new file mode 100644
index 0000000..3fd78fa
--- /dev/null
+++ b/blog/2017/08/08/plasma-in-memory-object-store/index.html
@@ -0,0 +1,261 @@
+<!DOCTYPE html>
+<html lang="en-US">
+  <head>
+    <meta charset="UTF-8">
+    <title>Plasma In-Memory Object Store</title>
+    <meta http-equiv="X-UA-Compatible" content="IE=edge">
+    <meta name="viewport" content="width=device-width, initial-scale=1">
+    <meta name="generator" content="Jekyll v3.4.3">
+    <!-- The above 3 meta tags *must* come first in the head; any other head content must
come *after* these tags -->
+    <link rel="icon" type="image/x-icon" href="/favicon.ico">
+
+    <title>Apache Arrow Homepage</title>
+    <link rel="stylesheet" href="//fonts.googleapis.com/css?family=Lato:300,300italic,400,400italic,700,700italic,900">
+
+    <link href="/css/main.css" rel="stylesheet">
+    <link href="/css/syntax.css" rel="stylesheet">
+    <script src="https://code.jquery.com/jquery-3.2.1.min.js"
+            integrity="sha256-hwg4gsxgFZhOsEEamdOYGBf13FyQuiTwlAQgxVSNgt4="
+            crossorigin="anonymous"></script>
+    <script src="/assets/javascripts/bootstrap.min.js"></script>
+  </head>
+
+
+
+<body class="wrap">
+  <div class="container">
+    <nav class="navbar navbar-default">
+  <div class="container-fluid">
+    <div class="navbar-header">
+      <button type="button" class="navbar-toggle" data-toggle="collapse" data-target="#arrow-navbar">
+        <span class="sr-only">Toggle navigation</span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+        <span class="icon-bar"></span>
+      </button>
+      <a class="navbar-brand" href="/">Apache Arrow&#8482;&nbsp;&nbsp;&nbsp;</a>
+    </div>
+
+    <!-- Collect the nav links, forms, and other content for toggling -->
+    <div class="collapse navbar-collapse" id="arrow-navbar">
+      <ul class="nav navbar-nav">
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Project Links<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/install/">Install</a></li>
+            <li><a href="/blog/">Blog</a></li>
+            <li><a href="/release/">Releases</a></li>
+            <li><a href="https://issues.apache.org/jira/browse/ARROW">Issue Tracker</a></li>
+            <li><a href="https://github.com/apache/arrow">Source Code</a></li>
+            <li><a href="http://mail-archives.apache.org/mod_mbox/arrow-dev/">Mailing
List</a></li>
+            <li><a href="https://apachearrowslackin.herokuapp.com">Slack Channel</a></li>
+            <li><a href="/committers/">Committers</a></li>
+          </ul>
+        </li>
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Specification<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/docs/memory_layout.html">Memory Layout</a></li>
+            <li><a href="/docs/metadata.html">Metadata</a></li>
+            <li><a href="/docs/ipc.html">Messaging / IPC</a></li>
+          </ul>
+        </li>
+
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">Documentation<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="/docs/python">Python</a></li>
+            <li><a href="/docs/cpp">C++ API</a></li>
+            <li><a href="/docs/java">Java API</a></li>
+            <li><a href="/docs/c_glib">C GLib API</a></li>
+          </ul>
+        </li>
+        <!-- <li><a href="/blog">Blog</a></li> -->
+        <li class="dropdown">
+          <a href="#" class="dropdown-toggle" data-toggle="dropdown"
+             role="button" aria-haspopup="true"
+             aria-expanded="false">ASF Links<span class="caret"></span>
+          </a>
+          <ul class="dropdown-menu">
+            <li><a href="http://www.apache.org/">ASF Website</a></li>
+            <li><a href="http://www.apache.org/licenses/">License</a></li>
+            <li><a href="http://www.apache.org/foundation/sponsorship.html">Donate</a></li>
+            <li><a href="http://www.apache.org/foundation/thanks.html">Thanks</a></li>
+            <li><a href="http://www.apache.org/security/">Security</a></li>
+          </ul>
+        </li>
+      </ul>
+      <a href="http://www.apache.org/">
+        <img style="float:right;" src="/img/asf_logo.svg" width="120px"/>
+      </a>
+      </div><!-- /.navbar-collapse -->
+    </div>
+  </nav>
+
+
+    <h2>
+      Plasma In-Memory Object Store
+      <a href="/blog/2017/08/08/plasma-in-memory-object-store/" class="permalink" title="Permalink">∞</a>
+    </h2>
+
+    
+
+    <div class="panel">
+      <div class="panel-body">
+        <div>
+          <span class="label label-default">Published</span>
+          <span class="published">
+            <i class="fa fa-calendar"></i>
+            08 Aug 2017
+          </span>
+        </div>
+        <div>
+          <span class="label label-default">By</span>
+          <a href="http://people.apache.org/~Philipp Moritz and Robert Nishihara"><i
class="fa fa-user"></i>  (Philipp Moritz and Robert Nishihara)</a>
+        </div>
+      </div>
+    </div>
+
+    <!--
+
+-->
+
+<p><em><a href="https://people.eecs.berkeley.edu/~pcmoritz/">Philipp Moritz</a>
and <a href="http://www.robertnishihara.com">Robert Nishihara</a> are graduate
students at UC
+ Berkeley.</em></p>
+
+<h2 id="plasma-a-high-performance-shared-memory-object-store">Plasma: A High-Performance
Shared-Memory Object Store</h2>
+
+<h3 id="motivating-plasma">Motivating Plasma</h3>
+
+<p>This blog post presents Plasma, an in-memory object store that is being
+developed as part of Apache Arrow. <strong>Plasma holds immutable objects in shared
+memory so that they can be accessed efficiently by many clients across process
+boundaries.</strong> In light of the trend toward larger and larger multicore machines,
+Plasma enables critical performance optimizations in the big data regime.</p>
+
+<p>Plasma was initially developed as part of <a href="https://github.com/ray-project/ray">Ray</a>,
and has recently been moved
+to Apache Arrow in the hopes that it will be broadly useful.</p>
+
+<p>One of the goals of Apache Arrow is to serve as a common data layer enabling
+zero-copy data exchange between multiple frameworks. A key component of this
+vision is the use of off-heap memory management (via Plasma) for storing and
+sharing Arrow-serialized objects between applications.</p>
+
+<p><strong>Expensive serialization and deserialization as well as data copying
are a
+common performance bottleneck in distributed computing.</strong> For example, a
+Python-based execution framework that wishes to distribute computation across
+multiple Python “worker” processes and then aggregate the results in a single
+“driver” process may choose to serialize data using the built-in <code class="highlighter-rouge">pickle</code>
+library. Assuming one Python process per core, each worker process would have to
+copy and deserialize the data, resulting in excessive memory usage. The driver
+process would then have to deserialize results from each of the workers,
+resulting in a bottleneck.</p>
+
+<p>Using Plasma plus Arrow, the data being operated on would be placed in the
+Plasma store once, and all of the workers would read the data without copying or
+deserializing it (the workers would map the relevant region of memory into their
+own address spaces). The workers would then put the results of their computation
+back into the Plasma store, which the driver could then read and aggregate
+without copying or deserializing the data.</p>
+
+<h3 id="the-plasma-api">The Plasma API:</h3>
+
+<p>Below we illustrate a subset of the API. The C++ API is documented more fully
+<a href="https://github.com/apache/arrow/blob/master/cpp/apidoc/tutorials/plasma.md">here</a>,
and the Python API is documented <a href="https://github.com/apache/arrow/blob/master/python/doc/source/plasma.rst">here</a>.</p>
+
+<p><strong>Object IDs:</strong> Each object is associated with a string
of bytes.</p>
+
+<p><strong>Creating an object:</strong> Objects are stored in Plasma in
two stages. First, the
+object store <em>creates</em> the object by allocating a buffer for it. At this
point,
+the client can write to the buffer and construct the object within the allocated
+buffer. When the client is done, the client <em>seals</em> the buffer making
the object
+immutable and making it available to other Plasma clients.</p>
+
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span
class="c"># Create an object.</span>
+<span class="n">object_id</span> <span class="o">=</span> <span
class="n">pyarrow</span><span class="o">.</span><span class="n">plasma</span><span
class="o">.</span><span class="n">ObjectID</span><span class="p">(</span><span
class="mi">20</span> <span class="o">*</span> <span class="n">b</span><span
class="s">'a'</span><span class="p">)</span>
+<span class="n">object_size</span> <span class="o">=</span> <span
class="mi">1000</span>
+<span class="nb">buffer</span> <span class="o">=</span> <span
class="n">memoryview</span><span class="p">(</span><span class="n">client</span><span
class="o">.</span><span class="n">create</span><span class="p">(</span><span
class="n">object_id</span><span class="p">,</span> <span class="n">object_size</span><span
class="p">))</span>
+
+<span class="c"># Write to the buffer.</span>
+<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span>
<span class="nb">range</span><span class="p">(</span><span class="mi">1000</span><span
class="p">):</span>
+    <span class="nb">buffer</span><span class="p">[</span><span
class="n">i</span><span class="p">]</span> <span class="o">=</span>
<span class="mi">0</span>
+
+<span class="c"># Seal the object making it immutable and available to other clients.</span>
+<span class="n">client</span><span class="o">.</span><span class="n">seal</span><span
class="p">(</span><span class="n">object_id</span><span class="p">)</span>
+</code></pre>
+</div>
+
+<p><strong>Getting an object:</strong> After an object has been sealed,
any client who knows the
+object ID can get the object.</p>
+
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span
class="c"># Get the object from the store. This blocks until the object has been sealed.</span>
+<span class="n">object_id</span> <span class="o">=</span> <span
class="n">pyarrow</span><span class="o">.</span><span class="n">plasma</span><span
class="o">.</span><span class="n">ObjectID</span><span class="p">(</span><span
class="mi">20</span> <span class="o">*</span> <span class="n">b</span><span
class="s">'a'</span><span class="p">)</span>
+<span class="p">[</span><span class="n">buff</span><span class="p">]</span>
<span class="o">=</span> <span class="n">client</span><span class="o">.</span><span
class="n">get</span><span class="p">([</span><span class="n">object_id</span><span
class="p">])</span>
+<span class="nb">buffer</span> <span class="o">=</span> <span
class="n">memoryview</span><span class="p">(</span><span class="n">buff</span><span
class="p">)</span>
+</code></pre>
+</div>
+
+<p>If the object has not been sealed yet, then the call to <code class="highlighter-rouge">client.get</code>
will block
+until the object has been sealed.</p>
+
+<h3 id="a-sorting-application">A sorting application</h3>
+
+<p>To illustrate the benefits of Plasma, we demonstrate an <strong>11x speedup</strong>
(on a
+machine with 20 physical cores) for sorting a large pandas DataFrame (one
+billion entries). The baseline is the built-in pandas sort function, which sorts
+the DataFrame in 477 seconds. To leverage multiple cores, we implement the
+following standard distributed sorting scheme.</p>
+
+<ul>
+  <li>We assume that the data is partitioned across K pandas DataFrames and that
+each one already lives in the Plasma store.</li>
+  <li>We subsample the data, sort the subsampled data, and use the result to define
+L non-overlapping buckets.</li>
+  <li>For each of the K data partitions and each of the L buckets, we find the
+subset of the data partition that falls in the bucket, and we sort that
+subset.</li>
+  <li>For each of the L buckets, we gather all of the K sorted subsets that fall in
+that bucket.</li>
+  <li>For each of the L buckets, we merge the corresponding K sorted subsets.</li>
+  <li>We turn each bucket into a pandas DataFrame and place it in the Plasma store.</li>
+</ul>
+
+<p>Using this scheme, we can sort the DataFrame (the data starts and ends in the
+Plasma store), in 44 seconds, giving an 11x speedup over the baseline.</p>
+
+<h3 id="design">Design</h3>
+
+<p>The Plasma store runs as a separate process. It is written in C++ and is
+designed as a single-threaded event loop based on the <a href="https://redis.io/">Redis</a>
event loop library.
+The plasma client library can be linked into applications. Clients communicate
+with the Plasma store via messages serialized using <a href="https://google.github.io/flatbuffers/">Google
Flatbuffers</a>.</p>
+
+<h3 id="call-for-contributions">Call for contributions</h3>
+
+<p>Plasma is a work in progress, and the API is currently unstable. Today Plasma is
+primarily used in <a href="https://github.com/ray-project/ray">Ray</a> as an
in-memory cache for Arrow serialized objects.
+We are looking for a broader set of use cases to help refine Plasma’s API. In
+addition, we are looking for contributions in a variety of areas including
+improving performance and building other language bindings. Please let us know
+if you are interested in getting involved with the project.</p>
+
+
+
+    <hr/>
+<footer class="footer">
+  <p>Apache Arrow, Arrow, Apache, the Apache feather logo, and the Apache Arrow project
logo are either registered trademarks or trademarks of The Apache Software Foundation in the
United States and other countries.</p>
+  <p>&copy; 2017 Apache Software Foundation</p>
+</footer>
+
+  </div>
+</body>
+</html>

http://git-wip-us.apache.org/repos/asf/arrow-site/blob/3b67853c/blog/index.html
----------------------------------------------------------------------
diff --git a/blog/index.html b/blog/index.html
index 1350659..bf83de3 100644
--- a/blog/index.html
+++ b/blog/index.html
@@ -111,6 +111,159 @@
     
   <div class="container">
     <h2>
+      Plasma In-Memory Object Store
+      <a href="/blog/2017/08/08/plasma-in-memory-object-store/" class="permalink" title="Permalink">∞</a>
+    </h2>
+
+    
+
+    <div class="panel">
+      <div class="panel-body">
+        <div>
+          <span class="label label-default">Published</span>
+          <span class="published">
+            <i class="fa fa-calendar"></i>
+            08 Aug 2017
+          </span>
+        </div>
+        <div>
+          <span class="label label-default">By</span>
+          <a href=""><i class="fa fa-user"></i>  (Philipp Moritz and Robert
Nishihara)</a>
+        </div>
+      </div>
+    </div>
+    <!--
+
+-->
+
+<p><em><a href="https://people.eecs.berkeley.edu/~pcmoritz/">Philipp Moritz</a>
and <a href="http://www.robertnishihara.com">Robert Nishihara</a> are graduate
students at UC
+ Berkeley.</em></p>
+
+<h2 id="plasma-a-high-performance-shared-memory-object-store">Plasma: A High-Performance
Shared-Memory Object Store</h2>
+
+<h3 id="motivating-plasma">Motivating Plasma</h3>
+
+<p>This blog post presents Plasma, an in-memory object store that is being
+developed as part of Apache Arrow. <strong>Plasma holds immutable objects in shared
+memory so that they can be accessed efficiently by many clients across process
+boundaries.</strong> In light of the trend toward larger and larger multicore machines,
+Plasma enables critical performance optimizations in the big data regime.</p>
+
+<p>Plasma was initially developed as part of <a href="https://github.com/ray-project/ray">Ray</a>,
and has recently been moved
+to Apache Arrow in the hopes that it will be broadly useful.</p>
+
+<p>One of the goals of Apache Arrow is to serve as a common data layer enabling
+zero-copy data exchange between multiple frameworks. A key component of this
+vision is the use of off-heap memory management (via Plasma) for storing and
+sharing Arrow-serialized objects between applications.</p>
+
+<p><strong>Expensive serialization and deserialization as well as data copying
are a
+common performance bottleneck in distributed computing.</strong> For example, a
+Python-based execution framework that wishes to distribute computation across
+multiple Python “worker” processes and then aggregate the results in a single
+“driver” process may choose to serialize data using the built-in <code class="highlighter-rouge">pickle</code>
+library. Assuming one Python process per core, each worker process would have to
+copy and deserialize the data, resulting in excessive memory usage. The driver
+process would then have to deserialize results from each of the workers,
+resulting in a bottleneck.</p>
+
+<p>Using Plasma plus Arrow, the data being operated on would be placed in the
+Plasma store once, and all of the workers would read the data without copying or
+deserializing it (the workers would map the relevant region of memory into their
+own address spaces). The workers would then put the results of their computation
+back into the Plasma store, which the driver could then read and aggregate
+without copying or deserializing the data.</p>
+
+<h3 id="the-plasma-api">The Plasma API:</h3>
+
+<p>Below we illustrate a subset of the API. The C++ API is documented more fully
+<a href="https://github.com/apache/arrow/blob/master/cpp/apidoc/tutorials/plasma.md">here</a>,
and the Python API is documented <a href="https://github.com/apache/arrow/blob/master/python/doc/source/plasma.rst">here</a>.</p>
+
+<p><strong>Object IDs:</strong> Each object is associated with a string
of bytes.</p>
+
+<p><strong>Creating an object:</strong> Objects are stored in Plasma in
two stages. First, the
+object store <em>creates</em> the object by allocating a buffer for it. At this
point,
+the client can write to the buffer and construct the object within the allocated
+buffer. When the client is done, the client <em>seals</em> the buffer making
the object
+immutable and making it available to other Plasma clients.</p>
+
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span
class="c"># Create an object.</span>
+<span class="n">object_id</span> <span class="o">=</span> <span
class="n">pyarrow</span><span class="o">.</span><span class="n">plasma</span><span
class="o">.</span><span class="n">ObjectID</span><span class="p">(</span><span
class="mi">20</span> <span class="o">*</span> <span class="n">b</span><span
class="s">'a'</span><span class="p">)</span>
+<span class="n">object_size</span> <span class="o">=</span> <span
class="mi">1000</span>
+<span class="nb">buffer</span> <span class="o">=</span> <span
class="n">memoryview</span><span class="p">(</span><span class="n">client</span><span
class="o">.</span><span class="n">create</span><span class="p">(</span><span
class="n">object_id</span><span class="p">,</span> <span class="n">object_size</span><span
class="p">))</span>
+
+<span class="c"># Write to the buffer.</span>
+<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span>
<span class="nb">range</span><span class="p">(</span><span class="mi">1000</span><span
class="p">):</span>
+    <span class="nb">buffer</span><span class="p">[</span><span
class="n">i</span><span class="p">]</span> <span class="o">=</span>
<span class="mi">0</span>
+
+<span class="c"># Seal the object making it immutable and available to other clients.</span>
+<span class="n">client</span><span class="o">.</span><span class="n">seal</span><span
class="p">(</span><span class="n">object_id</span><span class="p">)</span>
+</code></pre>
+</div>
+
+<p><strong>Getting an object:</strong> After an object has been sealed,
any client who knows the
+object ID can get the object.</p>
+
+<div class="language-python highlighter-rouge"><pre class="highlight"><code><span
class="c"># Get the object from the store. This blocks until the object has been sealed.</span>
+<span class="n">object_id</span> <span class="o">=</span> <span
class="n">pyarrow</span><span class="o">.</span><span class="n">plasma</span><span
class="o">.</span><span class="n">ObjectID</span><span class="p">(</span><span
class="mi">20</span> <span class="o">*</span> <span class="n">b</span><span
class="s">'a'</span><span class="p">)</span>
+<span class="p">[</span><span class="n">buff</span><span class="p">]</span>
<span class="o">=</span> <span class="n">client</span><span class="o">.</span><span
class="n">get</span><span class="p">([</span><span class="n">object_id</span><span
class="p">])</span>
+<span class="nb">buffer</span> <span class="o">=</span> <span
class="n">memoryview</span><span class="p">(</span><span class="n">buff</span><span
class="p">)</span>
+</code></pre>
+</div>
+
+<p>If the object has not been sealed yet, then the call to <code class="highlighter-rouge">client.get</code>
will block
+until the object has been sealed.</p>
+
+<h3 id="a-sorting-application">A sorting application</h3>
+
+<p>To illustrate the benefits of Plasma, we demonstrate an <strong>11x speedup</strong>
(on a
+machine with 20 physical cores) for sorting a large pandas DataFrame (one
+billion entries). The baseline is the built-in pandas sort function, which sorts
+the DataFrame in 477 seconds. To leverage multiple cores, we implement the
+following standard distributed sorting scheme.</p>
+
+<ul>
+  <li>We assume that the data is partitioned across K pandas DataFrames and that
+each one already lives in the Plasma store.</li>
+  <li>We subsample the data, sort the subsampled data, and use the result to define
+L non-overlapping buckets.</li>
+  <li>For each of the K data partitions and each of the L buckets, we find the
+subset of the data partition that falls in the bucket, and we sort that
+subset.</li>
+  <li>For each of the L buckets, we gather all of the K sorted subsets that fall in
+that bucket.</li>
+  <li>For each of the L buckets, we merge the corresponding K sorted subsets.</li>
+  <li>We turn each bucket into a pandas DataFrame and place it in the Plasma store.</li>
+</ul>
+
+<p>Using this scheme, we can sort the DataFrame (the data starts and ends in the
+Plasma store), in 44 seconds, giving an 11x speedup over the baseline.</p>
+
+<h3 id="design">Design</h3>
+
+<p>The Plasma store runs as a separate process. It is written in C++ and is
+designed as a single-threaded event loop based on the <a href="https://redis.io/">Redis</a>
event loop library.
+The plasma client library can be linked into applications. Clients communicate
+with the Plasma store via messages serialized using <a href="https://google.github.io/flatbuffers/">Google
Flatbuffers</a>.</p>
+
+<h3 id="call-for-contributions">Call for contributions</h3>
+
+<p>Plasma is a work in progress, and the API is currently unstable. Today Plasma is
+primarily used in <a href="https://github.com/ray-project/ray">Ray</a> as an
in-memory cache for Arrow serialized objects.
+We are looking for a broader set of use cases to help refine Plasma’s API. In
+addition, we are looking for contributions in a variety of areas including
+improving performance and building other language bindings. Please let us know
+if you are interested in getting involved with the project.</p>
+
+
+  </div>
+
+  
+
+  
+    
+  <div class="container">
+    <h2>
       Speeding up PySpark with Apache Arrow
       <a href="/blog/2017/07/26/spark-arrow/" class="permalink" title="Permalink">∞</a>
     </h2>


Mime
View raw message