arrow-commits mailing list archives

Subject [25/27] arrow-site git commit: Update Python documentation
Date Mon, 08 May 2017 04:53:10 GMT
diff --git a/docs/python/data.html b/docs/python/data.html
new file mode 100644
index 0000000..e16f145
--- /dev/null
+++ b/docs/python/data.html
@@ -0,0 +1,524 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
+  "">
+<html xmlns="">
+  <head>
+    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+    <title>In-Memory Data Model &#8212; pyarrow  documentation</title>
+    <link rel="stylesheet" href="_static/sphinxdoc.css" type="text/css" />
+    <link rel="stylesheet" href="_static/pygments.css" type="text/css" />
+    <script type="text/javascript">
+      var DOCUMENTATION_OPTIONS = {
+        URL_ROOT:    './',
+        VERSION:     '',
+        COLLAPSE_INDEX: false,
+        FILE_SUFFIX: '.html',
+        HAS_SOURCE:  true,
+        SOURCELINK_SUFFIX: '.txt'
+      };
+    </script>
+    <script type="text/javascript" src="_static/jquery.js"></script>
+    <script type="text/javascript" src="_static/underscore.js"></script>
+    <script type="text/javascript" src="_static/doctools.js"></script>
+    <script type="text/javascript" src=""></script>
+    <link rel="index" title="Index" href="genindex.html" />
+    <link rel="search" title="Search" href="search.html" />
+    <link rel="next" title="IPC: Fast Streaming and Serialization" href="ipc.html" />
+    <link rel="prev" title="Memory and IO Interfaces" href="memory.html" /> 
+  </head>
+  <body role="document">
+    <div class="related" role="navigation" aria-label="related navigation">
+      <h3>Navigation</h3>
+      <ul>
+        <li class="right" style="margin-right: 10px">
+          <a href="genindex.html" title="General Index"
+             accesskey="I">index</a></li>
+        <li class="right" >
+          <a href="ipc.html" title="IPC: Fast Streaming and Serialization"
+             accesskey="N">next</a> |</li>
+        <li class="right" >
+          <a href="memory.html" title="Memory and IO Interfaces"
+             accesskey="P">previous</a> |</li>
+        <li class="nav-item nav-item-0"><a href="index.html">pyarrow  documentation</a></li>
+      </ul>
+    </div>
+      <div class="sphinxsidebar" role="navigation" aria-label="main navigation">
+        <div class="sphinxsidebarwrapper">
+  <h3><a href="index.html">Table Of Contents</a></h3>
+  <ul>
+<li><a class="reference internal" href="#">In-Memory Data Model</a><ul>
+<li><a class="reference internal" href="#type-metadata">Type Metadata</a></li>
+<li><a class="reference internal" href="#schemas">Schemas</a></li>
+<li><a class="reference internal" href="#arrays">Arrays</a><ul>
+<li><a class="reference internal" href="#dictionary-arrays">Dictionary Arrays</a></li>
+</ul>
+</li>
+<li><a class="reference internal" href="#record-batches">Record Batches</a></li>
+<li><a class="reference internal" href="#tables">Tables</a></li>
+<li><a class="reference internal" href="#custom-schema-and-field-metadata">Custom Schema and Field Metadata</a></li>
+</ul>
+</li>
+</ul>
+  <h4>Previous topic</h4>
+  <p class="topless"><a href="memory.html"
+                        title="previous chapter">Memory and IO Interfaces</a></p>
+  <h4>Next topic</h4>
+  <p class="topless"><a href="ipc.html"
+                        title="next chapter">IPC: Fast Streaming and Serialization</a></p>
+  <div role="note" aria-label="source link">
+    <h3>This Page</h3>
+    <ul class="this-page-menu">
+      <li><a href="_sources/data.rst.txt"
+            rel="nofollow">Show Source</a></li>
+    </ul>
+   </div>
+<div id="searchbox" style="display: none" role="search">
+  <h3>Quick search</h3>
+    <form class="search" action="search.html" method="get">
+      <div><input type="text" name="q" /></div>
+      <div><input type="submit" value="Go" /></div>
+      <input type="hidden" name="check_keywords" value="yes" />
+      <input type="hidden" name="area" value="default" />
+    </form>
+<script type="text/javascript">$('#searchbox').show(0);</script>
+        </div>
+      </div>
+    <div class="document">
+      <div class="documentwrapper">
+        <div class="bodywrapper">
+          <div class="body" role="main">
+  <div class="section" id="in-memory-data-model">
+<span id="data"></span><h1>In-Memory Data Model<a class="headerlink"
href="#in-memory-data-model" title="Permalink to this headline">¶</a></h1>
+<p>Apache Arrow defines columnar array data structures by composing type metadata
+with memory buffers, like the ones explained in the documentation on
+<a class="reference internal" href="memory.html#io"><span class="std std-ref">Memory
and IO</span></a>. These data structures are exposed in Python through
+a series of interrelated classes:</p>
+<ul class="simple">
+<li><strong>Type Metadata</strong>: Instances of <code class="docutils
literal"><span class="pre">pyarrow.DataType</span></code>, which describe
a logical
+array type</li>
+<li><strong>Schemas</strong>: Instances of <code class="docutils literal"><span
class="pre">pyarrow.Schema</span></code>, which describe a named
+collection of types. These can be thought of as the column types in a
+table-like object.</li>
+<li><strong>Arrays</strong>: Instances of <code class="docutils literal"><span
class="pre">pyarrow.Array</span></code>, which are atomic, contiguous
+columnar data structures composed from Arrow Buffer objects</li>
+<li><strong>Record Batches</strong>: Instances of <code class="docutils
literal"><span class="pre">pyarrow.RecordBatch</span></code>, which are
+collections of Array objects with a particular Schema</li>
+<li><strong>Tables</strong>: Instances of <code class="docutils literal"><span
class="pre">pyarrow.Table</span></code>, a logical table data structure in
+which each column consists of one or more <code class="docutils literal"><span class="pre">pyarrow.Array</span></code>
objects of the
+same type.</li>
+</ul>
+<p>We will examine these in the sections below in a series of examples.</p>
+<div class="section" id="type-metadata">
+<span id="data-types"></span><h2>Type Metadata<a class="headerlink"
href="#type-metadata" title="Permalink to this headline">¶</a></h2>
+<p>Apache Arrow defines language-agnostic, column-oriented data structures for
+array data. These include:</p>
+<ul class="simple">
+<li><strong>Fixed-length primitive types</strong>: numbers, booleans, dates and times, fixed
+size binary, decimals, and other values that fit into a fixed number of bits</li>
+<li><strong>Variable-length primitive types</strong>: binary, string</li>
+<li><strong>Nested types</strong>: list, struct, and union</li>
+<li><strong>Dictionary type</strong>: An encoded categorical type (more on this later)</li>
+</ul>
+<p>Each logical data type in Arrow has a corresponding factory function for
+creating an instance of that type object in Python:</p>
+<div class="highlight-ipython"><div class="highlight"><pre><span></span><span
class="gp">In [1]: </span><span class="kn">import</span> <span class="nn">pyarrow</span>
<span class="kn">as</span> <span class="nn">pa</span>
+<span class="gp">In [2]: </span><span class="n">t1</span> <span
class="o">=</span> <span class="n">pa</span><span class="o">.</span><span
class="n">int32</span><span class="p">()</span>
+<span class="gp">In [3]: </span><span class="n">t2</span> <span
class="o">=</span> <span class="n">pa</span><span class="o">.</span><span
class="n">string</span><span class="p">()</span>
+<span class="gp">In [4]: </span><span class="n">t3</span> <span
class="o">=</span> <span class="n">pa</span><span class="o">.</span><span
class="n">binary</span><span class="p">()</span>
+<span class="gp">In [5]: </span><span class="n">t4</span> <span
class="o">=</span> <span class="n">pa</span><span class="o">.</span><span
class="n">binary</span><span class="p">(</span><span class="mi">10</span><span
class="p">)</span>
+<span class="gp">In [6]: </span><span class="n">t5</span> <span
class="o">=</span> <span class="n">pa</span><span class="o">.</span><span
class="n">timestamp</span><span class="p">(</span><span class="s1">&#39;ms&#39;</span><span
class="p">)</span>
+<span class="gp">In [7]: </span><span class="n">t1</span>
+<span class="gh">Out[7]: </span><span class="go">DataType(int32)</span>
+<span class="gp">In [8]: </span><span class="k">print</span><span
class="p">(</span><span class="n">t1</span><span class="p">)</span>
+<span class="go">