Subject [25/27] arrow-site git commit: Update Python documentation
Date Mon, 08 May 2017 04:53:10 GMT
+  <div class="section" id="in-memory-data-model">
+<span id="data"></span><h1>In-Memory Data Model<a class="headerlink"
href="#in-memory-data-model" title="Permalink to this headline">¶</a></h1>
+<p>Apache Arrow defines columnar array data structures by composing type metadata
+with memory buffers, like the ones explained in the documentation on
+<a class="reference internal" href="memory.html#io"><span class="std std-ref">Memory
and IO</span></a>. These data structures are exposed in Python through
+a series of interrelated classes:</p>
+<ul class="simple">
+<li><strong>Type Metadata</strong>: Instances of <code class="docutils
literal"><span class="pre">pyarrow.DataType</span></code>, which describe
a logical
+array type</li>
+<li><strong>Schemas</strong>: Instances of <code class="docutils literal"><span
class="pre">pyarrow.Schema</span></code>, which describe a named
+collection of types. These can be thought of as the column types in a
+table-like object.</li>
+<li><strong>Arrays</strong>: Instances of <code class="docutils literal"><span
class="pre">pyarrow.Array</span></code>, which are atomic, contiguous
+columnar data structures composed from Arrow Buffer objects</li>
+<li><strong>Record Batches</strong>: Instances of <code class="docutils
literal"><span class="pre">pyarrow.RecordBatch</span></code>, which are
+collections of Array objects with a particular Schema</li>
+<li><strong>Tables</strong>: Instances of <code class="docutils literal"><span
class="pre">pyarrow.Table</span></code>, a logical table data structure in
+which each column consists of one or more <code class="docutils literal"><span class="pre">pyarrow.Array</span></code>
objects of the
+same type.</li>
+</ul>
+<p>We will examine these in the sections below in a series of examples.</p>
+<div class="section" id="type-metadata">
+<span id="data-types"></span><h2>Type Metadata<a class="headerlink"
href="#type-metadata" title="Permalink to this headline">¶</a></h2>
+<p>Apache Arrow defines language-agnostic column-oriented data structures for
+array data. These include:</p>
+<ul class="simple">
+<li><strong>Fixed-length primitive types</strong>: numbers, booleans, dates
and times, fixed-size
+binary, decimals, and other values that fit into a given number of bytes</li>
+<li><strong>Variable-length primitive types</strong>: binary, string</li>
+<li><strong>Nested types</strong>: list, struct, and union</li>
+<li><strong>Dictionary type</strong>: An encoded categorical type (more
on this later)</li>
+</ul>
+<p>Each logical data type in Arrow has a corresponding factory function for
+creating an instance of that type object in Python:</p>
+<div class="highlight-ipython"><div class="highlight"><pre><span></span><span
class="gp">In [1]: </span><span class="kn">import</span> <span class="nn">pyarrow</span>
<span class="kn">as</span> <span class="nn">pa</span>
+<span class="gp">In [2]: </span><span class="n">t1</span> <span
class="o">=</span> <span class="n">pa</span><span class="o">.</span><span
class="n">int32</span><span class="p">()</span>
+<span class="gp">In [3]: </span><span class="n">t2</span> <span
class="o">=</span> <span class="n">pa</span><span class="o">.</span><span
class="n">string</span><span class="p">()</span>
+<span class="gp">In [4]: </span><span class="n">t3</span> <span
class="o">=</span> <span class="n">pa</span><span class="o">.</span><span
class="n">binary</span><span class="p">()</span>
+<span class="gp">In [5]: </span><span class="n">t4</span> <span
class="o">=</span> <span class="n">pa</span><span class="o">.</span><span
class="n">binary</span><span class="p">(</span><span class="mi">10</span><span
class="p">)</span>
+<span class="gp">In [6]: </span><span class="n">t5</span> <span
class="o">=</span> <span class="n">pa</span><span class="o">.</span><span
class="n">timestamp</span><span class="p">(</span><span class="s1">&#39;ms&#39;</span><span
class="p">)</span>
+<span class="gp">In [7]: </span><span class="n">t1</span>
+<span class="gh">Out[7]: </span><span class="go">DataType(int32)</span>
+<span class="gp">In [8]: </span><span class="k">print</span><span
class="p">(</span><span class="n">t1</span><span class="p">)</span>
+<span class="go">