arrow-commits mailing list archives

From w...@apache.org
Subject [1/3] arrow git commit: ARROW-96: Add C++ API documentation
Date Fri, 13 Jan 2017 18:58:51 GMT
Repository: arrow
Updated Branches:
  refs/heads/master ad0e57d23 -> cb83b8d30


http://git-wip-us.apache.org/repos/asf/arrow/blob/cb83b8d3/cpp/apidoc/index.md
----------------------------------------------------------------------
diff --git a/cpp/apidoc/index.md b/cpp/apidoc/index.md
new file mode 100644
index 0000000..080f848
--- /dev/null
+++ b/cpp/apidoc/index.md
@@ -0,0 +1,85 @@
+Apache Arrow C++ API documentation      {#index}
+==================================
+
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+Apache Arrow is a columnar in-memory analytics layer designed to accelerate
+big data. It houses a set of canonical in-memory representations of flat and
+hierarchical data along with multiple language-bindings for structure
+manipulation. It also provides IPC and common algorithm implementations.
+
+This is the documentation of the C++ API of Apache Arrow. For more details
+on the format and other language bindings see
+the [main page for Arrow](https://arrow.apache.org/). Here we will only detail
+the usage of the C++ API for Arrow and the leaf libraries that add additional
+functionality such as using [jemalloc](http://jemalloc.net/) as an allocator
+for Arrow structures.
+
+Getting Started
+---------------
+
+The most basic structure in Arrow is an `arrow::Array`. It holds a sequence
+of values with known length all having the same type. It consists of the data
+itself and an additional bitmap that indicates whether the corresponding entry of
+the array is null. Note that for arrays with zero null entries, we can omit
+this bitmap.
+
+As Arrow objects are immutable, there are classes provided that should help you
+build these objects. To build an array of `int64_t` elements, we can use the
+`Int64Builder`. In the following example, we build an array of the range 1 to 8
+where the element that should hold the number 4 is nulled.
+
+    Int64Builder builder(arrow::default_memory_pool(), arrow::int64());
+    builder.Append(1);
+    builder.Append(2);
+    builder.Append(3);
+    builder.AppendNull();
+    builder.Append(5);
+    builder.Append(6);
+    builder.Append(7);
+    builder.Append(8);
+
+    std::shared_ptr<Array> array;
+    builder.Finish(&array);
+
+The resulting Array (which can be cast to `arrow::Int64Array` if you want
+to access its values) then consists of two `arrow::Buffer`s. The first one is
+the null bitmap holding a single byte with the bits `1|1|1|1|0|1|1|1`.
+As a cleared bit marks a null and we use [least-significant bit (LSB) numbering](https://en.wikipedia.org/wiki/Bit_numbering),
+this indicates that the fourth entry in the array is null. The second
+buffer is simply an `int64_t` array containing all the above values.
+As the fourth entry is null, the value at that position in the buffer is
+undefined.
+
+    // Cast the Array to its actual type to access its data
+    std::shared_ptr<Int64Array> int64_array = std::static_pointer_cast<Int64Array>(array);
+
+    // Get the pointer to the null bitmap.
+    const uint8_t* null_bitmap = int64_array->null_bitmap_data();
+
+    // Get the pointer to the actual data
+    const int64_t* data = int64_array->raw_data();
+
+In the above example, we have so far skipped explaining two things in the code.
+On constructing the builder, we have passed `arrow::int64()` to it. This is
+the type information with which the resulting array will be annotated. In
+this simple form, it is solely a `std::shared_ptr<arrow::Int64Type>`
+instantiation.
+
+Furthermore, we have passed `arrow::default_memory_pool()` to the constructor.
+This `arrow::MemoryPool` is used for the allocations of heap memory. Besides
+tracking the amount of memory allocated, the allocator also ensures that the
+allocated memory regions are 64-byte aligned (as required by the Arrow
+specification).
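
Editorial sketch (not part of the commit): the null bitmap layout described in the new index.md can be illustrated without linking against Arrow. The helpers below, `BitIsSet`, `MakeValidityBitmap` and `SlotIsNull`, are hypothetical names introduced only for this illustration and are not Arrow's `BitUtil` functions. They assume what `Array::IsNull` in array.h implies: a slot is null when its bit is not set, and bits are numbered from the least-significant bit of each byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration only; not part of the Arrow API.

// Returns true if bit i is set, using LSB numbering: slot i lives in
// byte i / 8, at bit position i % 8 counted from the least-significant bit.
inline bool BitIsSet(const uint8_t* bitmap, int i) {
  return ((bitmap[i / 8] >> (i % 8)) & 1) != 0;
}

// Builds a bitmap from per-slot flags (true = valid, false = null).
inline std::vector<uint8_t> MakeValidityBitmap(const std::vector<bool>& valid) {
  std::vector<uint8_t> bitmap((valid.size() + 7) / 8, 0);
  for (std::size_t i = 0; i < valid.size(); ++i) {
    if (valid[i]) {
      bitmap[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
    }
  }
  return bitmap;
}

// Mirrors the logic of Array::IsNull: with a null count of zero the bitmap
// is never read (it may be a nullptr); otherwise a cleared bit means null.
inline bool SlotIsNull(const uint8_t* bitmap, int null_count, int i) {
  return null_count > 0 && !BitIsSet(bitmap, i);
}
```

For the eight-slot example above, with only the fourth slot null, `MakeValidityBitmap` yields the single byte `0xF7`, and `SlotIsNull(bitmap.data(), 1, 3)` is true.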

http://git-wip-us.apache.org/repos/asf/arrow/blob/cb83b8d3/cpp/src/arrow/array.h
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/array.h b/cpp/src/arrow/array.h
index 57214c4..45f8ab9 100644
--- a/cpp/src/arrow/array.h
+++ b/cpp/src/arrow/array.h
@@ -65,11 +65,12 @@ class ArrayVisitor {
   virtual Status Visit(const DictionaryArray& type) = 0;
 };
 
-// Immutable data array with some logical type and some length. Any memory is
-// owned by the respective Buffer instance (or its parents).
-//
-// The base class is only required to have a null bitmap buffer if the null
-// count is greater than 0
+/// Immutable data array with some logical type and some length.
+///
+/// Any memory is owned by the respective Buffer instance (or its parents).
+///
+/// The base class is only required to have a null bitmap buffer if the null
+/// count is greater than 0
 class ARROW_EXPORT Array {
  public:
   Array(const std::shared_ptr<DataType>& type, int32_t length, int32_t null_count = 0,
@@ -77,19 +78,28 @@ class ARROW_EXPORT Array {
 
   virtual ~Array() = default;
 
-  // Determine if a slot is null. For inner loops. Does *not* boundscheck
+  /// Determine if a slot is null. For inner loops. Does *not* boundscheck
   bool IsNull(int i) const {
     return null_count_ > 0 && BitUtil::BitNotSet(null_bitmap_data_, i);
   }
 
+  /// Size in the number of elements this array contains.
   int32_t length() const { return length_; }
+
+  /// The number of null entries in the array.
   int32_t null_count() const { return null_count_; }
 
   std::shared_ptr<DataType> type() const { return type_; }
   Type::type type_enum() const { return type_->type; }
 
+  /// Buffer for the null bitmap.
+  ///
+  /// Note that for `null_count == 0`, this can be a `nullptr`.
   std::shared_ptr<Buffer> null_bitmap() const { return null_bitmap_; }
 
+  /// Raw pointer to the null bitmap.
+  ///
+  /// Note that for `null_count == 0`, this can be a `nullptr`.
   const uint8_t* null_bitmap_data() const { return null_bitmap_data_; }
 
   bool BaseEquals(const std::shared_ptr<Array>& arr) const;
@@ -97,13 +107,14 @@ class ARROW_EXPORT Array {
   virtual bool Equals(const std::shared_ptr<Array>& arr) const = 0;
   virtual bool ApproxEquals(const std::shared_ptr<Array>& arr) const;
 
-  // Compare if the range of slots specified are equal for the given array and
-  // this array.  end_idx exclusive.  This methods does not bounds check.
+  /// Compare if the range of slots specified are equal for the given array and
+  /// this array.  end_idx exclusive.  This method does not bounds check.
   virtual bool RangeEquals(int32_t start_idx, int32_t end_idx, int32_t other_start_idx,
       const std::shared_ptr<Array>& arr) const = 0;
 
-  // Determines if the array is internally consistent.  Defaults to always
-  // returning Status::OK.  This can be an expensive check.
+  /// Determines if the array is internally consistent.
+  ///
+  /// Defaults to always returning Status::OK. This can be an expensive check.
   virtual Status Validate() const;
 
   virtual Status Accept(ArrayVisitor* visitor) const = 0;
@@ -121,7 +132,7 @@ class ARROW_EXPORT Array {
   DISALLOW_COPY_AND_ASSIGN(Array);
 };
 
-// Degenerate null type Array
+/// Degenerate null type Array
 class ARROW_EXPORT NullArray : public Array {
  public:
   using TypeClass = NullType;
@@ -141,7 +152,7 @@ class ARROW_EXPORT NullArray : public Array {
 Status ARROW_EXPORT GetEmptyBitmap(
     MemoryPool* pool, int32_t length, std::shared_ptr<MutableBuffer>* result);
 
-// Base class for fixed-size logical types
+/// Base class for fixed-size logical types
 class ARROW_EXPORT PrimitiveArray : public Array {
  public:
   virtual ~PrimitiveArray() {}

