arrow-commits mailing list archives

From w...@apache.org
Subject arrow git commit: ARROW-1156: [C++/Python] Expand casting API, add UnaryKernel callable. Use Cast in appropriate places when converting from pandas
Date Fri, 08 Sep 2017 14:09:43 GMT
Repository: arrow
Updated Branches:
  refs/heads/master fe45c2bb3 -> de2edc8d5


ARROW-1156: [C++/Python] Expand casting API, add UnaryKernel callable. Use Cast in appropriate places when converting from pandas

cc @cloud-fan

With this patch, we now try to cast to the indicated type when ingesting objects from pandas:

```
In [3]: arr = np.array([None] * 5)

In [4]: pa.Array.from_pandas(arr)
Out[4]:
<pyarrow.lib.NullArray object at 0x7f6cf1485d18>
[
  NA,
  NA,
  NA,
  NA,
  NA
]

In [5]: pa.Array.from_pandas(arr, type=pa.int32())
Out[5]:
<pyarrow.lib.Int32Array object at 0x7f6cf1485d68>
[
  NA,
  NA,
  NA,
  NA,
  NA
]
```
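
As a rough illustration of what the second result implies (a standalone sketch, not the code in this patch), an all-null input cast to a fixed-width type can be pictured as a validity bitmap with every bit cleared plus a zero-filled values buffer. `SimpleInt32Array`, `CastAllNullToInt32`, and this local `BytesForBits` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a fixed-width array: a validity bitmap
// (one bit per slot, 1 = valid) plus a values buffer.
struct SimpleInt32Array {
  std::vector<uint8_t> validity;  // ceil(length / 8) bytes
  std::vector<int32_t> values;    // one value per slot
  int64_t length = 0;
  int64_t null_count = 0;

  // A slot is null when its validity bit is 0.
  bool IsNull(int64_t i) const {
    return (validity[static_cast<std::size_t>(i / 8)] & (1 << (i % 8))) == 0;
  }
};

// Number of bytes needed to hold `bits` bits.
inline int64_t BytesForBits(int64_t bits) { return (bits + 7) / 8; }

// Build the int32 result for an all-null input of `length` slots:
// every validity bit is cleared and every value is initialized to 0.
inline SimpleInt32Array CastAllNullToInt32(int64_t length) {
  assert(length >= 0);
  SimpleInt32Array out;
  out.length = length;
  out.null_count = length;
  out.validity.assign(static_cast<std::size_t>(BytesForBits(length)), 0);
  out.values.assign(static_cast<std::size_t>(length), 0);
  return out;
}
```

Calling `CastAllNullToInt32(5)` gives five slots that all report null, matching the shape of the `Int32Array` output above.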

I also added zero-copy casts from integers of the right size to each of the date and time types.
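
The "right size" rule can be sketched as a compile-time trait: a cast is treated as zero-copy when the types are identical, or when an integer type is cast to a date/time type whose C representation has the same width. The sketch below mirrors the shape of `is_zero_copy_cast` in the diff, but the type classes are simplified, hypothetical stand-ins so that it compiles on its own.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for the type classes; only what the trait needs.
struct Integer {};
struct DateType {};
struct TimeType {};

struct Int32Type : Integer { using c_type = int32_t; };
struct Int64Type : Integer { using c_type = int64_t; };
struct Date32Type : DateType { using c_type = int32_t; };
struct Date64Type : DateType { using c_type = int64_t; };
struct Time32Type : TimeType { using c_type = int32_t; };

// Default: the cast needs a freshly computed output buffer.
template <typename O, typename I, typename Enable = void>
struct is_zero_copy_cast : std::false_type {};

// Identical types: buffers can be shared as-is.
template <typename O, typename I>
struct is_zero_copy_cast<
    O, I, typename std::enable_if<std::is_same<I, O>::value>::type>
    : std::true_type {};

// Integer to date/time: zero-copy only when the C types have equal width.
template <typename O, typename I>
struct is_zero_copy_cast<
    O, I,
    typename std::enable_if<std::is_base_of<Integer, I>::value &&
                            (std::is_base_of<DateType, O>::value ||
                             std::is_base_of<TimeType, O>::value)>::type>
    : std::integral_constant<bool, sizeof(typename O::c_type) ==
                                       sizeof(typename I::c_type)> {};
```

Under these assumptions, `is_zero_copy_cast<Date32Type, Int32Type>::value` is true, while `is_zero_copy_cast<Date64Type, Int32Type>::value` is false because the widths differ.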

Includes refactoring for ARROW-1481.

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes #1063 from wesm/ARROW-1156 and squashes the following commits:

166d1a50 [Wes McKinney] iwyu
34f5c9d1 [Wes McKinney] Harden default cast options, fix unsafe Python case
1d07b756 [Wes McKinney] Add some basic casting unit tests in Python
c1b45709 [Wes McKinney] Expose arrow::compute::Cast in Python as Array.cast. Still need to write tests
a9a04c9c [Wes McKinney] UnaryKernel::Call returns Status for now for simplicity. Support pre-allocated memory
8903709b [Wes McKinney] Implement casts from null to numbers. Try to cast for types where we do not have an inference rule when converting from arrays of Python objects
a22dd20a [Wes McKinney] Add test to assert zero copy for compatible integer to date/time
a14b83f7 [Wes McKinney] Create callable CastKernel object. Add zero-copy cast rules for date/time types
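
To make the "harden default cast options" item concrete, here is a minimal standalone sketch of a checked integer downcast. The `CastOptions` struct copies the default from the diff (`allow_int_overflow` is false unless set); `DowncastInt64ToInt32` is a hypothetical helper that reports failure through its return value instead of a `Status`, and it ignores null slots, unlike the real kernel.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Same default as in the patch: overflow is rejected unless explicitly allowed.
struct CastOptions {
  CastOptions() : allow_int_overflow(false) {}
  bool allow_int_overflow;
};

// Hypothetical helper: narrow int64 values to int32.
// Returns false if any value is out of range and overflow is not allowed.
// Values are still written in either case, as in the loop in the diff.
inline bool DowncastInt64ToInt32(const std::vector<int64_t>& in,
                                 const CastOptions& options,
                                 std::vector<int32_t>* out) {
  assert(out != nullptr);
  constexpr int64_t kMax = std::numeric_limits<int32_t>::max();
  constexpr int64_t kMin = std::numeric_limits<int32_t>::min();

  out->clear();
  out->reserve(in.size());
  bool ok = true;
  for (int64_t v : in) {
    if (!options.allow_int_overflow && (v > kMax || v < kMin)) {
      ok = false;  // analogous to setting an Invalid status on the context
    }
    out->push_back(static_cast<int32_t>(v));
  }
  return ok;
}
```

With default options, an input containing a value above the int32 maximum makes the helper return false; setting `allow_int_overflow = true` skips the bounds check entirely.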


Project: http://git-wip-us.apache.org/repos/asf/arrow/repo
Commit: http://git-wip-us.apache.org/repos/asf/arrow/commit/de2edc8d
Tree: http://git-wip-us.apache.org/repos/asf/arrow/tree/de2edc8d
Diff: http://git-wip-us.apache.org/repos/asf/arrow/diff/de2edc8d

Branch: refs/heads/master
Commit: de2edc8d591ac8e889495392f8ed20d1d7814dea
Parents: fe45c2b
Author: Wes McKinney <wes.mckinney@twosigma.com>
Authored: Fri Sep 8 10:09:38 2017 -0400
Committer: Wes McKinney <wes.mckinney@twosigma.com>
Committed: Fri Sep 8 10:09:38 2017 -0400

----------------------------------------------------------------------
 cpp/README.md                                  |   3 +
 cpp/build-support/iwyu/mappings/arrow-misc.imp |   5 +-
 cpp/src/arrow/array.h                          |  28 +-
 cpp/src/arrow/compute/CMakeLists.txt           |   2 +
 cpp/src/arrow/compute/api.h                    |  25 ++
 cpp/src/arrow/compute/cast.cc                  | 358 +++++++++++++-------
 cpp/src/arrow/compute/cast.h                   |  14 +-
 cpp/src/arrow/compute/compute-test.cc          |  96 +++++-
 cpp/src/arrow/compute/context.h                |   3 +-
 cpp/src/arrow/compute/kernel.h                 |  49 +++
 cpp/src/arrow/python/numpy_convert.h           |   3 -
 cpp/src/arrow/python/pandas_to_arrow.cc        | 172 ++++++----
 python/pyarrow/array.pxi                       |  41 +++
 python/pyarrow/includes/libarrow.pxd           |  18 +-
 python/pyarrow/tests/test_array.py             |  67 ++++
 python/pyarrow/tests/test_convert_pandas.py    |  20 ++
 16 files changed, 687 insertions(+), 217 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/arrow/blob/de2edc8d/cpp/README.md
----------------------------------------------------------------------
diff --git a/cpp/README.md b/cpp/README.md
index 993efb8..4a51507 100644
--- a/cpp/README.md
+++ b/cpp/README.md
@@ -126,6 +126,9 @@ interfaces in this library as needed.
 The CUDA toolchain used to build the library can be customized by using the
 `$CUDA_HOME` environment variable.
 
+This library is still in Alpha stages, and subject to API changes without
+deprecation warnings.
+
 ### API documentation
 
 To generate the (html) API documentation, run the following command in the apidoc

http://git-wip-us.apache.org/repos/asf/arrow/blob/de2edc8d/cpp/build-support/iwyu/mappings/arrow-misc.imp
----------------------------------------------------------------------
diff --git a/cpp/build-support/iwyu/mappings/arrow-misc.imp b/cpp/build-support/iwyu/mappings/arrow-misc.imp
index 7d9f09c..f39650d 100644
--- a/cpp/build-support/iwyu/mappings/arrow-misc.imp
+++ b/cpp/build-support/iwyu/mappings/arrow-misc.imp
@@ -24,9 +24,10 @@
   { symbol: ["uint8_t", private, "<cstdint>", public ] },
   { symbol: ["_Node_const_iterator", private, "<flatbuffers/flatbuffers.h>", public ] },
   { symbol: ["unordered_map<>::mapped_type", private, "<flatbuffers/flatbuffers.h>", public ] },
-  { symbol: ["__alloc_traits<>::value_type", private, "<vector>", public ] },
   { symbol: ["move", private, "<utility>", public ] },
   { symbol: ["pair", private, "<utility>", public ] },
   { symbol: ["errno", private, "<cerrno>", public ] },
-  { symbol: ["posix_memalign", private, "<cstdlib>", public ] }
+  { symbol: ["posix_memalign", private, "<cstdlib>", public ] },
+  { include: ["<ext/alloc_traits.h>", private, "<vector>", public ] },
+  { include: ["<string.h>", private, "<cstring>", public ] }
 ]

http://git-wip-us.apache.org/repos/asf/arrow/blob/de2edc8d/cpp/src/arrow/array.h
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/array.h b/cpp/src/arrow/array.h
index 61ab2ef..3faff71 100644
--- a/cpp/src/arrow/array.h
+++ b/cpp/src/arrow/array.h
@@ -88,47 +88,47 @@ struct ARROW_EXPORT ArrayData {
   ArrayData() {}
 
   ArrayData(const std::shared_ptr<DataType>& type, int64_t length,
+            int64_t null_count = kUnknownNullCount, int64_t offset = 0)
+      : type(type), length(length), null_count(null_count), offset(offset) {}
+
+  ArrayData(const std::shared_ptr<DataType>& type, int64_t length,
             const std::vector<std::shared_ptr<Buffer>>& buffers,
             int64_t null_count = kUnknownNullCount, int64_t offset = 0)
-      : type(type),
-        length(length),
-        buffers(buffers),
-        null_count(null_count),
-        offset(offset) {}
+      : ArrayData(type, length, null_count, offset) {
+    this->buffers = buffers;
+  }
 
   ArrayData(const std::shared_ptr<DataType>& type, int64_t length,
             std::vector<std::shared_ptr<Buffer>>&& buffers,
             int64_t null_count = kUnknownNullCount, int64_t offset = 0)
-      : type(type),
-        length(length),
-        buffers(std::move(buffers)),
-        null_count(null_count),
-        offset(offset) {}
+      : ArrayData(type, length, null_count, offset) {
+    this->buffers = std::move(buffers);
+  }
 
   // Move constructor
   ArrayData(ArrayData&& other) noexcept
       : type(std::move(other.type)),
         length(other.length),
-        buffers(std::move(other.buffers)),
         null_count(other.null_count),
         offset(other.offset),
+        buffers(std::move(other.buffers)),
         child_data(std::move(other.child_data)) {}
 
   ArrayData(const ArrayData& other) noexcept
       : type(other.type),
         length(other.length),
-        buffers(other.buffers),
         null_count(other.null_count),
         offset(other.offset),
+        buffers(other.buffers),
         child_data(other.child_data) {}
 
   // Move assignment
   ArrayData& operator=(ArrayData&& other) {
     type = std::move(other.type);
     length = other.length;
-    buffers = std::move(other.buffers);
     null_count = other.null_count;
     offset = other.offset;
+    buffers = std::move(other.buffers);
     child_data = std::move(other.child_data);
     return *this;
   }
@@ -139,9 +139,9 @@ struct ARROW_EXPORT ArrayData {
 
   std::shared_ptr<DataType> type;
   int64_t length;
-  std::vector<std::shared_ptr<Buffer>> buffers;
   int64_t null_count;
   int64_t offset;
+  std::vector<std::shared_ptr<Buffer>> buffers;
   std::vector<std::shared_ptr<ArrayData>> child_data;
 };
 

http://git-wip-us.apache.org/repos/asf/arrow/blob/de2edc8d/cpp/src/arrow/compute/CMakeLists.txt
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/compute/CMakeLists.txt b/cpp/src/arrow/compute/CMakeLists.txt
index a154c47..fa475ca 100644
--- a/cpp/src/arrow/compute/CMakeLists.txt
+++ b/cpp/src/arrow/compute/CMakeLists.txt
@@ -17,8 +17,10 @@
 
 # Headers: top level
 install(FILES
+  api.h
   cast.h
   context.h
+  kernel.h
   DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}/arrow/compute")
 
 #######################################

http://git-wip-us.apache.org/repos/asf/arrow/blob/de2edc8d/cpp/src/arrow/compute/api.h
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/compute/api.h b/cpp/src/arrow/compute/api.h
new file mode 100644
index 0000000..da7df1c
--- /dev/null
+++ b/cpp/src/arrow/compute/api.h
@@ -0,0 +1,25 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef ARROW_COMPUTE_API_H
+#define ARROW_COMPUTE_API_H
+
+#include "arrow/compute/cast.h"
+#include "arrow/compute/context.h"
+#include "arrow/compute/kernel.h"
+
+#endif  // ARROW_COMPUTE_API_H

http://git-wip-us.apache.org/repos/asf/arrow/blob/de2edc8d/cpp/src/arrow/compute/cast.cc
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/compute/cast.cc b/cpp/src/arrow/compute/cast.cc
index f610f6b..3885fdf 100644
--- a/cpp/src/arrow/compute/cast.cc
+++ b/cpp/src/arrow/compute/cast.cc
@@ -18,40 +18,66 @@
 #include "arrow/compute/cast.h"
 
 #include <cstdint>
+#include <cstring>
 #include <functional>
 #include <limits>
 #include <memory>
 #include <sstream>
+#include <string>
 #include <type_traits>
 
+#include "arrow/array.h"
+#include "arrow/buffer.h"
+#include "arrow/type.h"
 #include "arrow/type_traits.h"
+#include "arrow/util/bit-util.h"
 #include "arrow/util/logging.h"
+#include "arrow/util/macros.h"
 
 #include "arrow/compute/context.h"
+#include "arrow/compute/kernel.h"
 
 namespace arrow {
 namespace compute {
 
-struct CastContext {
-  FunctionContext* func_ctx;
-  CastOptions options;
+// ----------------------------------------------------------------------
+// Zero copy casts
+
+template <typename O, typename I, typename Enable = void>
+struct is_zero_copy_cast {
+  static constexpr bool value = false;
 };
 
-typedef std::function<void(CastContext*, const ArrayData&, ArrayData*)> CastFunction;
+template <typename O, typename I>
+struct is_zero_copy_cast<O, I, typename std::enable_if<std::is_same<I, O>::value>::type> {
+  static constexpr bool value = true;
+};
+
+// From integers to date/time types with zero copy
+template <typename O, typename I>
+struct is_zero_copy_cast<
+    O, I, typename std::enable_if<std::is_base_of<Integer, I>::value &&
+                                  (std::is_base_of<TimeType, O>::value ||
+                                   std::is_base_of<DateType, O>::value ||
+                                   std::is_base_of<TimestampType, O>::value)>::type> {
+  using O_T = typename O::c_type;
+  using I_T = typename I::c_type;
+
+  static constexpr bool value = sizeof(O_T) == sizeof(I_T);
+};
 
 template <typename OutType, typename InType, typename Enable = void>
 struct CastFunctor {};
 
-// Type is the same, no computation required
+// Indicated no computation required
 template <typename O, typename I>
-struct CastFunctor<O, I, typename std::enable_if<std::is_same<I, O>::value>::type> {
-  void operator()(CastContext* ctx, const ArrayData& input, ArrayData* output) {
-    output->type = input.type;
-    output->buffers = input.buffers;
-    output->length = input.length;
-    output->offset = input.offset;
-    output->null_count = input.null_count;
-    output->child_data = input.child_data;
+struct CastFunctor<O, I, typename std::enable_if<is_zero_copy_cast<O, I>::value>::type> {
+  void operator()(FunctionContext* ctx, const CastOptions& options, const Array& input,
+                  ArrayData* output) {
+    auto in_data = input.data();
+    output->null_count = input.null_count();
+    output->buffers = in_data->buffers;
+    output->child_data = in_data->child_data;
   }
 };
 
@@ -59,10 +85,13 @@ struct CastFunctor<O, I, typename std::enable_if<std::is_same<I, O>::value>::typ
 // Null to other things
 
 template <typename T>
-struct CastFunctor<T, NullType,
-                   typename std::enable_if<!std::is_same<T, NullType>::value>::type> {
-  void operator()(CastContext* ctx, const ArrayData& input, ArrayData* output) {
-    ctx->func_ctx->SetStatus(Status::NotImplemented("NullType"));
+struct CastFunctor<T, NullType, typename std::enable_if<
+                                    std::is_base_of<FixedWidthType, T>::value>::type> {
+  void operator()(FunctionContext* ctx, const CastOptions& options, const Array& input,
+                  ArrayData* output) {
+    // Simply initialize data to 0
+    auto buf = output->buffers[1];
+    memset(buf->mutable_data(), 0, buf->size());
   }
 };
 
@@ -73,13 +102,14 @@ struct CastFunctor<T, NullType,
 template <typename T>
 struct CastFunctor<T, BooleanType,
                    typename std::enable_if<std::is_base_of<Number, T>::value>::type> {
-  void operator()(CastContext* ctx, const ArrayData& input, ArrayData* output) {
+  void operator()(FunctionContext* ctx, const CastOptions& options, const Array& input,
+                  ArrayData* output) {
     using c_type = typename T::c_type;
-    const uint8_t* data = input.buffers[1]->data();
+    const uint8_t* data = input.data()->buffers[1]->data();
     auto out = reinterpret_cast<c_type*>(output->buffers[1]->mutable_data());
     constexpr auto kOne = static_cast<c_type>(1);
     constexpr auto kZero = static_cast<c_type>(0);
-    for (int64_t i = 0; i < input.length; ++i) {
+    for (int64_t i = 0; i < input.length(); ++i) {
       *out++ = BitUtil::GetBit(data, i) ? kOne : kZero;
     }
   }
@@ -122,11 +152,12 @@ template <typename O, typename I>
 struct CastFunctor<O, I, typename std::enable_if<std::is_same<BooleanType, O>::value &&
                                                  std::is_base_of<Number, I>::value &&
                                                  !std::is_same<O, I>::value>::type> {
-  void operator()(CastContext* ctx, const ArrayData& input, ArrayData* output) {
+  void operator()(FunctionContext* ctx, const CastOptions& options, const Array& input,
+                  ArrayData* output) {
     using in_type = typename I::c_type;
-    auto in_data = reinterpret_cast<const in_type*>(input.buffers[1]->data());
+    auto in_data = reinterpret_cast<const in_type*>(input.data()->buffers[1]->data());
     uint8_t* out_data = reinterpret_cast<uint8_t*>(output->buffers[1]->mutable_data());
-    for (int64_t i = 0; i < input.length; ++i) {
+    for (int64_t i = 0; i < input.length(); ++i) {
       BitUtil::SetBitTo(out_data, i, (*in_data++) != 0);
     }
   }
@@ -135,39 +166,42 @@ struct CastFunctor<O, I, typename std::enable_if<std::is_same<BooleanType, O>::v
 template <typename O, typename I>
 struct CastFunctor<O, I,
                    typename std::enable_if<is_integer_downcast<O, I>::value>::type> {
-  void operator()(CastContext* ctx, const ArrayData& input, ArrayData* output) {
+  void operator()(FunctionContext* ctx, const CastOptions& options, const Array& input,
+                  ArrayData* output) {
     using in_type = typename I::c_type;
     using out_type = typename O::c_type;
 
-    auto in_offset = input.offset;
+    auto in_offset = input.offset();
 
-    auto in_data = reinterpret_cast<const in_type*>(input.buffers[1]->data()) + in_offset;
+    const auto& input_buffers = input.data()->buffers;
+
+    auto in_data = reinterpret_cast<const in_type*>(input_buffers[1]->data()) + in_offset;
     auto out_data = reinterpret_cast<out_type*>(output->buffers[1]->mutable_data());
 
-    if (!ctx->options.allow_int_overflow) {
+    if (!options.allow_int_overflow) {
       constexpr in_type kMax = static_cast<in_type>(std::numeric_limits<out_type>::max());
       constexpr in_type kMin = static_cast<in_type>(std::numeric_limits<out_type>::min());
 
-      if (input.null_count > 0) {
-        const uint8_t* is_valid = input.buffers[0]->data();
+      if (input.null_count() > 0) {
+        const uint8_t* is_valid = input_buffers[0]->data();
         int64_t is_valid_offset = in_offset;
-        for (int64_t i = 0; i < input.length; ++i) {
+        for (int64_t i = 0; i < input.length(); ++i) {
           if (ARROW_PREDICT_FALSE(BitUtil::GetBit(is_valid, is_valid_offset++) &&
                                   (*in_data > kMax || *in_data < kMin))) {
-            ctx->func_ctx->SetStatus(Status::Invalid("Integer value out of bounds"));
+            ctx->SetStatus(Status::Invalid("Integer value out of bounds"));
           }
           *out_data++ = static_cast<out_type>(*in_data++);
         }
       } else {
-        for (int64_t i = 0; i < input.length; ++i) {
+        for (int64_t i = 0; i < input.length(); ++i) {
           if (ARROW_PREDICT_FALSE(*in_data > kMax || *in_data < kMin)) {
-            ctx->func_ctx->SetStatus(Status::Invalid("Integer value out of bounds"));
+            ctx->SetStatus(Status::Invalid("Integer value out of bounds"));
           }
           *out_data++ = static_cast<out_type>(*in_data++);
         }
       }
     } else {
-      for (int64_t i = 0; i < input.length; ++i) {
+      for (int64_t i = 0; i < input.length(); ++i) {
         *out_data++ = static_cast<out_type>(*in_data++);
       }
     }
@@ -178,13 +212,14 @@ template <typename O, typename I>
 struct CastFunctor<O, I,
                    typename std::enable_if<is_numeric_cast<O, I>::value &&
                                            !is_integer_downcast<O, I>::value>::type> {
-  void operator()(CastContext* ctx, const ArrayData& input, ArrayData* output) {
+  void operator()(FunctionContext* ctx, const CastOptions& options, const Array& input,
+                  ArrayData* output) {
     using in_type = typename I::c_type;
     using out_type = typename O::c_type;
 
-    auto in_data = reinterpret_cast<const in_type*>(input.buffers[1]->data());
+    auto in_data = reinterpret_cast<const in_type*>(input.data()->buffers[1]->data());
     auto out_data = reinterpret_cast<out_type*>(output->buffers[1]->mutable_data());
-    for (int64_t i = 0; i < input.length; ++i) {
+    for (int64_t i = 0; i < input.length(); ++i) {
       *out_data++ = static_cast<out_type>(*in_data++);
     }
   }
@@ -192,12 +227,90 @@ struct CastFunctor<O, I,
 
 // ----------------------------------------------------------------------
 
-#define CAST_CASE(InType, OutType)                                            \
-  case OutType::type_id:                                                      \
-    return [type](CastContext* ctx, const ArrayData& input, ArrayData* out) { \
-      CastFunctor<OutType, InType> func;                                      \
-      func(ctx, input, out);                                                  \
+typedef std::function<void(FunctionContext*, const CastOptions& options, const Array&,
+                           ArrayData*)>
+    CastFunction;
+
+static Status AllocateIfNotPreallocated(FunctionContext* ctx, const Array& input,
+                                        ArrayData* out) {
+  if (!is_primitive(out->type->id())) {
+    return Status::NotImplemented(out->type->ToString());
+  }
+
+  const auto& fw_type = static_cast<const FixedWidthType&>(*out->type);
+
+  const int64_t length = input.length();
+
+  out->null_count = input.null_count();
+
+  // Propagate bitmap unless we are null type
+  std::shared_ptr<Buffer> validity_bitmap = input.data()->buffers[0];
+  if (input.type_id() == Type::NA) {
+    int64_t bitmap_size = BitUtil::BytesForBits(length);
+    RETURN_NOT_OK(ctx->Allocate(bitmap_size, &validity_bitmap));
+    memset(validity_bitmap->mutable_data(), 0, bitmap_size);
+  }
+
+  if (out->buffers.size() == 2) {
+    // Assuming preallocated, propagate bitmap and move on
+    out->buffers[0] = validity_bitmap;
+    return Status::OK();
+  } else {
+    DCHECK_EQ(0, out->buffers.size());
+  }
+
+  out->buffers.push_back(validity_bitmap);
+
+  std::shared_ptr<Buffer> out_data;
+
+  int bit_width = fw_type.bit_width();
+  int64_t buffer_size = 0;
+
+  if (bit_width == 1) {
+    buffer_size = BitUtil::BytesForBits(length);
+  } else if (bit_width % 8 == 0) {
+    buffer_size = length * fw_type.bit_width() / 8;
+  } else {
+    DCHECK(false);
+  }
+
+  RETURN_NOT_OK(ctx->Allocate(buffer_size, &out_data));
+  memset(out_data->mutable_data(), 0, buffer_size);
+
+  out->buffers.push_back(out_data);
+
+  return Status::OK();
+}
+
+class CastKernel : public UnaryKernel {
+ public:
+  CastKernel(const CastOptions& options, const CastFunction& func, bool is_zero_copy)
+      : options_(options), func_(func), is_zero_copy_(is_zero_copy) {}
+
+  Status Call(FunctionContext* ctx, const Array& input, ArrayData* out) override {
+    if (!is_zero_copy_) {
+      RETURN_NOT_OK(AllocateIfNotPreallocated(ctx, input, out));
     }
+    func_(ctx, options_, input, out);
+    RETURN_IF_ERROR(ctx);
+    return Status::OK();
+  }
+
+ private:
+  CastOptions options_;
+  CastFunction func_;
+  bool is_zero_copy_;
+};
+
+#define CAST_CASE(InType, OutType)                                                  \
+  case OutType::type_id:                                                            \
+    is_zero_copy = is_zero_copy_cast<OutType, InType>::value;                       \
+    func = [](FunctionContext* ctx, const CastOptions& options, const Array& input, \
+              ArrayData* out) {                                                     \
+      CastFunctor<OutType, InType> func;                                            \
+      func(ctx, options, input, out);                                               \
+    };                                                                              \
+    break;
 
 #define NUMERIC_CASES(FN, IN_TYPE) \
   FN(IN_TYPE, BooleanType);        \
@@ -212,37 +325,78 @@ struct CastFunctor<O, I,
   FN(IN_TYPE, FloatType);          \
   FN(IN_TYPE, DoubleType);
 
-#define GET_CAST_FUNCTION(CapType)                                                    \
-  static CastFunction Get##CapType##CastFunc(const std::shared_ptr<DataType>& type) { \
-    switch (type->id()) {                                                             \
-      NUMERIC_CASES(CAST_CASE, CapType);                                              \
-      default:                                                                        \
-        break;                                                                        \
-    }                                                                                 \
-    return nullptr;                                                                   \
+#define NULL_CASES(FN, IN_TYPE) \
+  NUMERIC_CASES(FN, IN_TYPE)    \
+  FN(NullType, Time32Type);     \
+  FN(NullType, Date32Type);     \
+  FN(NullType, TimestampType);  \
+  FN(NullType, Time64Type);     \
+  FN(NullType, Date64Type);
+
+#define INT32_CASES(FN, IN_TYPE) \
+  NUMERIC_CASES(FN, IN_TYPE)     \
+  FN(Int32Type, Time32Type);     \
+  FN(Int32Type, Date32Type);
+
+#define INT64_CASES(FN, IN_TYPE) \
+  NUMERIC_CASES(FN, IN_TYPE)     \
+  FN(Int64Type, TimestampType);  \
+  FN(Int64Type, Time64Type);     \
+  FN(Int64Type, Date64Type);
+
+#define DATE32_CASES(FN, IN_TYPE) FN(Date32Type, Date32Type);
+
+#define DATE64_CASES(FN, IN_TYPE) FN(Date64Type, Date64Type);
+
+#define TIME32_CASES(FN, IN_TYPE) FN(Time32Type, Time32Type);
+
+#define TIME64_CASES(FN, IN_TYPE) FN(Time64Type, Time64Type);
+
+#define TIMESTAMP_CASES(FN, IN_TYPE) FN(TimestampType, TimestampType);
+
+#define GET_CAST_FUNCTION(CASE_GENERATOR, InType)                                       \
+  static std::unique_ptr<UnaryKernel> Get##InType##CastFunc(                            \
+      const std::shared_ptr<DataType>& out_type, const CastOptions& options) {          \
+    CastFunction func;                                                                  \
+    bool is_zero_copy = false;                                                          \
+    switch (out_type->id()) {                                                           \
+      CASE_GENERATOR(CAST_CASE, InType);                                                \
+      default:                                                                          \
+        break;                                                                          \
+    }                                                                                   \
+    if (func != nullptr) {                                                              \
+      return std::unique_ptr<UnaryKernel>(new CastKernel(options, func, is_zero_copy)); \
+    }                                                                                   \
+    return nullptr;                                                                     \
   }
 
-#define CAST_FUNCTION_CASE(CapType)          \
-  case CapType::type_id:                     \
-    *out = Get##CapType##CastFunc(out_type); \
+GET_CAST_FUNCTION(NULL_CASES, NullType);
+GET_CAST_FUNCTION(NUMERIC_CASES, BooleanType);
+GET_CAST_FUNCTION(NUMERIC_CASES, UInt8Type);
+GET_CAST_FUNCTION(NUMERIC_CASES, Int8Type);
+GET_CAST_FUNCTION(NUMERIC_CASES, UInt16Type);
+GET_CAST_FUNCTION(NUMERIC_CASES, Int16Type);
+GET_CAST_FUNCTION(NUMERIC_CASES, UInt32Type);
+GET_CAST_FUNCTION(INT32_CASES, Int32Type);
+GET_CAST_FUNCTION(NUMERIC_CASES, UInt64Type);
+GET_CAST_FUNCTION(INT64_CASES, Int64Type);
+GET_CAST_FUNCTION(NUMERIC_CASES, FloatType);
+GET_CAST_FUNCTION(NUMERIC_CASES, DoubleType);
+GET_CAST_FUNCTION(DATE32_CASES, Date32Type);
+GET_CAST_FUNCTION(DATE64_CASES, Date64Type);
+GET_CAST_FUNCTION(TIME32_CASES, Time32Type);
+GET_CAST_FUNCTION(TIME64_CASES, Time64Type);
+GET_CAST_FUNCTION(TIMESTAMP_CASES, TimestampType);
+
+#define CAST_FUNCTION_CASE(InType)                      \
+  case InType::type_id:                                 \
+    *kernel = Get##InType##CastFunc(out_type, options); \
     break
 
-GET_CAST_FUNCTION(BooleanType);
-GET_CAST_FUNCTION(UInt8Type);
-GET_CAST_FUNCTION(Int8Type);
-GET_CAST_FUNCTION(UInt16Type);
-GET_CAST_FUNCTION(Int16Type);
-GET_CAST_FUNCTION(UInt32Type);
-GET_CAST_FUNCTION(Int32Type);
-GET_CAST_FUNCTION(UInt64Type);
-GET_CAST_FUNCTION(Int64Type);
-GET_CAST_FUNCTION(FloatType);
-GET_CAST_FUNCTION(DoubleType);
-
-static Status GetCastFunction(const DataType& in_type,
-                              const std::shared_ptr<DataType>& out_type,
-                              CastFunction* out) {
+Status GetCastFunction(const DataType& in_type, const std::shared_ptr<DataType>& out_type,
+                       const CastOptions& options, std::unique_ptr<UnaryKernel>* kernel) {
   switch (in_type.id()) {
+    CAST_FUNCTION_CASE(NullType);
     CAST_FUNCTION_CASE(BooleanType);
     CAST_FUNCTION_CASE(UInt8Type);
     CAST_FUNCTION_CASE(Int8Type);
@@ -254,10 +408,15 @@ static Status GetCastFunction(const DataType& in_type,
     CAST_FUNCTION_CASE(Int64Type);
     CAST_FUNCTION_CASE(FloatType);
     CAST_FUNCTION_CASE(DoubleType);
+    CAST_FUNCTION_CASE(Date32Type);
+    CAST_FUNCTION_CASE(Date64Type);
+    CAST_FUNCTION_CASE(Time32Type);
+    CAST_FUNCTION_CASE(Time64Type);
+    CAST_FUNCTION_CASE(TimestampType);
     default:
       break;
   }
-  if (*out == nullptr) {
+  if (*kernel == nullptr) {
     std::stringstream ss;
     ss << "No cast implemented from " << in_type.ToString() << " to "
        << out_type->ToString();
@@ -266,64 +425,19 @@ static Status GetCastFunction(const DataType& in_type,
   return Status::OK();
 }
 
-static Status AllocateLike(FunctionContext* ctx, const Array& array,
-                           const std::shared_ptr<DataType>& out_type,
-                           std::shared_ptr<ArrayData>* out) {
-  if (!is_primitive(out_type->id())) {
-    return Status::NotImplemented(out_type->ToString());
-  }
-
-  const auto& fw_type = static_cast<const FixedWidthType&>(*out_type);
-
-  auto result = std::make_shared<ArrayData>();
-  result->type = out_type;
-  result->length = array.length();
-  result->offset = 0;
-  result->null_count = array.null_count();
-
-  // Propagate null bitmap
-  // TODO(wesm): handling null bitmap when input type is NullType
-  result->buffers.push_back(array.data()->buffers[0]);
-
-  std::shared_ptr<Buffer> out_data;
-
-  int bit_width = fw_type.bit_width();
-
-  if (bit_width == 1) {
-    RETURN_NOT_OK(ctx->Allocate(BitUtil::BytesForBits(array.length()), &out_data));
-  } else if (bit_width % 8 == 0) {
-    RETURN_NOT_OK(ctx->Allocate(array.length() * fw_type.bit_width() / 8, &out_data));
-  } else {
-    DCHECK(false);
-  }
-  result->buffers.push_back(out_data);
-
-  *out = result;
-  return Status::OK();
-}
-
-static Status Cast(CastContext* cast_ctx, const Array& array,
-                   const std::shared_ptr<DataType>& out_type,
-                   std::shared_ptr<Array>* out) {
+Status Cast(FunctionContext* ctx, const Array& array,
+            const std::shared_ptr<DataType>& out_type, const CastOptions& options,
+            std::shared_ptr<Array>* out) {
   // Dynamic dispatch to obtain right cast function
-  CastFunction func;
-  RETURN_NOT_OK(GetCastFunction(*array.type(), out_type, &func));
+  std::unique_ptr<UnaryKernel> func;
+  RETURN_NOT_OK(GetCastFunction(*array.type(), out_type, options, &func));
 
-  // Allocate memory for output
-  std::shared_ptr<ArrayData> out_data;
-  RETURN_NOT_OK(AllocateLike(cast_ctx->func_ctx, array, out_type, &out_data));
+  // Data structure for output
+  auto out_data = std::make_shared<ArrayData>(out_type, array.length());
 
-  func(cast_ctx, *array.data(), out_data.get());
-  RETURN_IF_ERROR(cast_ctx->func_ctx);
+  RETURN_NOT_OK(func->Call(ctx, array, out_data.get()));
   return internal::MakeArray(out_data, out);
 }
 
-Status Cast(FunctionContext* ctx, const Array& array,
-            const std::shared_ptr<DataType>& out_type, const CastOptions& options,
-            std::shared_ptr<Array>* out) {
-  CastContext cast_ctx{ctx, options};
-  return Cast(&cast_ctx, array, out_type, out);
-}
-
 }  // namespace compute
 }  // namespace arrow

http://git-wip-us.apache.org/repos/asf/arrow/blob/de2edc8d/cpp/src/arrow/compute/cast.h
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/compute/cast.h b/cpp/src/arrow/compute/cast.h
index 9ca70aa..081cdd9 100644
--- a/cpp/src/arrow/compute/cast.h
+++ b/cpp/src/arrow/compute/cast.h
@@ -20,21 +20,31 @@
 
 #include <memory>
 
-#include "arrow/array.h"
+#include "arrow/status.h"
 #include "arrow/util/visibility.h"
 
 namespace arrow {
 
-using internal::ArrayData;
+class Array;
+class DataType;
 
 namespace compute {
 
 class FunctionContext;
+class UnaryKernel;
 
 struct CastOptions {
+  CastOptions() : allow_int_overflow(false) {}
+
   bool allow_int_overflow;
 };
 
+/// \since 0.7.0
+/// \note API not yet finalized
+ARROW_EXPORT
+Status GetCastFunction(const DataType& in_type, const std::shared_ptr<DataType>& to_type,
+                       const CastOptions& options, std::unique_ptr<UnaryKernel>* kernel);
+
 /// \brief Cast from one array type to another
 /// \param[in] context
 /// \param[in] array

http://git-wip-us.apache.org/repos/asf/arrow/blob/de2edc8d/cpp/src/arrow/compute/compute-test.cc
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/compute/compute-test.cc b/cpp/src/arrow/compute/compute-test.cc
index cda5755..ba645c2 100644
--- a/cpp/src/arrow/compute/compute-test.cc
+++ b/cpp/src/arrow/compute/compute-test.cc
@@ -39,10 +39,14 @@
 
 #include "arrow/compute/cast.h"
 #include "arrow/compute/context.h"
+#include "arrow/compute/kernel.h"
 
 using std::vector;
 
 namespace arrow {
+
+using internal::ArrayData;
+
 namespace compute {
 
 void AssertArraysEqual(const Array& left, const Array& right) {
@@ -72,6 +76,11 @@ class ComputeFixture {
 // ----------------------------------------------------------------------
 // Cast
 
+static void AssertBufferSame(const Array& left, const Array& right, int buffer_index) {
+  ASSERT_EQ(left.data()->buffers[buffer_index].get(),
+            right.data()->buffers[buffer_index].get());
+}
+
 class TestCast : public ComputeFixture, public ::testing::Test {
  public:
   void CheckPass(const Array& input, const Array& expected,
@@ -94,6 +103,13 @@ class TestCast : public ComputeFixture, public ::testing::Test {
     ASSERT_RAISES(Invalid, Cast(&ctx_, *input, out_type, options, &result));
   }
 
+  void CheckZeroCopy(const Array& input, const std::shared_ptr<DataType>& out_type) {
+    std::shared_ptr<Array> result;
+    ASSERT_OK(Cast(&ctx_, input, out_type, {}, &result));
+    AssertBufferSame(input, *result, 0);
+    AssertBufferSame(input, *result, 1);
+  }
+
   template <typename InType, typename I_TYPE, typename OutType, typename O_TYPE>
   void CheckCase(const std::shared_ptr<DataType>& in_type,
                  const std::vector<I_TYPE>& in_values, const std::vector<bool>& is_valid,
@@ -121,12 +137,8 @@ TEST_F(TestCast, SameTypeZeroCopy) {
   std::shared_ptr<Array> result;
   ASSERT_OK(Cast(&this->ctx_, *arr, int32(), {}, &result));
 
-  const auto& lbuffers = arr->data()->buffers;
-  const auto& rbuffers = result->data()->buffers;
-
-  // Buffers are the same
-  ASSERT_EQ(lbuffers[0].get(), rbuffers[0].get());
-  ASSERT_EQ(lbuffers[1].get(), rbuffers[1].get());
+  AssertBufferSame(*arr, *result, 0);
+  AssertBufferSame(*arr, *result, 1);
 }
 
 TEST_F(TestCast, ToBoolean) {
@@ -311,5 +323,77 @@ TEST_F(TestCast, UnsupportedTarget) {
   ASSERT_RAISES(NotImplemented, Cast(&this->ctx_, *arr, utf8(), {}, &result));
 }
 
+TEST_F(TestCast, DateTimeZeroCopy) {
+  vector<bool> is_valid = {true, false, true, true, true};
+
+  std::shared_ptr<Array> arr;
+  vector<int32_t> v1 = {0, 70000, 2000, 1000, 0};
+  ArrayFromVector<Int32Type, int32_t>(int32(), is_valid, v1, &arr);
+
+  CheckZeroCopy(*arr, time32(TimeUnit::SECOND));
+  CheckZeroCopy(*arr, date32());
+
+  vector<int64_t> v2 = {0, 70000, 2000, 1000, 0};
+  ArrayFromVector<Int64Type, int64_t>(int64(), is_valid, v2, &arr);
+
+  CheckZeroCopy(*arr, time64(TimeUnit::MICRO));
+  CheckZeroCopy(*arr, date64());
+  CheckZeroCopy(*arr, timestamp(TimeUnit::NANO));
+}
+
+TEST_F(TestCast, FromNull) {
+  // Null casts to everything
+  const int length = 10;
+
+  NullArray arr(length);
+
+  std::shared_ptr<Array> result;
+  ASSERT_OK(Cast(&ctx_, arr, int32(), {}, &result));
+
+  ASSERT_EQ(length, result->length());
+  ASSERT_EQ(length, result->null_count());
+
+  // OK to look at bitmaps
+  AssertArraysEqual(*result, *result);
+}
+
+TEST_F(TestCast, PreallocatedMemory) {
+  CastOptions options;
+  options.allow_int_overflow = false;
+
+  vector<bool> is_valid = {true, false, true, true, true};
+
+  const int64_t length = 5;
+
+  std::shared_ptr<Array> arr;
+  vector<int32_t> v1 = {0, 70000, 2000, 1000, 0};
+  vector<int64_t> e1 = {0, 70000, 2000, 1000, 0};
+  ArrayFromVector<Int32Type, int32_t>(int32(), is_valid, v1, &arr);
+
+  auto out_type = int64();
+
+  std::unique_ptr<UnaryKernel> kernel;
+  ASSERT_OK(GetCastFunction(*int32(), out_type, options, &kernel));
+
+  auto out_data = std::make_shared<ArrayData>(out_type, length);
+
+  std::shared_ptr<Buffer> out_values;
+  ASSERT_OK(this->ctx_.Allocate(length * sizeof(int64_t), &out_values));
+
+  out_data->buffers.push_back(nullptr);
+  out_data->buffers.push_back(out_values);
+
+  ASSERT_OK(kernel->Call(&this->ctx_, *arr, out_data.get()));
+
+  // Buffer address unchanged
+  ASSERT_EQ(out_values.get(), out_data->buffers[1].get());
+
+  std::shared_ptr<Array> result, expected;
+  ASSERT_OK(MakeArray(out_data, &result));
+  ArrayFromVector<Int64Type, int64_t>(int64(), is_valid, e1, &expected);
+
+  AssertArraysEqual(*expected, *result);
+}
+
 }  // namespace compute
 }  // namespace arrow
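The `DateTimeZeroCopy` test above asserts that casting an integer array to a date or time type of the same width reuses both buffers. A rough pure-Python model of that idea follows; it is a sketch under stated assumptions, not Arrow code. `ArrayData` here is a hypothetical dataclass, the type names are illustrative string labels, and `ZERO_COPY_CASTS` lists only the pairs that the test checks.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional

# Pairs exercised by the DateTimeZeroCopy test (string labels are illustrative).
ZERO_COPY_CASTS = {
    ("int32", "time32[s]"),
    ("int32", "date32"),
    ("int64", "time64[us]"),
    ("int64", "date64"),
    ("int64", "timestamp[ns]"),
}


@dataclass
class ArrayData:
    """Hypothetical model: a type label plus [validity bitmap, values]."""
    type_name: str
    length: int
    buffers: List[Optional[bytearray]]


def cast_zero_copy(data, out_type):
    if (data.type_name, out_type) not in ZERO_COPY_CASTS:
        raise NotImplementedError(
            f"No zero-copy cast from {data.type_name} to {out_type}")
    # Only the type label changes; the buffer objects themselves are shared.
    return ArrayData(out_type, data.length, list(data.buffers))


# Five int32 values; the second slot is null (bitmap bits are LSB-first).
validity = bytearray([0b00011101])
values = bytearray(struct.pack("<5i", 0, 70000, 2000, 1000, 0))
arr = ArrayData("int32", 5, [validity, values])
as_date = cast_zero_copy(arr, "date32")
```

Checking `as_date.buffers[i] is arr.buffers[i]` plays the role of `AssertBufferSame` in the C++ test: identity of the buffer object, not equality of its contents.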

http://git-wip-us.apache.org/repos/asf/arrow/blob/de2edc8d/cpp/src/arrow/compute/context.h
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/compute/context.h b/cpp/src/arrow/compute/context.h
index caff2e2..051c91b 100644
--- a/cpp/src/arrow/compute/context.h
+++ b/cpp/src/arrow/compute/context.h
@@ -18,6 +18,7 @@
 #ifndef ARROW_COMPUTE_CONTEXT_H
 #define ARROW_COMPUTE_CONTEXT_H
 
+#include "arrow/memory_pool.h"
 #include "arrow/status.h"
 #include "arrow/type_fwd.h"
 #include "arrow/util/visibility.h"
@@ -35,7 +36,7 @@ namespace compute {
 /// \brief Container for variables and options used by function evaluation
 class ARROW_EXPORT FunctionContext {
  public:
-  explicit FunctionContext(MemoryPool* pool);
+  explicit FunctionContext(MemoryPool* pool ARROW_MEMORY_POOL_DEFAULT);
   MemoryPool* memory_pool() const;
 
   /// \brief Allocate buffer from the context's memory pool

http://git-wip-us.apache.org/repos/asf/arrow/blob/de2edc8d/cpp/src/arrow/compute/kernel.h
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/compute/kernel.h b/cpp/src/arrow/compute/kernel.h
new file mode 100644
index 0000000..c72d467
--- /dev/null
+++ b/cpp/src/arrow/compute/kernel.h
@@ -0,0 +1,49 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef ARROW_COMPUTE_KERNEL_H
+#define ARROW_COMPUTE_KERNEL_H
+
+#include "arrow/util/visibility.h"
+
+namespace arrow {
+
+class Array;
+using internal::ArrayData;
+
+namespace compute {
+
+class FunctionContext;
+
+/// \class OpKernel
+/// \brief Base class for operator kernels
+class ARROW_EXPORT OpKernel {
+ public:
+  virtual ~OpKernel() = default;
+};
+
+/// \class UnaryKernel
+/// \brief An array-valued function of a single input argument
+class ARROW_EXPORT UnaryKernel : public OpKernel {
+ public:
+  virtual Status Call(FunctionContext* ctx, const Array& input, ArrayData* out) = 0;
+};
+
+}  // namespace compute
+}  // namespace arrow
+
+#endif  // ARROW_COMPUTE_KERNEL_H

http://git-wip-us.apache.org/repos/asf/arrow/blob/de2edc8d/cpp/src/arrow/python/numpy_convert.h
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/python/numpy_convert.h b/cpp/src/arrow/python/numpy_convert.h
index 7b3b3b7..93c4848 100644
--- a/cpp/src/arrow/python/numpy_convert.h
+++ b/cpp/src/arrow/python/numpy_convert.h
@@ -57,10 +57,7 @@ bool is_contiguous(PyObject* array);
 ARROW_EXPORT
 Status NumPyDtypeToArrow(PyObject* dtype, std::shared_ptr<DataType>* out);
 
-ARROW_EXPORT
 Status GetTensorType(PyObject* dtype, std::shared_ptr<DataType>* out);
-
-ARROW_EXPORT
 Status GetNumPyType(const DataType& type, int* type_num);
 
 ARROW_EXPORT Status NdarrayToTensor(MemoryPool* pool, PyObject* ao,

http://git-wip-us.apache.org/repos/asf/arrow/blob/de2edc8d/cpp/src/arrow/python/pandas_to_arrow.cc
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/python/pandas_to_arrow.cc b/cpp/src/arrow/python/pandas_to_arrow.cc
index 2493779..1f96f4e 100644
--- a/cpp/src/arrow/python/pandas_to_arrow.cc
+++ b/cpp/src/arrow/python/pandas_to_arrow.cc
@@ -44,6 +44,9 @@
 #include "arrow/util/macros.h"
 #include "arrow/visitor_inline.h"
 
+#include "arrow/compute/cast.h"
+#include "arrow/compute/context.h"
+
 #include "arrow/python/builtin_convert.h"
 #include "arrow/python/common.h"
 #include "arrow/python/config.h"
@@ -347,8 +350,8 @@ class PandasConverter {
     }
 
     BufferVector buffers = {null_bitmap_, data};
-    auto arr_data = std::make_shared<ArrayData>(type_, length_, std::move(buffers),
-                                                null_count, 0);
+    auto arr_data =
+        std::make_shared<ArrayData>(type_, length_, std::move(buffers), null_count, 0);
     return PushArray(arr_data);
   }
 
@@ -419,9 +422,11 @@ class PandasConverter {
   Status ConvertLists(const std::shared_ptr<DataType>& type);
   Status ConvertLists(const std::shared_ptr<DataType>& type, ListBuilder* builder,
                       PyObject* list);
-  Status ConvertObjects();
   Status ConvertDecimals();
   Status ConvertTimes();
+  Status ConvertObjects();
+  Status ConvertObjectsInfer();
+  Status ConvertObjectsInferAndCast();
 
  protected:
   MemoryPool* pool_;
@@ -460,22 +465,32 @@ void CopyStrided<PyObject*, PyObject*>(PyObject** input_data, int64_t length,
   }
 }
 
+static Status CastBuffer(const std::shared_ptr<Buffer>& input, const int64_t length,
+                         const std::shared_ptr<DataType>& in_type,
+                         const std::shared_ptr<DataType>& out_type, MemoryPool* pool,
+                         std::shared_ptr<Buffer>* out) {
+  // Must cast
+  std::vector<std::shared_ptr<Buffer>> buffers = {nullptr, input};
+  auto tmp_data = std::make_shared<ArrayData>(in_type, length, buffers, 0);
+
+  std::shared_ptr<Array> tmp_array, casted_array;
+  RETURN_NOT_OK(MakeArray(tmp_data, &tmp_array));
+
+  compute::FunctionContext context(pool);
+  compute::CastOptions cast_options;
+  cast_options.allow_int_overflow = false;
+
+  RETURN_NOT_OK(
+      compute::Cast(&context, *tmp_array, out_type, cast_options, &casted_array));
+  *out = casted_array->data()->buffers[1];
+  return Status::OK();
+}
+
 template <typename ArrowType>
 inline Status PandasConverter::ConvertData(std::shared_ptr<Buffer>* data) {
   using traits = internal::arrow_traits<ArrowType::type_id>;
   using T = typename traits::T;
 
-  // Handle LONGLONG->INT64 and other fun things
-  int type_num_compat = cast_npy_type_compat(PyArray_DESCR(arr_)->type_num);
-
-  if (NumPyTypeSize(traits::npy_type) != NumPyTypeSize(type_num_compat)) {
-    std::stringstream ss;
-    ss << "NumPy type casts not yet implemented, type sizes differ: ";
-    ss << NumPyTypeSize(traits::npy_type) << " compared to "
-       << NumPyTypeSize(type_num_compat);
-    return Status::NotImplemented(ss.str());
-  }
-
   if (is_strided()) {
     // Strided, must copy into new contiguous memory
     const int64_t stride = PyArray_STRIDES(arr_)[0];
@@ -490,6 +505,15 @@ inline Status PandasConverter::ConvertData(std::shared_ptr<Buffer>* data) {
     // Can zero-copy
     *data = std::make_shared<NumPyBuffer>(reinterpret_cast<PyObject*>(arr_));
   }
+
+  std::shared_ptr<DataType> input_type;
+  RETURN_NOT_OK(
+      NumPyDtypeToArrow(reinterpret_cast<PyObject*>(PyArray_DESCR(arr_)), &input_type));
+
+  if (!input_type->Equals(*type_)) {
+    RETURN_NOT_OK(CastBuffer(*data, length_, input_type, type_, pool_, data));
+  }
+
   return Status::OK();
 }
 
@@ -845,6 +869,74 @@ Status PandasConverter::ConvertBooleans() {
   return Status::OK();
 }
 
+Status PandasConverter::ConvertObjectsInfer() {
+  Ndarray1DIndexer<PyObject*> objects;
+
+  PyAcquireGIL lock;
+  objects.Init(arr_);
+  PyDateTime_IMPORT;
+
+  OwnedRef decimal;
+  OwnedRef Decimal;
+  RETURN_NOT_OK(ImportModule("decimal", &decimal));
+  RETURN_NOT_OK(ImportFromModule(decimal, "Decimal", &Decimal));
+
+  for (int64_t i = 0; i < length_; ++i) {
+    PyObject* obj = objects[i];
+    if (PandasObjectIsNull(obj)) {
+      continue;
+    } else if (PyObject_is_string(obj)) {
+      return ConvertObjectStrings();
+    } else if (PyObject_is_float(obj)) {
+      return ConvertObjectFloats();
+    } else if (PyBool_Check(obj)) {
+      return ConvertBooleans();
+    } else if (PyObject_is_integer(obj)) {
+      return ConvertObjectIntegers();
+    } else if (PyDate_CheckExact(obj)) {
+      // We could choose Date32 or Date64
+      return ConvertDates<Date32Type>();
+    } else if (PyTime_Check(obj)) {
+      return ConvertTimes();
+    } else if (PyObject_IsInstance(const_cast<PyObject*>(obj), Decimal.obj())) {
+      return ConvertDecimals();
+    } else if (PyList_Check(obj) || PyArray_Check(obj)) {
+      std::shared_ptr<DataType> inferred_type;
+      RETURN_NOT_OK(InferArrowType(obj, &inferred_type));
+      return ConvertLists(inferred_type);
+    } else {
+      const std::string supported_types =
+          "string, bool, float, int, date, time, decimal, list, array";
+      std::stringstream ss;
+      ss << "Error inferring Arrow type for Python object array. ";
+      RETURN_NOT_OK(InvalidConversion(obj, supported_types, &ss));
+      return Status::Invalid(ss.str());
+    }
+  }
+  out_arrays_.push_back(std::make_shared<NullArray>(length_));
+  return Status::OK();
+}
+
+Status PandasConverter::ConvertObjectsInferAndCast() {
+  size_t position = out_arrays_.size();
+  RETURN_NOT_OK(ConvertObjectsInfer());
+
+  std::shared_ptr<Array> arr = out_arrays_[position];
+
+  // Perform cast
+  compute::FunctionContext context(pool_);
+  compute::CastOptions options;
+  options.allow_int_overflow = false;
+
+  std::shared_ptr<Array> casted;
+  RETURN_NOT_OK(compute::Cast(&context, *arr, type_, options, &casted));
+
+  // Replace with casted values
+  out_arrays_[position] = casted;
+
+  return Status::OK();
+}
+
 Status PandasConverter::ConvertObjects() {
   // Python object arrays are annoying, since we could have one of:
   //
@@ -858,13 +950,6 @@ Status PandasConverter::ConvertObjects() {
 
   RETURN_NOT_OK(InitNullBitmap());
 
-  Ndarray1DIndexer<PyObject*> objects;
-
-  PyAcquireGIL lock;
-  objects.Init(arr_);
-  PyDateTime_IMPORT;
-  lock.release();
-
   // This means we received an explicit type from the user
   if (type_) {
     switch (type_->id()) {
@@ -885,53 +970,12 @@ Status PandasConverter::ConvertObjects() {
       case Type::DECIMAL:
         return ConvertDecimals();
       default:
-        return Status::TypeError("No known conversion to Arrow type");
+        return ConvertObjectsInferAndCast();
     }
   } else {
     // Re-acquire GIL
-    lock.acquire();
-
-    OwnedRef decimal;
-    OwnedRef Decimal;
-    RETURN_NOT_OK(ImportModule("decimal", &decimal));
-    RETURN_NOT_OK(ImportFromModule(decimal, "Decimal", &Decimal));
-
-    for (int64_t i = 0; i < length_; ++i) {
-      PyObject* obj = objects[i];
-      if (PandasObjectIsNull(obj)) {
-        continue;
-      } else if (PyObject_is_string(obj)) {
-        return ConvertObjectStrings();
-      } else if (PyObject_is_float(obj)) {
-        return ConvertObjectFloats();
-      } else if (PyBool_Check(obj)) {
-        return ConvertBooleans();
-      } else if (PyObject_is_integer(obj)) {
-        return ConvertObjectIntegers();
-      } else if (PyDate_CheckExact(obj)) {
-        // We could choose Date32 or Date64
-        return ConvertDates<Date32Type>();
-      } else if (PyTime_Check(obj)) {
-        return ConvertTimes();
-      } else if (PyObject_IsInstance(const_cast<PyObject*>(obj), Decimal.obj())) {
-        return ConvertDecimals();
-      } else if (PyList_Check(obj) || PyArray_Check(obj)) {
-        std::shared_ptr<DataType> inferred_type;
-        RETURN_NOT_OK(InferArrowType(obj, &inferred_type));
-        return ConvertLists(inferred_type);
-      } else {
-        const std::string supported_types =
-            "string, bool, float, int, date, time, decimal, list, array";
-        std::stringstream ss;
-        ss << "Error inferring Arrow type for Python object array. ";
-        RETURN_NOT_OK(InvalidConversion(obj, supported_types, &ss));
-        return Status::Invalid(ss.str());
-      }
-    }
+    return ConvertObjectsInfer();
   }
-
-  out_arrays_.push_back(std::make_shared<NullArray>(length_));
-  return Status::OK();
 }
 
 template <typename T>
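The split into `ConvertObjectsInfer` and `ConvertObjectsInferAndCast` can be summarized as: infer a type from the first non-null object, fall back to an all-null array if there is none, then cast to the requested type when one was given. The sketch below models that control flow in pure Python. It is an illustration only: `is_null`, `infer_type` and `convert_objects` are hypothetical names, treating NaN as null is an assumption about `PandasObjectIsNull`, and only the null-to-anything cast is modeled.

```python
import datetime
import decimal


def is_null(obj):
    # Assumption: None and float NaN both count as missing values.
    return obj is None or (isinstance(obj, float) and obj != obj)


def infer_type(objects):
    """Return a type label based on the first non-null object."""
    for obj in objects:
        if is_null(obj):
            continue
        if isinstance(obj, str):
            return "string"
        if isinstance(obj, float):
            return "float"
        if isinstance(obj, bool):  # checked before int: bool subclasses int
            return "bool"
        if isinstance(obj, int):
            return "int"
        if type(obj) is datetime.date:  # exact match, excludes datetime
            return "date"
        if isinstance(obj, datetime.time):
            return "time"
        if isinstance(obj, decimal.Decimal):
            return "decimal"
        if isinstance(obj, list):
            return "list"
        raise ValueError(f"Error inferring type for {type(obj).__name__}")
    return "null"  # every element was null


def convert_objects(objects, requested_type=None):
    inferred = infer_type(objects)
    values = [None if is_null(o) else o for o in objects]
    if requested_type is None or requested_type == inferred:
        return inferred, values
    if inferred == "null":
        # An all-null array casts to any requested type.
        return requested_type, values
    raise NotImplementedError(f"cast {inferred} -> {requested_type} not modeled")
```

With this shape, `convert_objects([None] * 5)` yields a null-typed result while `convert_objects([None] * 5, "int")` yields an all-null integer result, mirroring the `from_pandas` example in the commit message.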

http://git-wip-us.apache.org/repos/asf/arrow/blob/de2edc8d/python/pyarrow/array.pxi
----------------------------------------------------------------------
diff --git a/python/pyarrow/array.pxi b/python/pyarrow/array.pxi
index a693f45..eec6180 100644
--- a/python/pyarrow/array.pxi
+++ b/python/pyarrow/array.pxi
@@ -88,6 +88,19 @@ def _normalize_slice(object arrow_obj, slice key):
         return arrow_obj.slice(start, stop - start)
 
 
+cdef class _FunctionContext:
+    cdef:
+        unique_ptr[CFunctionContext] ctx
+
+    def __cinit__(self):
+        self.ctx.reset(new CFunctionContext(c_default_memory_pool()))
+
+cdef _FunctionContext _global_ctx = _FunctionContext()
+
+cdef CFunctionContext* _context() nogil:
+    return _global_ctx.ctx.get()
+
+
 cdef class Array:
 
     cdef void init(self, const shared_ptr[CArray]& sp_array):
@@ -99,6 +112,34 @@ cdef class Array:
         with nogil:
             check_status(DebugPrint(deref(self.ap), 0))
 
+    def cast(self, DataType target_type, safe=True):
+        """
+        Cast array values to another data type
+
+        Parameters
+        ----------
+        target_type : DataType
+            Type to cast to
+        safe : boolean, default True
+            Check for overflows or other unsafe conversions
+
+        Returns
+        -------
+        casted : Array
+        """
+        cdef:
+            CCastOptions options
+            shared_ptr[CArray] result
+
+        if not safe:
+            options.allow_int_overflow = 1
+
+        with nogil:
+            check_status(Cast(_context(), self.ap[0], target_type.sp_type,
+                              options, &result))
+
+        return pyarrow_wrap_array(result)
+
     @staticmethod
     def from_pandas(obj, mask=None, DataType type=None,
                     timestamps_to_ms=False,

http://git-wip-us.apache.org/repos/asf/arrow/blob/de2edc8d/python/pyarrow/includes/libarrow.pxd
----------------------------------------------------------------------
diff --git a/python/pyarrow/includes/libarrow.pxd b/python/pyarrow/includes/libarrow.pxd
index cc35684..3b7ddcf 100644
--- a/python/pyarrow/includes/libarrow.pxd
+++ b/python/pyarrow/includes/libarrow.pxd
@@ -706,9 +706,6 @@ cdef extern from "arrow/ipc/api.h" namespace "arrow::ipc" nogil:
                             InputStream* stream,
                             shared_ptr[CRecordBatch]* out)
 
-
-cdef extern from "arrow/ipc/feather.h" namespace "arrow::ipc::feather" nogil:
-
     cdef cppclass CFeatherWriter" arrow::ipc::feather::TableWriter":
         @staticmethod
         CStatus Open(const shared_ptr[OutputStream]& stream,
@@ -737,6 +734,21 @@ cdef extern from "arrow/ipc/feather.h" namespace "arrow::ipc::feather" nogil:
         c_string GetColumnName(int i)
 
 
+cdef extern from "arrow/compute/api.h" namespace "arrow::compute" nogil:
+
+    cdef cppclass CFunctionContext" arrow::compute::FunctionContext":
+        CFunctionContext()
+        CFunctionContext(CMemoryPool* pool)
+
+    cdef cppclass CCastOptions" arrow::compute::CastOptions":
+        c_bool allow_int_overflow
+
+    CStatus Cast(CFunctionContext* context, const CArray& array,
+                 const shared_ptr[CDataType]& to_type,
+                 const CCastOptions& options,
+                 shared_ptr[CArray]* out)
+
+
 cdef extern from "arrow/python/api.h" namespace "arrow::py" nogil:
     shared_ptr[CDataType] GetPrimitiveType(Type type)
     shared_ptr[CDataType] GetTimestampType(TimeUnit unit)

http://git-wip-us.apache.org/repos/asf/arrow/blob/de2edc8d/python/pyarrow/tests/test_array.py
----------------------------------------------------------------------
diff --git a/python/pyarrow/tests/test_array.py b/python/pyarrow/tests/test_array.py
index 5a373b4..f316417 100644
--- a/python/pyarrow/tests/test_array.py
+++ b/python/pyarrow/tests/test_array.py
@@ -211,6 +211,73 @@ def test_list_from_arrays():
     assert result.equals(expected)
 
 
+def _check_cast_case(case, safe=True):
+    in_data, in_type, out_data, out_type = case
+
+    in_arr = pa.Array.from_pandas(in_data, type=in_type)
+
+    casted = in_arr.cast(out_type, safe=safe)
+    expected = pa.Array.from_pandas(out_data, type=out_type)
+    assert casted.equals(expected)
+
+
+def test_cast_integers_safe():
+    safe_cases = [
+        (np.array([0, 1, 2, 3], dtype='i1'), pa.int8(),
+         np.array([0, 1, 2, 3], dtype='i4'), pa.int32()),
+        (np.array([0, 1, 2, 3], dtype='i1'), pa.int8(),
+         np.array([0, 1, 2, 3], dtype='u4'), pa.uint16()),
+        (np.array([0, 1, 2, 3], dtype='i1'), pa.int8(),
+         np.array([0, 1, 2, 3], dtype='u1'), pa.uint8()),
+        (np.array([0, 1, 2, 3], dtype='i1'), pa.int8(),
+         np.array([0, 1, 2, 3], dtype='f8'), pa.float64())
+    ]
+
+    for case in safe_cases:
+        _check_cast_case(case)
+
+    unsafe_cases = [
+        (np.array([50000], dtype='i4'), pa.int32(), pa.int16()),
+        (np.array([70000], dtype='i4'), pa.int32(), pa.uint16()),
+        (np.array([-1], dtype='i4'), pa.int32(), pa.uint16()),
+        (np.array([50000], dtype='u2'), pa.uint16(), pa.int16())
+    ]
+    for in_data, in_type, out_type in unsafe_cases:
+        in_arr = pa.Array.from_pandas(in_data, type=in_type)
+
+        with pytest.raises(pa.ArrowInvalid):
+            in_arr.cast(out_type)
+
+
+def test_cast_integers_unsafe():
+    # We let NumPy do the unsafe casting
+    unsafe_cases = [
+        (np.array([50000], dtype='i4'), pa.int32(),
+         np.array([50000], dtype='i2'), pa.int16()),
+        (np.array([70000], dtype='i4'), pa.int32(),
+         np.array([70000], dtype='u2'), pa.uint16()),
+        (np.array([-1], dtype='i4'), pa.int32(),
+         np.array([-1], dtype='u2'), pa.uint16()),
+        (np.array([50000], dtype='u2'), pa.uint16(),
+         np.array([50000], dtype='i2'), pa.int16())
+    ]
+
+    for case in unsafe_cases:
+        _check_cast_case(case, safe=False)
+
+
+def test_cast_signed_to_unsigned():
+    safe_cases = [
+        (np.array([0, 1, 2, 3], dtype='i1'), pa.uint8(),
+         np.array([0, 1, 2, 3], dtype='u1'), pa.uint8()),
+        (np.array([0, 1, 2, 3], dtype='i2'), pa.uint16(),
+         np.array([0, 1, 2, 3], dtype='u2'), pa.uint16())
+    ]
+
+    for case in safe_cases:
+        _check_cast_case(case)
+
+
 def test_simple_type_construction():
     result = pa.lib.TimestampType()
     with pytest.raises(TypeError):
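The integer cast tests above distinguish a safe cast, which rejects values outside the target range, from an unsafe one, which lets them through. A small pure-Python model of that distinction is sketched here. It is not Arrow's kernel: `INT_TYPES`, `int_range` and `cast_int` are hypothetical names, the model raises `OverflowError` where pyarrow raises `pa.ArrowInvalid`, and two's-complement wraparound for the unsafe path is an assumption about typical behavior.

```python
# (bit width, signed) for each integer type label used in the model.
INT_TYPES = {
    "int8": (8, True), "int16": (16, True),
    "int32": (32, True), "int64": (64, True),
    "uint8": (8, False), "uint16": (16, False),
    "uint32": (32, False), "uint64": (64, False),
}


def int_range(type_name):
    bits, signed = INT_TYPES[type_name]
    if signed:
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


def cast_int(values, out_type, safe=True):
    """Cast Python ints to `out_type`; None stands for a null slot."""
    bits, signed = INT_TYPES[out_type]
    lo, hi = int_range(out_type)
    out = []
    for v in values:
        if v is None or lo <= v <= hi:
            out.append(v)
            continue
        if safe:
            raise OverflowError(f"{v} does not fit in {out_type}")
        # Unsafe path: keep the low `bits` bits (two's-complement wraparound).
        wrapped = v & ((1 << bits) - 1)
        if signed and wrapped > hi:
            wrapped -= 1 << bits
        out.append(wrapped)
    return out
```

In this model `safe=True` corresponds to the default `allow_int_overflow = false` in `CastOptions`, and `safe=False` corresponds to setting it to true, as `Array.cast` does in the Cython wrapper.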

http://git-wip-us.apache.org/repos/asf/arrow/blob/de2edc8d/python/pyarrow/tests/test_convert_pandas.py
----------------------------------------------------------------------
diff --git a/python/pyarrow/tests/test_convert_pandas.py b/python/pyarrow/tests/test_convert_pandas.py
index 6442434..e98e83d 100644
--- a/python/pyarrow/tests/test_convert_pandas.py
+++ b/python/pyarrow/tests/test_convert_pandas.py
@@ -239,6 +239,15 @@ class TestPandasConversion(unittest.TestCase):
 
         tm.assert_frame_equal(result, ex_frame)
 
+    def test_array_from_pandas_type_cast(self):
+        arr = np.arange(10, dtype='int64')
+
+        target_type = pa.int8()
+
+        result = pa.Array.from_pandas(arr, type=target_type)
+        expected = pa.Array.from_pandas(arr.astype('int8'))
+        assert result.equals(expected)
+
     def test_boolean_no_nulls(self):
         num_values = 100
 
@@ -279,6 +288,17 @@ class TestPandasConversion(unittest.TestCase):
         schema = pa.schema([field])
         self._check_pandas_roundtrip(df, expected_schema=schema)
 
+    def test_all_nulls_cast_numeric(self):
+        arr = np.array([None], dtype=object)
+
+        def _check_type(t):
+            a2 = pa.Array.from_pandas(arr, type=t)
+            assert a2.type == t
+            assert a2[0].as_py() is None
+
+        _check_type(pa.int32())
+        _check_type(pa.float64())
+
     def test_unicode(self):
         repeats = 1000
         values = [u'foo', None, u'bar', u'mañana', np.nan]

