From: uwe@apache.org
To: commits@arrow.apache.org
Reply-To: dev@arrow.apache.org
Date: Mon, 27 Feb 2017 07:14:16 -0000
Message-Id: <457894482e354c2785a87b2973e4132f@git.apache.org>
In-Reply-To: <233cddc5b8ae41d1bb0dd3c31734b64f@git.apache.org>
References: <233cddc5b8ae41d1bb0dd3c31734b64f@git.apache.org>
Subject: [2/2] arrow git commit: ARROW-493: [C++] Permit large (length > INT32_MAX) arrays in memory

ARROW-493: [C++] Permit large (length > INT32_MAX) arrays in memory

This commit relaxes the INT32_MAX length requirement for in-memory data.
It does not change the Arrow memory format, nor does it permit arrays over INT32_MAX elements to be included in a RecordBatch message sent in the streaming or file formats.

The purpose of this change is to enable Arrow containers to do zero-copy addressing of large datasets (generally of fixed-size elements) produced by other systems. Should those systems wish to send messages to Java, they will need to break those large arrays up into smaller pieces. We can create utilities to assist in copy-free segmentation of large in-memory datasets into compatible chunksizes. If the large data is only being used in C++-land, then there are no problems.

This is a helpful change en route to adding an `arrow::Tensor` type per ARROW-550, and probably some other things.

This also includes ARROW-584, as I wanted to be sure that I caught all the places in the codebase where there were imprecise integer conversions.

cc @pcmoritz @robertnishihara

Author: Wes McKinney

Closes #352 from wesm/ARROW-493 and squashes the following commits:

013d8cc [Wes McKinney] Fix some more compiler warnings
13c4067 [Wes McKinney] Do not pass CMAKE_CXX_FLAGS to googletest ep
dc50d80 [Wes McKinney] Fix last imprecise conversions
c8e90bc [Wes McKinney] Fix many imprecise integer conversions
6bacdf3 [Wes McKinney] Permit in-memory arrays with more than INT32_MAX elements in Array and Builder classes. Raise if large arrays used in IPC context


Project: http://git-wip-us.apache.org/repos/asf/arrow/repo
Commit: http://git-wip-us.apache.org/repos/asf/arrow/commit/01a67f3f
Tree: http://git-wip-us.apache.org/repos/asf/arrow/tree/01a67f3f
Diff: http://git-wip-us.apache.org/repos/asf/arrow/diff/01a67f3f

Branch: refs/heads/master
Commit: 01a67f3ff3f43f504dc92b66e04473a8b29053f1
Parents: dc103fe
Author: Wes McKinney
Authored: Mon Feb 27 08:14:10 2017 +0100
Committer: Uwe L. Korn
Committed: Mon Feb 27 08:14:10 2017 +0100

----------------------------------------------------------------------
 ci/travis_before_script_cpp.sh         |   2 +-
 cpp/CMakeLists.txt                     |   6 +-
 cpp/src/arrow/array-dictionary-test.cc |   2 +-
 cpp/src/arrow/array-primitive-test.cc  |  69 ++++++++-------
 cpp/src/arrow/array-string-test.cc     |  24 ++---
 cpp/src/arrow/array-test.cc            |  17 +++-
 cpp/src/arrow/array-union-test.cc      |   2 +-
 cpp/src/arrow/array.cc                 |  84 +++++++++---------
 cpp/src/arrow/array.h                  | 132 ++++++++++++++--------------
 cpp/src/arrow/buffer.h                 |  14 +--
 cpp/src/arrow/builder.cc               |  79 +++++++++--------
 cpp/src/arrow/builder.h                |  63 ++++++-------
 cpp/src/arrow/column-benchmark.cc      |   2 +-
 cpp/src/arrow/column.cc                |   6 +-
 cpp/src/arrow/column.h                 |   2 +-
 cpp/src/arrow/compare.cc               |  48 +++++-----
 cpp/src/arrow/compare.h                |   2 +-
 cpp/src/arrow/io/file.cc               |   8 +-
 cpp/src/arrow/io/hdfs.cc               |  15 ++--
 cpp/src/arrow/io/io-hdfs-test.cc       |   2 +-
 cpp/src/arrow/ipc/adapter.cc           |  24 +++--
 cpp/src/arrow/ipc/ipc-json-test.cc     |   2 +-
 cpp/src/arrow/ipc/json-internal.cc     |  61 ++++++++-----
 cpp/src/arrow/ipc/json.cc              |   4 +-
 cpp/src/arrow/ipc/metadata-internal.cc |   7 +-
 cpp/src/arrow/ipc/reader.cc            |   2 +-
 cpp/src/arrow/ipc/test-common.h        |  24 ++---
 cpp/src/arrow/ipc/writer.cc            |   4 +-
 cpp/src/arrow/pretty_print.cc          |   2 +-
 cpp/src/arrow/schema.cc                |   2 +-
 cpp/src/arrow/schema.h                 |   2 +-
 cpp/src/arrow/status.cc                |   2 +-
 cpp/src/arrow/table-test.cc            |   4 +-
 cpp/src/arrow/table.cc                 |  10 +--
 cpp/src/arrow/table.h                  |  14 +--
 cpp/src/arrow/test-util.h              |  47 +++++-----
 cpp/src/arrow/type.h                   |  12 +--
 cpp/src/arrow/type_traits.h            |  54 ++++++++----
 cpp/src/arrow/util/bit-util.cc         |   4 +-
 cpp/src/arrow/util/bit-util.h          |  25 +++---
 python/pyarrow/array.pxd               |   4 +-
 python/pyarrow/array.pyx               |   2 +-
 python/pyarrow/includes/libarrow.pxd   |  16 ++--
 python/pyarrow/scalar.pxd              |   8 +-
 python/pyarrow/scalar.pyx              |  10 +--
 python/pyarrow/table.pyx               |   2 +-
 python/src/pyarrow/adapters/builtin.cc |   4 +-
 python/src/pyarrow/adapters/pandas.cc  |  13 ++-
 48 files changed, 508 insertions(+), 436 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/arrow/blob/01a67f3f/ci/travis_before_script_cpp.sh
----------------------------------------------------------------------
diff --git a/ci/travis_before_script_cpp.sh b/ci/travis_before_script_cpp.sh
index feacf8f..f804a38 100755
--- a/ci/travis_before_script_cpp.sh
+++ b/ci/travis_before_script_cpp.sh
@@ -36,7 +36,7 @@ CMAKE_COMMON_FLAGS="\
 if [ $TRAVIS_OS_NAME == "linux" ]; then
     cmake -DARROW_TEST_MEMCHECK=on \
           $CMAKE_COMMON_FLAGS \
-          -DARROW_CXXFLAGS=-Werror \
+          -DARROW_CXXFLAGS="-Wconversion -Werror" \
           $CPP_DIR
 else
     cmake $CMAKE_COMMON_FLAGS \

http://git-wip-us.apache.org/repos/asf/arrow/blob/01a67f3f/cpp/CMakeLists.txt
----------------------------------------------------------------------
diff --git a/cpp/CMakeLists.txt b/cpp/CMakeLists.txt
index be3d4b9..f6dab78 100644
--- a/cpp/CMakeLists.txt
+++ b/cpp/CMakeLists.txt
@@ -123,7 +123,9 @@ endif()
 include(SetupCxxFlags)

 # Add common flags
-set(CMAKE_CXX_FLAGS "${ARROW_CXXFLAGS} ${CXX_COMMON_FLAGS} ${CMAKE_CXX_FLAGS}")
+set(CMAKE_CXX_FLAGS "${CXX_COMMON_FLAGS} ${CMAKE_CXX_FLAGS}")
+set(EP_CXX_FLAGS "${CMAKE_CXX_FLAGS}")
+set(CMAKE_CXX_FLAGS "${ARROW_CXXFLAGS} ${CMAKE_CXX_FLAGS}")

 # Determine compiler version
 include(CompilerInfo)
@@ -452,7 +454,7 @@ if(ARROW_BUILD_TESTS)
     set(GTEST_CMAKE_CXX_FLAGS "-fPIC")
   endif()
   string(TOUPPER ${CMAKE_BUILD_TYPE} UPPERCASE_BUILD_TYPE)
-  set(GTEST_CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${CMAKE_CXX_FLAGS_${UPPERCASE_BUILD_TYPE}} ${GTEST_CMAKE_CXX_FLAGS}")
+  set(GTEST_CMAKE_CXX_FLAGS "${EP_CXX_FLAGS} ${CMAKE_CXX_FLAGS_${UPPERCASE_BUILD_TYPE}} ${GTEST_CMAKE_CXX_FLAGS}")

   set(GTEST_PREFIX "${CMAKE_CURRENT_BINARY_DIR}/googletest_ep-prefix/src/googletest_ep")
   set(GTEST_INCLUDE_DIR "${GTEST_PREFIX}/include")

http://git-wip-us.apache.org/repos/asf/arrow/blob/01a67f3f/cpp/src/arrow/array-dictionary-test.cc
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/array-dictionary-test.cc b/cpp/src/arrow/array-dictionary-test.cc
index 61381b7..0c4e628 100644
--- a/cpp/src/arrow/array-dictionary-test.cc
+++ b/cpp/src/arrow/array-dictionary-test.cc
@@ -95,7 +95,7 @@ TEST(TestDictionary, Equals) {
   ASSERT_FALSE(array->RangeEquals(1, 3, 1, array4));

   // ARROW-33 Test slices
-  const int size = array->length();
+  const int64_t size = array->length();

   std::shared_ptr slice, slice2;
   slice = array->Array::Slice(2);

http://git-wip-us.apache.org/repos/asf/arrow/blob/01a67f3f/cpp/src/arrow/array-primitive-test.cc
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/array-primitive-test.cc b/cpp/src/arrow/array-primitive-test.cc
index f8bbd77..7b36275 100644
--- a/cpp/src/arrow/array-primitive-test.cc
+++ b/cpp/src/arrow/array-primitive-test.cc
@@ -97,7 +97,7 @@ class TestPrimitiveBuilder : public TestBuilder {
     builder_nn_ = std::dynamic_pointer_cast(tmp);
   }

-  void RandomData(int N, double pct_null = 0.1) {
+  void RandomData(int64_t N, double pct_null = 0.1) {
     Attrs::draw(N, &draws_);

     valid_bytes_.resize(N);
@@ -105,13 +105,13 @@ class TestPrimitiveBuilder : public TestBuilder {
   }

   void Check(const std::shared_ptr& builder, bool nullable) {
-    int size = builder->length();
+    int64_t size = builder->length();

     auto ex_data = std::make_shared(
         reinterpret_cast(draws_.data()), size * sizeof(T));

     std::shared_ptr ex_null_bitmap;
-    int32_t ex_null_count = 0;
+    int64_t ex_null_count = 0;

     if (nullable) {
       ex_null_bitmap = test::bytes_to_null_buffer(valid_bytes_);
@@ -157,18 +157,18 @@ class TestPrimitiveBuilder : public TestBuilder {
     return std::shared_ptr(new Type()); \
   }

-#define PINT_DECL(CapType, c_type, LOWER, UPPER) \
-  struct P##CapType { \
-    PTYPE_DECL(CapType, c_type); \
-    static void draw(int N, vector* draws) { \
-      test::randint(N, LOWER, UPPER, draws); \
-    } \
+#define PINT_DECL(CapType, c_type, LOWER, UPPER) \
+  struct P##CapType { \
+    PTYPE_DECL(CapType, c_type); \
+    static void draw(int64_t N, vector* draws) { \
+      test::randint(N, LOWER, UPPER, draws); \
+    } \
   }

 #define PFLOAT_DECL(CapType, c_type, LOWER, UPPER) \
   struct P##CapType { \
     PTYPE_DECL(CapType, c_type); \
-    static void draw(int N, vector* draws) { \
+    static void draw(int64_t N, vector* draws) { \
       test::random_real(N, 0, LOWER, UPPER, draws); \
     } \
   }
@@ -191,7 +191,7 @@ struct PBoolean {
 };

 template <>
-void TestPrimitiveBuilder::RandomData(int N, double pct_null) {
+void TestPrimitiveBuilder::RandomData(int64_t N, double pct_null) {
   draws_.resize(N);
   valid_bytes_.resize(N);

@@ -202,12 +202,12 @@ void TestPrimitiveBuilder::RandomData(int N, double pct_null) {
 template <>
 void TestPrimitiveBuilder::Check(
     const std::shared_ptr& builder, bool nullable) {
-  int size = builder->length();
+  int64_t size = builder->length();

   auto ex_data = test::bytes_to_null_buffer(draws_);

   std::shared_ptr ex_null_bitmap;
-  int32_t ex_null_count = 0;
+  int64_t ex_null_count = 0;

   if (nullable) {
     ex_null_bitmap = test::bytes_to_null_buffer(valid_bytes_);
@@ -233,7 +233,7 @@ void TestPrimitiveBuilder::Check(

   ASSERT_EQ(expected->length(), result->length());

-  for (int i = 0; i < result->length(); ++i) {
+  for (int64_t i = 0; i < result->length(); ++i) {
     if (nullable) { ASSERT_EQ(valid_bytes_[i] == 0, result->IsNull(i)) << i; }
     bool actual = BitUtil::GetBit(result->data()->data(), i);
     ASSERT_EQ(static_cast(draws_[i]), actual) << i;
@@ -256,7 +256,7 @@ TYPED_TEST_CASE(TestPrimitiveBuilder, Primitives);
 TYPED_TEST(TestPrimitiveBuilder, TestInit) {
   DECL_TYPE();

-  int n = 1000;
+  int64_t n = 1000;
   ASSERT_OK(this->builder_->Reserve(n));
   ASSERT_EQ(BitUtil::NextPower2(n), this->builder_->capacity());
   ASSERT_EQ(BitUtil::NextPower2(TypeTraits::bytes_required(n)),
@@ -267,15 +267,15 @@ TYPED_TEST(TestPrimitiveBuilder, TestInit) {
 }

 TYPED_TEST(TestPrimitiveBuilder, TestAppendNull) {
-  int size = 1000;
-  for (int i = 0; i < size; ++i) {
+  int64_t size = 1000;
+  for (int64_t i = 0; i < size; ++i) {
     ASSERT_OK(this->builder_->AppendNull());
   }

   std::shared_ptr result;
   ASSERT_OK(this->builder_->Finish(&result));

-  for (int i = 0; i < size; ++i) {
+  for (int64_t i = 0; i < size; ++i) {
     ASSERT_TRUE(result->IsNull(i)) << i;
   }
 }
@@ -283,7 +283,7 @@ TYPED_TEST(TestPrimitiveBuilder, TestAppendNull) {
 TYPED_TEST(TestPrimitiveBuilder, TestArrayDtorDealloc) {
   DECL_T();

-  int size = 1000;
+  int64_t size = 1000;

   vector& draws = this->draws_;
   vector& valid_bytes = this->valid_bytes_;
@@ -294,7 +294,7 @@ TYPED_TEST(TestPrimitiveBuilder, TestArrayDtorDealloc) {

   this->builder_->Reserve(size);

-  int i;
+  int64_t i;
   for (i = 0; i < size; ++i) {
     if (valid_bytes[i] > 0) {
       this->builder_->Append(draws[i]);
@@ -314,7 +314,7 @@ TYPED_TEST(TestPrimitiveBuilder, TestArrayDtorDealloc) {
 TYPED_TEST(TestPrimitiveBuilder, Equality) {
   DECL_T();

-  const int size = 1000;
+  const int64_t size = 1000;
   this->RandomData(size);
   vector& draws = this->draws_;
   vector& valid_bytes = this->valid_bytes_;
@@ -326,10 +326,11 @@ TYPED_TEST(TestPrimitiveBuilder, Equality) {

   // Make the not equal array by negating the first valid element with itself.
   const auto first_valid = std::find_if(
       valid_bytes.begin(), valid_bytes.end(), [](uint8_t valid) { return valid > 0; });
-  const int first_valid_idx = std::distance(valid_bytes.begin(), first_valid);
+  const int64_t first_valid_idx = std::distance(valid_bytes.begin(), first_valid);
   // This should be true with a very high probability, but might introduce flakiness
   ASSERT_LT(first_valid_idx, size - 1);
-  draws[first_valid_idx] = ~*reinterpret_cast(&draws[first_valid_idx]);
+  draws[first_valid_idx] =
+      static_cast(~*reinterpret_cast(&draws[first_valid_idx]));
   ASSERT_OK(MakeArray(valid_bytes, draws, size, builder, &unequal_array));

   // test normal equality
@@ -350,7 +351,7 @@ TYPED_TEST(TestPrimitiveBuilder, Equality) {
 TYPED_TEST(TestPrimitiveBuilder, SliceEquality) {
   DECL_T();

-  const int size = 1000;
+  const int64_t size = 1000;
   this->RandomData(size);
   vector& draws = this->draws_;
   vector& valid_bytes = this->valid_bytes_;
@@ -383,7 +384,7 @@ TYPED_TEST(TestPrimitiveBuilder, SliceEquality) {
 TYPED_TEST(TestPrimitiveBuilder, TestAppendScalar) {
   DECL_T();

-  const int size = 10000;
+  const int64_t size = 10000;

   vector& draws = this->draws_;
   vector& valid_bytes = this->valid_bytes_;
@@ -393,8 +394,8 @@ TYPED_TEST(TestPrimitiveBuilder, TestAppendScalar) {
   this->builder_->Reserve(1000);
   this->builder_nn_->Reserve(1000);

-  int i;
-  int null_count = 0;
+  int64_t i;
+  int64_t null_count = 0;
   // Append the first 1000
   for (i = 0; i < 1000; ++i) {
     if (valid_bytes[i] > 0) {
@@ -440,14 +441,14 @@ TYPED_TEST(TestPrimitiveBuilder, TestAppendScalar) {
 TYPED_TEST(TestPrimitiveBuilder, TestAppendVector) {
   DECL_T();

-  int size = 10000;
+  int64_t size = 10000;
   this->RandomData(size);

   vector& draws = this->draws_;
   vector& valid_bytes = this->valid_bytes_;

   // first slug
-  int K = 1000;
+  int64_t K = 1000;

   ASSERT_OK(this->builder_->Append(draws.data(), K, valid_bytes.data()));
   ASSERT_OK(this->builder_nn_->Append(draws.data(), K));
@@ -470,7 +471,7 @@ TYPED_TEST(TestPrimitiveBuilder, TestAppendVector) {
 }

 TYPED_TEST(TestPrimitiveBuilder, TestAdvance) {
-  int n = 1000;
+  int64_t n = 1000;
   ASSERT_OK(this->builder_->Reserve(n));

   ASSERT_OK(this->builder_->Advance(100));
@@ -478,14 +479,14 @@ TYPED_TEST(TestPrimitiveBuilder, TestAdvance) {

   ASSERT_OK(this->builder_->Advance(900));

-  int too_many = this->builder_->capacity() - 1000 + 1;
+  int64_t too_many = this->builder_->capacity() - 1000 + 1;
   ASSERT_RAISES(Invalid, this->builder_->Advance(too_many));
 }

 TYPED_TEST(TestPrimitiveBuilder, TestResize) {
   DECL_TYPE();

-  int cap = kMinBuilderCapacity * 2;
+  int64_t cap = kMinBuilderCapacity * 2;

   ASSERT_OK(this->builder_->Reserve(cap));
   ASSERT_EQ(cap, this->builder_->capacity());
@@ -510,7 +511,7 @@ template
 void CheckSliceApproxEquals() {
   using T = typename TYPE::c_type;

-  const int kSize = 50;
+  const int64_t kSize = 50;
   std::vector draws1;
   std::vector draws2;

@@ -520,7 +521,7 @@ void CheckSliceApproxEquals() {

   // Make the draws equal in the sliced segment, but unequal elsewhere (to
   // catch not using the slice offset)
-  for (int i = 10; i < 30; ++i) {
+  for (int64_t i = 10; i < 30; ++i) {
     draws2[i] = draws1[i];
   }


http://git-wip-us.apache.org/repos/asf/arrow/blob/01a67f3f/cpp/src/arrow/array-string-test.cc
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/array-string-test.cc b/cpp/src/arrow/array-string-test.cc
index d8a3585..3fdeb3c 100644
--- a/cpp/src/arrow/array-string-test.cc
+++ b/cpp/src/arrow/array-string-test.cc
@@ -64,7 +64,7 @@ class TestStringArray : public ::testing::Test {
   }

   void MakeArray() {
-    length_ = offsets_.size() - 1;
+    length_ = static_cast(offsets_.size()) - 1;
     value_buf_ = test::GetBufferFromVector(chars_);
     offsets_buf_ = test::GetBufferFromVector(offsets_);
     null_bitmap_ = test::bytes_to_null_buffer(valid_bytes_);
@@ -85,8 +85,8 @@ class TestStringArray : public ::testing::Test {
   std::shared_ptr offsets_buf_;
   std::shared_ptr null_bitmap_;

-  int null_count_;
-  int length_;
+  int64_t null_count_;
+  int64_t length_;

   std::shared_ptr strings_;
 };
@@ -109,7 +109,7 @@ TEST_F(TestStringArray, TestListFunctions) {
   for (size_t i = 0; i < expected_.size(); ++i) {
     ASSERT_EQ(pos, strings_->value_offset(i));
     ASSERT_EQ(static_cast(expected_[i].size()), strings_->value_length(i));
-    pos += expected_[i].size();
+    pos += static_cast(expected_[i].size());
   }
 }

@@ -131,7 +131,7 @@ TEST_F(TestStringArray, TestGetString) {
 TEST_F(TestStringArray, TestEmptyStringComparison) {
   offsets_ = {0, 0, 0, 0, 0, 0};
   offsets_buf_ = test::GetBufferFromVector(offsets_);
-  length_ = offsets_.size() - 1;
+  length_ = static_cast(offsets_.size() - 1);

   auto strings_a = std::make_shared(
       length_, offsets_buf_, nullptr, null_bitmap_, null_count_);
@@ -208,7 +208,7 @@ TEST_F(TestStringBuilder, TestScalarAppend) {
   std::vector strings = {"", "bb", "a", "", "ccc"};
   std::vector is_null = {0, 0, 0, 1, 0};

-  int N = strings.size();
+  int N = static_cast(strings.size());
   int reps = 1000;

   for (int j = 0; j < reps; ++j) {
@@ -263,7 +263,7 @@ class TestBinaryArray : public ::testing::Test {
   }

   void MakeArray() {
-    length_ = offsets_.size() - 1;
+    length_ = static_cast(offsets_.size() - 1);
     value_buf_ = test::GetBufferFromVector(chars_);
     offsets_buf_ = test::GetBufferFromVector(offsets_);

@@ -285,8 +285,8 @@ class TestBinaryArray : public ::testing::Test {
   std::shared_ptr offsets_buf_;
   std::shared_ptr null_bitmap_;

-  int null_count_;
-  int length_;
+  int64_t null_count_;
+  int64_t length_;

   std::shared_ptr strings_;
 };
@@ -305,7 +305,7 @@ TEST_F(TestBinaryArray, TestType) {
 }

 TEST_F(TestBinaryArray, TestListFunctions) {
-  int pos = 0;
+  size_t pos = 0;
   for (size_t i = 0; i < expected_.size(); ++i) {
     ASSERT_EQ(pos, strings_->value_offset(i));
     ASSERT_EQ(static_cast(expected_[i].size()), strings_->value_length(i));
@@ -376,7 +376,7 @@ TEST_F(TestBinaryBuilder, TestScalarAppend) {
   std::vector strings = {"", "bb", "a", "", "ccc"};
   std::vector is_null = {0, 0, 0, 1, 0};

-  int N = strings.size();
+  int N = static_cast(strings.size());
   int reps = 1000;

   for (int j = 0; j < reps; ++j) {
@@ -425,7 +425,7 @@ void CheckSliceEquality() {
   std::vector strings = {"foo", "", "bar", "baz", "qux", ""};
   std::vector is_null = {0, 1, 0, 1, 0, 0};

-  int N = strings.size();
+  int N = static_cast(strings.size());
   int reps = 10;

   for (int j = 0; j < reps; ++j) {

http://git-wip-us.apache.org/repos/asf/arrow/blob/01a67f3f/cpp/src/arrow/array-test.cc
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/array-test.cc b/cpp/src/arrow/array-test.cc
index 45ab274..854ebb2 100644
--- a/cpp/src/arrow/array-test.cc
+++ b/cpp/src/arrow/array-test.cc
@@ -58,7 +58,7 @@ TEST_F(TestArray, TestLength) {

 std::shared_ptr MakeArrayFromValidBytes(
     const std::vector& v, MemoryPool* pool) {
-  int32_t null_count = v.size() - std::accumulate(v.begin(), v.end(), 0);
+  int64_t null_count = v.size() - std::accumulate(v.begin(), v.end(), 0);
   std::shared_ptr null_buf = test::bytes_to_null_buffer(v);

   BufferBuilder value_builder(pool);
@@ -121,7 +121,7 @@ TEST_F(TestArray, TestIsNull) {
                                       1, 0, 1, 1, 0, 1, 0, 0,
                                       1, 0, 0, 1};
   // clang-format on
-  int32_t null_count = 0;
+  int64_t null_count = 0;
   for (uint8_t x : null_bitmap) {
     if (x == 0) { ++null_count; }
   }
@@ -140,6 +140,19 @@ TEST_F(TestArray, TestIsNull) {
   }
 }

+TEST_F(TestArray, BuildLargeInMemoryArray) {
+  const int64_t length = static_cast(std::numeric_limits::max()) + 1;
+
+  BooleanBuilder builder(default_memory_pool());
+  ASSERT_OK(builder.Reserve(length));
+  ASSERT_OK(builder.Advance(length));
+
+  std::shared_ptr result;
+  ASSERT_OK(builder.Finish(&result));
+
+  ASSERT_EQ(length, result->length());
+}
+
 TEST_F(TestArray, TestCopy) {}

 }  // namespace arrow

http://git-wip-us.apache.org/repos/asf/arrow/blob/01a67f3f/cpp/src/arrow/array-union-test.cc
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/array-union-test.cc b/cpp/src/arrow/array-union-test.cc
index eb9bd7d..83c3196 100644
--- a/cpp/src/arrow/array-union-test.cc
+++ b/cpp/src/arrow/array-union-test.cc
@@ -37,7 +37,7 @@ TEST(TestUnionArrayAdHoc, TestSliceEquals) {
   std::shared_ptr batch;
   ASSERT_OK(ipc::MakeUnion(&batch));

-  const int size = batch->num_rows();
+  const int64_t size = batch->num_rows();

   auto CheckUnion = [&size](std::shared_ptr array) {
     std::shared_ptr slice, slice2;

http://git-wip-us.apache.org/repos/asf/arrow/blob/01a67f3f/cpp/src/arrow/array.cc
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/array.cc b/cpp/src/arrow/array.cc
index eb4c210..284bb57 100644
--- a/cpp/src/arrow/array.cc
+++ b/cpp/src/arrow/array.cc
@@ -35,13 +35,13 @@ namespace arrow {
 // doing some computation. To avoid doing this eagerly, we set the null count
 // to -1 (any negative number will do). When Array::null_count is called the
 // first time, the null count will be computed. See ARROW-33
-constexpr int32_t kUnknownNullCount = -1;
+constexpr int64_t kUnknownNullCount = -1;

 // ----------------------------------------------------------------------
 // Base array class

-Array::Array(const std::shared_ptr& type, int32_t length,
-    const std::shared_ptr& null_bitmap, int32_t null_count, int32_t offset)
+Array::Array(const std::shared_ptr& type, int64_t length,
+    const std::shared_ptr& null_bitmap, int64_t null_count, int64_t offset)
     : type_(type),
       length_(length),
       offset_(offset),
@@ -52,7 +52,7 @@ Array::Array(const std::shared_ptr& type, int32_t length,
   if (null_bitmap_) { null_bitmap_data_ = null_bitmap_->data(); }
 }

-int32_t Array::null_count() const {
+int64_t Array::null_count() const {
   if (null_count_ < 0) {
     if (null_bitmap_) {
       null_count_ = length_ - CountSetBits(null_bitmap_data_, offset_, length_);
@@ -87,14 +87,14 @@ bool Array::ApproxEquals(const std::shared_ptr& arr) const {
   return ApproxEquals(*arr);
 }

-bool Array::RangeEquals(int32_t start_idx, int32_t end_idx, int32_t other_start_idx,
+bool Array::RangeEquals(int64_t start_idx, int64_t end_idx, int64_t other_start_idx,
     const std::shared_ptr& other) const {
   if (!other) { return false; }
   return RangeEquals(*other, start_idx, end_idx, other_start_idx);
 }

-bool Array::RangeEquals(const Array& other, int32_t start_idx, int32_t end_idx,
-    int32_t other_start_idx) const {
+bool Array::RangeEquals(const Array& other, int64_t start_idx, int64_t end_idx,
+    int64_t other_start_idx) const {
   bool are_equal = false;
   Status error =
       ArrayRangeEquals(*this, other, start_idx, end_idx, other_start_idx, &are_equal);
@@ -104,15 +104,15 @@ bool Array::RangeEquals(const Array& other, int32_t start_idx, int32_t end_idx,

 // Last two parameters are in-out parameters
 static inline void ConformSliceParams(
-    int32_t array_offset, int32_t array_length, int32_t* offset, int32_t* length) {
+    int64_t array_offset, int64_t array_length, int64_t* offset, int64_t* length) {
   DCHECK_LE(*offset, array_length);
   DCHECK_GE(offset, 0);
   *length = std::min(array_length - *offset, *length);
   *offset = array_offset + *offset;
 }

-std::shared_ptr Array::Slice(int32_t offset) const {
-  int32_t slice_length = length_ - offset;
+std::shared_ptr Array::Slice(int64_t offset) const {
+  int64_t slice_length = length_ - offset;
   return Slice(offset, slice_length);
 }

@@ -120,9 +120,9 @@ Status Array::Validate() const {
   return Status::OK();
 }

-NullArray::NullArray(int32_t length) : Array(null(), length, nullptr, length) {}
+NullArray::NullArray(int64_t length) : Array(null(), length, nullptr, length) {}

-std::shared_ptr NullArray::Slice(int32_t offset, int32_t length) const {
+std::shared_ptr NullArray::Slice(int64_t offset, int64_t length) const {
   DCHECK_LE(offset, length_);
   length = std::min(length_ - offset, length);
   return std::make_shared(length);
@@ -135,9 +135,9 @@ Status NullArray::Accept(ArrayVisitor* visitor) const {
 // ----------------------------------------------------------------------
 // Primitive array base

-PrimitiveArray::PrimitiveArray(const std::shared_ptr& type, int32_t length,
+PrimitiveArray::PrimitiveArray(const std::shared_ptr& type, int64_t length,
     const std::shared_ptr& data, const std::shared_ptr& null_bitmap,
-    int32_t null_count, int32_t offset)
+    int64_t null_count, int64_t offset)
     : Array(type, length, null_bitmap, null_count, offset) {
   data_ = data;
   raw_data_ = data == nullptr ? nullptr : data_->data();
@@ -149,7 +149,7 @@ Status NumericArray::Accept(ArrayVisitor* visitor) const {
 }

 template
-std::shared_ptr NumericArray::Slice(int32_t offset, int32_t length) const {
+std::shared_ptr NumericArray::Slice(int64_t offset, int64_t length) const {
   ConformSliceParams(offset_, length_, &offset, &length);
   return std::make_shared>(
       type_, length, data_, null_bitmap_, kUnknownNullCount, offset);
@@ -173,8 +173,8 @@ template class NumericArray;
 // ----------------------------------------------------------------------
 // BooleanArray

-BooleanArray::BooleanArray(int32_t length, const std::shared_ptr& data,
-    const std::shared_ptr& null_bitmap, int32_t null_count, int32_t offset)
+BooleanArray::BooleanArray(int64_t length, const std::shared_ptr& data,
+    const std::shared_ptr& null_bitmap, int64_t null_count, int64_t offset)
     : PrimitiveArray(std::make_shared(), length, data, null_bitmap, null_count,
           offset) {}

@@ -182,7 +182,7 @@ Status BooleanArray::Accept(ArrayVisitor* visitor) const {
   return visitor->Visit(*this);
 }

-std::shared_ptr BooleanArray::Slice(int32_t offset, int32_t length) const {
+std::shared_ptr BooleanArray::Slice(int64_t offset, int64_t length) const {
   ConformSliceParams(offset_, length_, &offset, &length);
   return std::make_shared(
       length, data_, null_bitmap_, kUnknownNullCount, offset);
@@ -222,7 +222,7 @@ Status ListArray::Validate() const {

   int32_t prev_offset = this->value_offset(0);
   if (prev_offset != 0) { return Status::Invalid("The first offset wasn't zero"); }
-  for (int32_t i = 1; i <= length_; ++i) {
+  for (int64_t i = 1; i <= length_; ++i) {
     int32_t current_offset = this->value_offset(i);
     if (IsNull(i - 1) && current_offset != prev_offset) {
       std::stringstream ss;
@@ -247,7 +247,7 @@ Status ListArray::Accept(ArrayVisitor* visitor) const {
   return visitor->Visit(*this);
 }

-std::shared_ptr ListArray::Slice(int32_t offset, int32_t length) const {
+std::shared_ptr ListArray::Slice(int64_t offset, int64_t length) const {
   ConformSliceParams(offset_, length_, &offset, &length);
   return std::make_shared(
       type_, length, value_offsets_, values_, null_bitmap_, kUnknownNullCount, offset);
@@ -259,15 +259,15 @@ std::shared_ptr ListArray::Slice(int32_t offset, int32_t length) const {
 static std::shared_ptr kBinary = std::make_shared();
 static std::shared_ptr kString = std::make_shared();

-BinaryArray::BinaryArray(int32_t length, const std::shared_ptr& value_offsets,
+BinaryArray::BinaryArray(int64_t length, const std::shared_ptr& value_offsets,
     const std::shared_ptr& data, const std::shared_ptr& null_bitmap,
-    int32_t null_count, int32_t offset)
+    int64_t null_count, int64_t offset)
     : BinaryArray(kBinary, length, value_offsets, data, null_bitmap, null_count, offset) {
 }

-BinaryArray::BinaryArray(const std::shared_ptr& type, int32_t length,
+BinaryArray::BinaryArray(const std::shared_ptr& type, int64_t length,
     const std::shared_ptr& value_offsets, const std::shared_ptr& data,
-    const std::shared_ptr& null_bitmap, int32_t null_count, int32_t offset)
+    const std::shared_ptr& null_bitmap, int64_t null_count, int64_t offset)
     : Array(type, length, null_bitmap, null_count, offset),
       value_offsets_(value_offsets),
       raw_value_offsets_(nullptr),
@@ -288,15 +288,15 @@ Status BinaryArray::Accept(ArrayVisitor* visitor) const {
   return visitor->Visit(*this);
 }

-std::shared_ptr BinaryArray::Slice(int32_t offset, int32_t length) const {
+std::shared_ptr BinaryArray::Slice(int64_t offset, int64_t length) const {
   ConformSliceParams(offset_, length_, &offset, &length);
   return std::make_shared(
       length, value_offsets_, data_, null_bitmap_, kUnknownNullCount, offset);
 }

-StringArray::StringArray(int32_t length, const std::shared_ptr& value_offsets,
+StringArray::StringArray(int64_t length, const std::shared_ptr& value_offsets,
     const std::shared_ptr& data, const std::shared_ptr& null_bitmap,
-    int32_t null_count, int32_t offset)
+    int64_t null_count, int64_t offset)
     : BinaryArray(kString, length, value_offsets, data, null_bitmap, null_count, offset) {
 }

@@ -309,7 +309,7 @@ Status StringArray::Accept(ArrayVisitor* visitor) const {
   return visitor->Visit(*this);
 }

-std::shared_ptr StringArray::Slice(int32_t offset, int32_t length) const {
+std::shared_ptr StringArray::Slice(int64_t offset, int64_t length) const {
   ConformSliceParams(offset_, length_, &offset, &length);
   return std::make_shared(
       length, value_offsets_, data_, null_bitmap_, kUnknownNullCount, offset);
@@ -318,15 +318,15 @@ std::shared_ptr StringArray::Slice(int32_t offset, int32_t length) const
 // ----------------------------------------------------------------------
 // Struct

-StructArray::StructArray(const std::shared_ptr& type, int32_t length,
+StructArray::StructArray(const std::shared_ptr& type, int64_t length,
     const std::vector>& children,
-    std::shared_ptr null_bitmap, int32_t null_count, int32_t offset)
+    std::shared_ptr null_bitmap, int64_t null_count, int64_t offset)
     : Array(type, length, null_bitmap, null_count, offset) {
   type_ = type;
   children_ = children;
 }

-std::shared_ptr StructArray::field(int32_t pos) const {
+std::shared_ptr StructArray::field(int pos) const {
   DCHECK_GT(children_.size(), 0);
   return children_[pos];
 }
@@ -340,7 +340,7 @@ Status StructArray::Validate() const {

   if (children_.size() > 0) {
     // Validate fields
-    int32_t array_length = children_[0]->length();
+    int64_t array_length = children_[0]->length();
     size_t idx = 0;
     for (auto it : children_) {
       if (it->length() != array_length) {
@@ -371,7 +371,7 @@ Status StructArray::Accept(ArrayVisitor* visitor) const {
   return visitor->Visit(*this);
 }

-std::shared_ptr StructArray::Slice(int32_t offset, int32_t length) const {
+std::shared_ptr StructArray::Slice(int64_t offset, int64_t length) const {
   ConformSliceParams(offset_, length_, &offset, &length);
   return std::make_shared(
       type_, length, children_, null_bitmap_, kUnknownNullCount, offset);
@@ -380,10 +380,10 @@ std::shared_ptr StructArray::Slice(int32_t offset, int32_t length) const
 // ----------------------------------------------------------------------
 // UnionArray

-UnionArray::UnionArray(const std::shared_ptr& type, int32_t length,
+UnionArray::UnionArray(const std::shared_ptr& type, int64_t length,
     const std::vector>& children,
     const std::shared_ptr& type_ids, const std::shared_ptr& value_offsets,
-    const std::shared_ptr& null_bitmap, int32_t null_count, int32_t offset)
+    const std::shared_ptr& null_bitmap, int64_t null_count, int64_t offset)
     : Array(type, length, null_bitmap, null_count, offset),
       children_(children),
       type_ids_(type_ids),
@@ -396,7 +396,7 @@ UnionArray::UnionArray(const std::shared_ptr& type, int32_t length,
   }
 }

-std::shared_ptr UnionArray::child(int32_t pos) const {
+std::shared_ptr UnionArray::child(int pos) const {
   DCHECK_GT(children_.size(), 0);
   return children_[pos];
 }
@@ -416,7 +416,7 @@ Status UnionArray::Accept(ArrayVisitor* visitor) const {
   return visitor->Visit(*this);
 }

-std::shared_ptr UnionArray::Slice(int32_t offset, int32_t length) const {
+std::shared_ptr UnionArray::Slice(int64_t offset, int64_t length) const {
   ConformSliceParams(offset_, length_, &offset, &length);
   return std::make_shared(type_, length, children_, type_ids_,
       value_offsets_, null_bitmap_, kUnknownNullCount, offset);
@@ -425,9 +425,9 @@ std::shared_ptr UnionArray::Slice(int32_t offset, int32_t length) const {
 // ----------------------------------------------------------------------
 // DictionaryArray

-Status DictionaryArray::FromBuffer(const std::shared_ptr& type, int32_t length,
+Status DictionaryArray::FromBuffer(const std::shared_ptr& type, int64_t length,
     const std::shared_ptr& indices, const std::shared_ptr& null_bitmap,
-    int32_t null_count, int32_t offset, std::shared_ptr* out) {
+    int64_t null_count, int64_t offset, std::shared_ptr* out) {
   DCHECK_EQ(type->type, Type::DICTIONARY);
   const auto& dict_type = static_cast(type.get());

@@ -464,7 +464,7 @@ Status DictionaryArray::Accept(ArrayVisitor* visitor) const {
   return visitor->Visit(*this);
 }

-std::shared_ptr DictionaryArray::Slice(int32_t offset, int32_t length) const {
+std::shared_ptr DictionaryArray::Slice(int64_t offset, int64_t length) const {
   std::shared_ptr sliced_indices = indices_->Slice(offset, length);
   return std::make_shared(type_, sliced_indices);
 }
@@ -476,9 +476,9 @@ std::shared_ptr DictionaryArray::Slice(int32_t offset, int32_t length) co
     out->reset(new ArrayType(type, length, data, null_bitmap, null_count, offset)); \
     break;

-Status MakePrimitiveArray(const std::shared_ptr& type, int32_t length,
+Status MakePrimitiveArray(const std::shared_ptr& type, int64_t length,
     const std::shared_ptr& data, const std::shared_ptr& null_bitmap,
-    int32_t null_count, int32_t offset, std::shared_ptr* out) {
+    int64_t null_count, int64_t offset, std::shared_ptr* out) {
   switch (type->type) {
     MAKE_PRIMITIVE_ARRAY_CASE(BOOL, BooleanArray);
     MAKE_PRIMITIVE_ARRAY_CASE(UINT8, UInt8Array);

http://git-wip-us.apache.org/repos/asf/arrow/blob/01a67f3f/cpp/src/arrow/array.h
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/array.h b/cpp/src/arrow/array.h
index 8bb914e..f20f212 100644
--- a/cpp/src/arrow/array.h
+++ b/cpp/src/arrow/array.h
@@ -80,30 +80,30 @@ class ARROW_EXPORT ArrayVisitor {
 /// be computed on the first call to null_count()
 class ARROW_EXPORT Array {
  public:
-  Array(const std::shared_ptr& type, int32_t length,
-      const std::shared_ptr& null_bitmap = nullptr, int32_t null_count = 0,
-      int32_t offset = 0);
+  Array(const std::shared_ptr& type, int64_t length,
+      const std::shared_ptr& null_bitmap = nullptr, int64_t null_count = 0,
+      int64_t offset =
0); virtual ~Array() = default; /// Determine if a slot is null. For inner loops. Does *not* boundscheck - bool IsNull(int i) const { + bool IsNull(int64_t i) const { return null_bitmap_data_ != nullptr && BitUtil::BitNotSet(null_bitmap_data_, i + offset_); } /// Size in the number of elements this array contains. - int32_t length() const { return length_; } + int64_t length() const { return length_; } /// A relative position into another array's data, to enable zero-copy /// slicing. This value defaults to zero - int32_t offset() const { return offset_; } + int64_t offset() const { return offset_; } /// The number of null entries in the array. If the null count was not known /// at time of construction (and set to a negative value), then the null /// count will be computed and cached on the first invocation of this /// function - int32_t null_count() const; + int64_t null_count() const; std::shared_ptr type() const { return type_; } Type::type type_enum() const { return type_->type; } @@ -128,11 +128,11 @@ class ARROW_EXPORT Array { /// Compare if the range of slots specified are equal for the given array and /// this array. end_idx exclusive. This methods does not bounds check. - bool RangeEquals(int32_t start_idx, int32_t end_idx, int32_t other_start_idx, + bool RangeEquals(int64_t start_idx, int64_t end_idx, int64_t other_start_idx, const std::shared_ptr& other) const; - bool RangeEquals(const Array& other, int32_t start_idx, int32_t end_idx, - int32_t other_start_idx) const; + bool RangeEquals(const Array& other, int64_t start_idx, int64_t end_idx, + int64_t other_start_idx) const; /// Determines if the array is internally consistent. 
/// @@ -150,20 +150,20 @@ class ARROW_EXPORT Array { /// the length will be adjusted accordingly /// /// \return a new object wrapped in std::shared_ptr - virtual std::shared_ptr Slice(int32_t offset, int32_t length) const = 0; + virtual std::shared_ptr Slice(int64_t offset, int64_t length) const = 0; /// Slice from offset until end of the array - std::shared_ptr Slice(int32_t offset) const; + std::shared_ptr Slice(int64_t offset) const; protected: std::shared_ptr type_; - int32_t length_; - int32_t offset_; + int64_t length_; + int64_t offset_; // This member is marked mutable so that it can be modified when null_count() // is called from a const context and the null count has to be computed (if // it is not already known) - mutable int32_t null_count_; + mutable int64_t null_count_; std::shared_ptr null_bitmap_; const uint8_t* null_bitmap_data_; @@ -178,20 +178,20 @@ class ARROW_EXPORT NullArray : public Array { public: using TypeClass = NullType; - explicit NullArray(int32_t length); + explicit NullArray(int64_t length); Status Accept(ArrayVisitor* visitor) const override; - std::shared_ptr Slice(int32_t offset, int32_t length) const override; + std::shared_ptr Slice(int64_t offset, int64_t length) const override; }; /// Base class for fixed-size logical types class ARROW_EXPORT PrimitiveArray : public Array { public: - PrimitiveArray(const std::shared_ptr& type, int32_t length, + PrimitiveArray(const std::shared_ptr& type, int64_t length, const std::shared_ptr& data, - const std::shared_ptr& null_bitmap = nullptr, int32_t null_count = 0, - int32_t offset = 0); + const std::shared_ptr& null_bitmap = nullptr, int64_t null_count = 0, + int64_t offset = 0); /// The memory containing this array's data /// This buffer does not account for any slice offset @@ -214,10 +214,10 @@ class ARROW_EXPORT NumericArray : public PrimitiveArray { // metadata template NumericArray( - typename std::enable_if::is_parameter_free, int32_t>::type length, + typename 
std::enable_if::is_parameter_free, int64_t>::type length, const std::shared_ptr& data, - const std::shared_ptr& null_bitmap = nullptr, int32_t null_count = 0, - int32_t offset = 0) + const std::shared_ptr& null_bitmap = nullptr, int64_t null_count = 0, + int64_t offset = 0) : PrimitiveArray(TypeTraits::type_singleton(), length, data, null_bitmap, null_count, offset) {} @@ -227,9 +227,9 @@ class ARROW_EXPORT NumericArray : public PrimitiveArray { Status Accept(ArrayVisitor* visitor) const override; - std::shared_ptr Slice(int32_t offset, int32_t length) const override; + std::shared_ptr Slice(int64_t offset, int64_t length) const override; - value_type Value(int i) const { return raw_data()[i]; } + value_type Value(int64_t i) const { return raw_data()[i]; } }; class ARROW_EXPORT BooleanArray : public PrimitiveArray { @@ -238,15 +238,15 @@ class ARROW_EXPORT BooleanArray : public PrimitiveArray { using PrimitiveArray::PrimitiveArray; - BooleanArray(int32_t length, const std::shared_ptr& data, - const std::shared_ptr& null_bitmap = nullptr, int32_t null_count = 0, - int32_t offset = 0); + BooleanArray(int64_t length, const std::shared_ptr& data, + const std::shared_ptr& null_bitmap = nullptr, int64_t null_count = 0, + int64_t offset = 0); Status Accept(ArrayVisitor* visitor) const override; - std::shared_ptr Slice(int32_t offset, int32_t length) const override; + std::shared_ptr Slice(int64_t offset, int64_t length) const override; - bool Value(int i) const { + bool Value(int64_t i) const { return BitUtil::GetBit(reinterpret_cast(raw_data_), i + offset_); } }; @@ -258,10 +258,10 @@ class ARROW_EXPORT ListArray : public Array { public: using TypeClass = ListType; - ListArray(const std::shared_ptr& type, int32_t length, + ListArray(const std::shared_ptr& type, int64_t length, const std::shared_ptr& value_offsets, const std::shared_ptr& values, - const std::shared_ptr& null_bitmap = nullptr, int32_t null_count = 0, - int32_t offset = 0) + const std::shared_ptr& 
null_bitmap = nullptr, int64_t null_count = 0, + int64_t offset = 0) : Array(type, length, null_bitmap, null_count, offset) { value_offsets_ = value_offsets; raw_value_offsets_ = value_offsets == nullptr @@ -285,15 +285,15 @@ class ARROW_EXPORT ListArray : public Array { const int32_t* raw_value_offsets() const { return raw_value_offsets_ + offset_; } // Neither of these functions will perform boundschecking - int32_t value_offset(int i) const { return raw_value_offsets_[i + offset_]; } - int32_t value_length(int i) const { + int32_t value_offset(int64_t i) const { return raw_value_offsets_[i + offset_]; } + int32_t value_length(int64_t i) const { i += offset_; return raw_value_offsets_[i + 1] - raw_value_offsets_[i]; } Status Accept(ArrayVisitor* visitor) const override; - std::shared_ptr Slice(int32_t offset, int32_t length) const override; + std::shared_ptr Slice(int64_t offset, int64_t length) const override; protected: std::shared_ptr value_offsets_; @@ -308,15 +308,15 @@ class ARROW_EXPORT BinaryArray : public Array { public: using TypeClass = BinaryType; - BinaryArray(int32_t length, const std::shared_ptr& value_offsets, + BinaryArray(int64_t length, const std::shared_ptr& value_offsets, const std::shared_ptr& data, - const std::shared_ptr& null_bitmap = nullptr, int32_t null_count = 0, - int32_t offset = 0); + const std::shared_ptr& null_bitmap = nullptr, int64_t null_count = 0, + int64_t offset = 0); // Return the pointer to the given elements bytes // TODO(emkornfield) introduce a StringPiece or something similar to capture zero-copy // pointer + offset - const uint8_t* GetValue(int i, int32_t* out_length) const { + const uint8_t* GetValue(int64_t i, int32_t* out_length) const { // Account for base offset i += offset_; @@ -334,8 +334,8 @@ class ARROW_EXPORT BinaryArray : public Array { const int32_t* raw_value_offsets() const { return raw_value_offsets_ + offset_; } // Neither of these functions will perform boundschecking - int32_t value_offset(int i) 
const { return raw_value_offsets_[i + offset_]; } - int32_t value_length(int i) const { + int32_t value_offset(int64_t i) const { return raw_value_offsets_[i + offset_]; } + int32_t value_length(int64_t i) const { i += offset_; return raw_value_offsets_[i + 1] - raw_value_offsets_[i]; } @@ -344,15 +344,15 @@ class ARROW_EXPORT BinaryArray : public Array { Status Accept(ArrayVisitor* visitor) const override; - std::shared_ptr Slice(int32_t offset, int32_t length) const override; + std::shared_ptr Slice(int64_t offset, int64_t length) const override; protected: // Constructor that allows sub-classes/builders to propagate there logical type up the // class hierarchy. - BinaryArray(const std::shared_ptr& type, int32_t length, + BinaryArray(const std::shared_ptr& type, int64_t length, const std::shared_ptr& value_offsets, const std::shared_ptr& data, - const std::shared_ptr& null_bitmap = nullptr, int32_t null_count = 0, - int32_t offset = 0); + const std::shared_ptr& null_bitmap = nullptr, int64_t null_count = 0, + int64_t offset = 0); std::shared_ptr value_offsets_; const int32_t* raw_value_offsets_; @@ -365,14 +365,14 @@ class ARROW_EXPORT StringArray : public BinaryArray { public: using TypeClass = StringType; - StringArray(int32_t length, const std::shared_ptr& value_offsets, + StringArray(int64_t length, const std::shared_ptr& value_offsets, const std::shared_ptr& data, - const std::shared_ptr& null_bitmap = nullptr, int32_t null_count = 0, - int32_t offset = 0); + const std::shared_ptr& null_bitmap = nullptr, int64_t null_count = 0, + int64_t offset = 0); // Construct a std::string // TODO: std::bad_alloc possibility - std::string GetString(int i) const { + std::string GetString(int64_t i) const { int32_t nchars; const uint8_t* str = GetValue(i, &nchars); return std::string(reinterpret_cast(str), nchars); @@ -382,7 +382,7 @@ class ARROW_EXPORT StringArray : public BinaryArray { Status Accept(ArrayVisitor* visitor) const override; - std::shared_ptr Slice(int32_t 
offset, int32_t length) const override; + std::shared_ptr Slice(int64_t offset, int64_t length) const override; }; // ---------------------------------------------------------------------- @@ -392,22 +392,22 @@ class ARROW_EXPORT StructArray : public Array { public: using TypeClass = StructType; - StructArray(const std::shared_ptr& type, int32_t length, + StructArray(const std::shared_ptr& type, int64_t length, const std::vector>& children, - std::shared_ptr null_bitmap = nullptr, int32_t null_count = 0, - int32_t offset = 0); + std::shared_ptr null_bitmap = nullptr, int64_t null_count = 0, + int64_t offset = 0); Status Validate() const override; // Return a shared pointer in case the requestor desires to share ownership // with this array. - std::shared_ptr field(int32_t pos) const; + std::shared_ptr field(int pos) const; const std::vector>& fields() const { return children_; } Status Accept(ArrayVisitor* visitor) const override; - std::shared_ptr Slice(int32_t offset, int32_t length) const override; + std::shared_ptr Slice(int64_t offset, int64_t length) const override; protected: // The child arrays corresponding to each field of the struct data type. 
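Note on the `Slice(int64_t offset, int64_t length)` signatures above: the commit message says arrays longer than INT32_MAX still cannot go into a RecordBatch message and must be broken into smaller pieces first. A minimal sketch of how the piece boundaries might be computed follows. `ChunkRanges` is a hypothetical helper, not part of this commit or of Arrow, and the default limit of INT32_MAX elements per piece is an assumption taken from the commit message.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical helper (not an Arrow API): returns (offset, length) pairs that
// cover [0, total_length) in pieces of at most max_chunk_length elements.
std::vector<std::pair<int64_t, int64_t>> ChunkRanges(
    int64_t total_length,
    int64_t max_chunk_length = std::numeric_limits<int32_t>::max()) {
  std::vector<std::pair<int64_t, int64_t>> ranges;
  if (total_length <= 0 || max_chunk_length <= 0) { return ranges; }

  int64_t offset = 0;
  while (offset < total_length) {
    // The last piece may be shorter than max_chunk_length.
    const int64_t length = std::min(max_chunk_length, total_length - offset);
    ranges.emplace_back(offset, length);
    // offset never exceeds total_length, so this addition cannot overflow.
    offset += length;
  }
  return ranges;
}
```

Each pair could then be passed to `Array::Slice(offset, length)`, which the doc comments in array.h describe as zero-copy, so the large buffer itself would presumably not need to be copied.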
@@ -422,12 +422,12 @@ class ARROW_EXPORT UnionArray : public Array { using TypeClass = UnionType; using type_id_t = uint8_t; - UnionArray(const std::shared_ptr& type, int32_t length, + UnionArray(const std::shared_ptr& type, int64_t length, const std::vector>& children, const std::shared_ptr& type_ids, const std::shared_ptr& value_offsets = nullptr, - const std::shared_ptr& null_bitmap = nullptr, int32_t null_count = 0, - int32_t offset = 0); + const std::shared_ptr& null_bitmap = nullptr, int64_t null_count = 0, + int64_t offset = 0); Status Validate() const override; @@ -442,13 +442,13 @@ class ARROW_EXPORT UnionArray : public Array { UnionMode mode() const { return static_cast(*type_.get()).mode; } - std::shared_ptr child(int32_t pos) const; + std::shared_ptr child(int pos) const; const std::vector>& children() const { return children_; } Status Accept(ArrayVisitor* visitor) const override; - std::shared_ptr Slice(int32_t offset, int32_t length) const override; + std::shared_ptr Slice(int64_t offset, int64_t length) const override; protected: std::vector> children_; @@ -487,9 +487,9 @@ class ARROW_EXPORT DictionaryArray : public Array { // Alternate ctor; other attributes (like null count) are inherited from the // passed indices array - static Status FromBuffer(const std::shared_ptr& type, int32_t length, + static Status FromBuffer(const std::shared_ptr& type, int64_t length, const std::shared_ptr& indices, const std::shared_ptr& null_bitmap, - int32_t null_count, int32_t offset, std::shared_ptr* out); + int64_t null_count, int64_t offset, std::shared_ptr* out); Status Validate() const override; @@ -500,7 +500,7 @@ class ARROW_EXPORT DictionaryArray : public Array { Status Accept(ArrayVisitor* visitor) const override; - std::shared_ptr Slice(int32_t offset, int32_t length) const override; + std::shared_ptr Slice(int64_t offset, int64_t length) const override; protected: const DictionaryType* dict_type_; @@ -542,8 +542,8 @@ extern template class ARROW_EXPORT 
NumericArray; // Create new arrays for logical types that are backed by primitive arrays. Status ARROW_EXPORT MakePrimitiveArray(const std::shared_ptr& type, - int32_t length, const std::shared_ptr& data, - const std::shared_ptr& null_bitmap, int32_t null_count, int32_t offset, + int64_t length, const std::shared_ptr& data, + const std::shared_ptr& null_bitmap, int64_t null_count, int64_t offset, std::shared_ptr* out); } // namespace arrow http://git-wip-us.apache.org/repos/asf/arrow/blob/01a67f3f/cpp/src/arrow/buffer.h ---------------------------------------------------------------------- diff --git a/cpp/src/arrow/buffer.h b/cpp/src/arrow/buffer.h index 9c400b1..be91af3 100644 --- a/cpp/src/arrow/buffer.h +++ b/cpp/src/arrow/buffer.h @@ -165,7 +165,7 @@ class ARROW_EXPORT BufferBuilder { : pool_(pool), data_(nullptr), capacity_(0), size_(0) {} /// Resizes the buffer to the nearest multiple of 64 bytes per Layout.md - Status Resize(int32_t elements) { + Status Resize(int64_t elements) { if (capacity_ == 0) { buffer_ = std::make_shared(pool_); } RETURN_NOT_OK(buffer_->Resize(elements)); capacity_ = buffer_->capacity(); @@ -173,7 +173,7 @@ class ARROW_EXPORT BufferBuilder { return Status::OK(); } - Status Append(const uint8_t* data, int length) { + Status Append(const uint8_t* data, int64_t length) { if (capacity_ < length + size_) { RETURN_NOT_OK(Resize(length + size_)); } UnsafeAppend(data, length); return Status::OK(); @@ -187,7 +187,7 @@ class ARROW_EXPORT BufferBuilder { } template - Status Append(const T* arithmetic_values, int num_elements) { + Status Append(const T* arithmetic_values, int64_t num_elements) { static_assert(std::is_arithmetic::value, "Convenience buffer append only supports arithmetic types"); return Append( @@ -195,7 +195,7 @@ class ARROW_EXPORT BufferBuilder { } // Unsafe methods don't check existing size - void UnsafeAppend(const uint8_t* data, int length) { + void UnsafeAppend(const uint8_t* data, int64_t length) { memcpy(data_ + size_, 
data, length); size_ += length; } @@ -208,7 +208,7 @@ class ARROW_EXPORT BufferBuilder { } template - void UnsafeAppend(const T* arithmetic_values, int num_elements) { + void UnsafeAppend(const T* arithmetic_values, int64_t num_elements) { static_assert(std::is_arithmetic::value, "Convenience buffer append only supports arithmetic types"); UnsafeAppend( @@ -221,8 +221,8 @@ class ARROW_EXPORT BufferBuilder { capacity_ = size_ = 0; return result; } - int capacity() { return capacity_; } - int length() { return size_; } + int64_t capacity() { return capacity_; } + int64_t length() { return size_; } private: std::shared_ptr buffer_; http://git-wip-us.apache.org/repos/asf/arrow/blob/01a67f3f/cpp/src/arrow/builder.cc ---------------------------------------------------------------------- diff --git a/cpp/src/arrow/builder.cc b/cpp/src/arrow/builder.cc index f5c13f9..63e083e 100644 --- a/cpp/src/arrow/builder.cc +++ b/cpp/src/arrow/builder.cc @@ -43,33 +43,33 @@ Status ArrayBuilder::AppendToBitmap(bool is_valid) { return Status::OK(); } -Status ArrayBuilder::AppendToBitmap(const uint8_t* valid_bytes, int32_t length) { +Status ArrayBuilder::AppendToBitmap(const uint8_t* valid_bytes, int64_t length) { RETURN_NOT_OK(Reserve(length)); UnsafeAppendToBitmap(valid_bytes, length); return Status::OK(); } -Status ArrayBuilder::Init(int32_t capacity) { - int32_t to_alloc = BitUtil::CeilByte(capacity) / 8; +Status ArrayBuilder::Init(int64_t capacity) { + int64_t to_alloc = BitUtil::CeilByte(capacity) / 8; null_bitmap_ = std::make_shared(pool_); RETURN_NOT_OK(null_bitmap_->Resize(to_alloc)); // Buffers might allocate more then necessary to satisfy padding requirements - const int byte_capacity = null_bitmap_->capacity(); + const int64_t byte_capacity = null_bitmap_->capacity(); capacity_ = capacity; null_bitmap_data_ = null_bitmap_->mutable_data(); memset(null_bitmap_data_, 0, byte_capacity); return Status::OK(); } -Status ArrayBuilder::Resize(int32_t new_bits) { +Status 
ArrayBuilder::Resize(int64_t new_bits) { if (!null_bitmap_) { return Init(new_bits); } - int32_t new_bytes = BitUtil::CeilByte(new_bits) / 8; - int32_t old_bytes = null_bitmap_->size(); + int64_t new_bytes = BitUtil::CeilByte(new_bits) / 8; + int64_t old_bytes = null_bitmap_->size(); RETURN_NOT_OK(null_bitmap_->Resize(new_bytes)); null_bitmap_data_ = null_bitmap_->mutable_data(); // The buffer might be overpadded to deal with padding according to the spec - const int32_t byte_capacity = null_bitmap_->capacity(); + const int64_t byte_capacity = null_bitmap_->capacity(); capacity_ = new_bits; if (old_bytes < new_bytes) { memset(null_bitmap_data_ + old_bytes, 0, byte_capacity - old_bytes); @@ -77,7 +77,7 @@ Status ArrayBuilder::Resize(int32_t new_bits) { return Status::OK(); } -Status ArrayBuilder::Advance(int32_t elements) { +Status ArrayBuilder::Advance(int64_t elements) { if (length_ + elements > capacity_) { return Status::Invalid("Builder must be expanded"); } @@ -85,16 +85,16 @@ Status ArrayBuilder::Advance(int32_t elements) { return Status::OK(); } -Status ArrayBuilder::Reserve(int32_t elements) { +Status ArrayBuilder::Reserve(int64_t elements) { if (length_ + elements > capacity_) { // TODO(emkornfield) power of 2 growth is potentially suboptimal - int32_t new_capacity = BitUtil::NextPower2(length_ + elements); + int64_t new_capacity = BitUtil::NextPower2(length_ + elements); return Resize(new_capacity); } return Status::OK(); } -Status ArrayBuilder::SetNotNull(int32_t length) { +Status ArrayBuilder::SetNotNull(int64_t length) { RETURN_NOT_OK(Reserve(length)); UnsafeSetNotNull(length); return Status::OK(); @@ -109,21 +109,21 @@ void ArrayBuilder::UnsafeAppendToBitmap(bool is_valid) { ++length_; } -void ArrayBuilder::UnsafeAppendToBitmap(const uint8_t* valid_bytes, int32_t length) { +void ArrayBuilder::UnsafeAppendToBitmap(const uint8_t* valid_bytes, int64_t length) { if (valid_bytes == nullptr) { UnsafeSetNotNull(length); return; } - int byte_offset = length_ 
/ 8; - int bit_offset = length_ % 8; + int64_t byte_offset = length_ / 8; + int64_t bit_offset = length_ % 8; uint8_t bitset = null_bitmap_data_[byte_offset]; - for (int32_t i = 0; i < length; ++i) { + for (int64_t i = 0; i < length; ++i) { if (valid_bytes[i]) { - bitset |= (1 << bit_offset); + bitset |= BitUtil::kBitmask[bit_offset]; } else { - bitset &= ~(1 << bit_offset); + bitset &= BitUtil::kFlippedBitmask[bit_offset]; ++null_count_; } @@ -140,22 +140,22 @@ void ArrayBuilder::UnsafeAppendToBitmap(const uint8_t* valid_bytes, int32_t leng length_ += length; } -void ArrayBuilder::UnsafeSetNotNull(int32_t length) { - const int32_t new_length = length + length_; +void ArrayBuilder::UnsafeSetNotNull(int64_t length) { + const int64_t new_length = length + length_; // Fill up the bytes until we have a byte alignment - int32_t pad_to_byte = 8 - (length_ % 8); + int64_t pad_to_byte = 8 - (length_ % 8); if (pad_to_byte == 8) { pad_to_byte = 0; } - for (int32_t i = 0; i < pad_to_byte; ++i) { + for (int64_t i = 0; i < pad_to_byte; ++i) { BitUtil::SetBit(null_bitmap_data_, i); } // Fast bitsetting - int32_t fast_length = (length - pad_to_byte) / 8; + int64_t fast_length = (length - pad_to_byte) / 8; memset(null_bitmap_data_ + ((length_ + pad_to_byte) / 8), 255, fast_length); // Trailing bytes - for (int32_t i = length_ + pad_to_byte + (fast_length * 8); i < new_length; ++i) { + for (int64_t i = length_ + pad_to_byte + (fast_length * 8); i < new_length; ++i) { BitUtil::SetBit(null_bitmap_data_, i); } @@ -163,7 +163,7 @@ void ArrayBuilder::UnsafeSetNotNull(int32_t length) { } template -Status PrimitiveBuilder::Init(int32_t capacity) { +Status PrimitiveBuilder::Init(int64_t capacity) { RETURN_NOT_OK(ArrayBuilder::Init(capacity)); data_ = std::make_shared(pool_); @@ -177,7 +177,7 @@ Status PrimitiveBuilder::Init(int32_t capacity) { } template -Status PrimitiveBuilder::Resize(int32_t capacity) { +Status PrimitiveBuilder::Resize(int64_t capacity) { // XXX: Set floor size for now 
if (capacity < kMinBuilderCapacity) { capacity = kMinBuilderCapacity; } @@ -197,11 +197,12 @@ Status PrimitiveBuilder::Resize(int32_t capacity) { template Status PrimitiveBuilder::Append( - const value_type* values, int32_t length, const uint8_t* valid_bytes) { + const value_type* values, int64_t length, const uint8_t* valid_bytes) { RETURN_NOT_OK(Reserve(length)); if (length > 0) { - memcpy(raw_data_ + length_, values, TypeTraits::bytes_required(length)); + std::memcpy(raw_data_ + length_, values, + static_cast(TypeTraits::bytes_required(length))); } // length_ is update by these @@ -248,7 +249,7 @@ BooleanBuilder::BooleanBuilder(MemoryPool* pool, const std::shared_ptr DCHECK_EQ(Type::BOOL, type->type); } -Status BooleanBuilder::Init(int32_t capacity) { +Status BooleanBuilder::Init(int64_t capacity) { RETURN_NOT_OK(ArrayBuilder::Init(capacity)); data_ = std::make_shared(pool_); @@ -261,7 +262,7 @@ Status BooleanBuilder::Init(int32_t capacity) { return Status::OK(); } -Status BooleanBuilder::Resize(int32_t capacity) { +Status BooleanBuilder::Resize(int64_t capacity) { // XXX: Set floor size for now if (capacity < kMinBuilderCapacity) { capacity = kMinBuilderCapacity; } @@ -294,10 +295,10 @@ Status BooleanBuilder::Finish(std::shared_ptr* out) { } Status BooleanBuilder::Append( - const uint8_t* values, int32_t length, const uint8_t* valid_bytes) { + const uint8_t* values, int64_t length, const uint8_t* valid_bytes) { RETURN_NOT_OK(Reserve(length)); - for (int i = 0; i < length; ++i) { + for (int64_t i = 0; i < length; ++i) { // Skip reading from unitialised memory // TODO: This actually is only to keep valgrind happy but may or may not // have a performance impact. 
@@ -333,17 +334,17 @@ ListBuilder::ListBuilder( offset_builder_(pool), values_(values) {} -Status ListBuilder::Init(int32_t elements) { - DCHECK_LT(elements, std::numeric_limits::max()); +Status ListBuilder::Init(int64_t elements) { + DCHECK_LT(elements, std::numeric_limits::max()); RETURN_NOT_OK(ArrayBuilder::Init(elements)); // one more then requested for offsets - return offset_builder_.Resize((elements + 1) * sizeof(int32_t)); + return offset_builder_.Resize((elements + 1) * sizeof(int64_t)); } -Status ListBuilder::Resize(int32_t capacity) { - DCHECK_LT(capacity, std::numeric_limits::max()); +Status ListBuilder::Resize(int64_t capacity) { + DCHECK_LT(capacity, std::numeric_limits::max()); // one more then requested for offsets - RETURN_NOT_OK(offset_builder_.Resize((capacity + 1) * sizeof(int32_t))); + RETURN_NOT_OK(offset_builder_.Resize((capacity + 1) * sizeof(int64_t))); return ArrayBuilder::Resize(capacity); } @@ -351,7 +352,7 @@ Status ListBuilder::Finish(std::shared_ptr* out) { std::shared_ptr items = values_; if (!items) { RETURN_NOT_OK(value_builder_->Finish(&items)); } - RETURN_NOT_OK(offset_builder_.Append(items->length())); + RETURN_NOT_OK(offset_builder_.Append(items->length())); std::shared_ptr offsets = offset_builder_.Finish(); *out = std::make_shared( http://git-wip-us.apache.org/repos/asf/arrow/blob/01a67f3f/cpp/src/arrow/builder.h ---------------------------------------------------------------------- diff --git a/cpp/src/arrow/builder.h b/cpp/src/arrow/builder.h index 0b83b9f..e642d3c 100644 --- a/cpp/src/arrow/builder.h +++ b/cpp/src/arrow/builder.h @@ -37,7 +37,7 @@ namespace arrow { class Array; -static constexpr int32_t kMinBuilderCapacity = 1 << 5; +static constexpr int64_t kMinBuilderCapacity = 1 << 5; /// Base class for all data array builders. 
// @@ -61,38 +61,38 @@ class ARROW_EXPORT ArrayBuilder { /// skip shared pointers and just return a raw pointer ArrayBuilder* child(int i) { return children_[i].get(); } - int num_children() const { return children_.size(); } + int num_children() const { return static_cast(children_.size()); } - int32_t length() const { return length_; } - int32_t null_count() const { return null_count_; } - int32_t capacity() const { return capacity_; } + int64_t length() const { return length_; } + int64_t null_count() const { return null_count_; } + int64_t capacity() const { return capacity_; } /// Append to null bitmap Status AppendToBitmap(bool is_valid); /// Vector append. Treat each zero byte as a null. If valid_bytes is null /// assume all of length bits are valid. - Status AppendToBitmap(const uint8_t* valid_bytes, int32_t length); + Status AppendToBitmap(const uint8_t* valid_bytes, int64_t length); /// Set the next length bits to not null (i.e. valid). - Status SetNotNull(int32_t length); + Status SetNotNull(int64_t length); /// Allocates initial capacity requirements for the builder. In most /// cases subclasses should override and call there parent classes /// method as well. - virtual Status Init(int32_t capacity); + virtual Status Init(int64_t capacity); /// Resizes the null_bitmap array. In most /// cases subclasses should override and call there parent classes /// method as well. - virtual Status Resize(int32_t new_bits); + virtual Status Resize(int64_t new_bits); /// Ensures there is enough space for adding the number of elements by checking /// capacity and calling Resize if necessary. - Status Reserve(int32_t elements); + Status Reserve(int64_t elements); /// For cases where raw data was memcpy'd into the internal buffers, allows us /// to advance the length of the builder. It is your responsibility to use /// this function responsibly. 
- Status Advance(int32_t elements); + Status Advance(int64_t elements); std::shared_ptr null_bitmap() const { return null_bitmap_; } @@ -109,12 +109,12 @@ class ARROW_EXPORT ArrayBuilder { // When null_bitmap are first appended to the builder, the null bitmap is allocated std::shared_ptr null_bitmap_; - int32_t null_count_; + int64_t null_count_; uint8_t* null_bitmap_data_; // Array length, so far. Also, the index of the next element to be added - int32_t length_; - int32_t capacity_; + int64_t length_; + int64_t capacity_; // Child value array builders. These are owned by this class std::vector> children_; @@ -127,9 +127,9 @@ class ARROW_EXPORT ArrayBuilder { void UnsafeAppendToBitmap(bool is_valid); // Vector append. Treat each zero byte as a nullzero. If valid_bytes is null // assume all of length bits are valid. - void UnsafeAppendToBitmap(const uint8_t* valid_bytes, int32_t length); + void UnsafeAppendToBitmap(const uint8_t* valid_bytes, int64_t length); // Set the next length bits to not null (i.e. valid). 
- void UnsafeSetNotNull(int32_t length); + void UnsafeSetNotNull(int64_t length); private: DISALLOW_COPY_AND_ASSIGN(ArrayBuilder); @@ -146,7 +146,7 @@ class ARROW_EXPORT PrimitiveBuilder : public ArrayBuilder { using ArrayBuilder::Advance; /// Write nulls as uint8_t* (0 value indicates null) into pre-allocated memory - Status AppendNulls(const uint8_t* valid_bytes, int32_t length) { + Status AppendNulls(const uint8_t* valid_bytes, int64_t length) { RETURN_NOT_OK(Reserve(length)); UnsafeAppendToBitmap(valid_bytes, length); return Status::OK(); @@ -165,14 +165,14 @@ class ARROW_EXPORT PrimitiveBuilder : public ArrayBuilder { /// If passed, valid_bytes is of equal length to values, and any zero byte /// will be considered as a null for that slot Status Append( - const value_type* values, int32_t length, const uint8_t* valid_bytes = nullptr); + const value_type* values, int64_t length, const uint8_t* valid_bytes = nullptr); Status Finish(std::shared_ptr* out) override; - Status Init(int32_t capacity) override; + Status Init(int64_t capacity) override; /// Increase the capacity of the builder to accommodate at least the indicated /// number of elements - Status Resize(int32_t capacity) override; + Status Resize(int64_t capacity) override; protected: std::shared_ptr data_; @@ -246,7 +246,7 @@ class ARROW_EXPORT BooleanBuilder : public ArrayBuilder { using ArrayBuilder::Advance; /// Write nulls as uint8_t* (0 value indicates null) into pre-allocated memory - Status AppendNulls(const uint8_t* valid_bytes, int32_t length) { + Status AppendNulls(const uint8_t* valid_bytes, int64_t length) { RETURN_NOT_OK(Reserve(length)); UnsafeAppendToBitmap(valid_bytes, length); return Status::OK(); @@ -278,14 +278,14 @@ class ARROW_EXPORT BooleanBuilder : public ArrayBuilder { /// If passed, valid_bytes is of equal length to values, and any zero byte /// will be considered as a null for that slot Status Append( - const uint8_t* values, int32_t length, const uint8_t* valid_bytes = 
nullptr); + const uint8_t* values, int64_t length, const uint8_t* valid_bytes = nullptr); Status Finish(std::shared_ptr* out) override; - Status Init(int32_t capacity) override; + Status Init(int64_t capacity) override; /// Increase the capacity of the builder to accommodate at least the indicated /// number of elements - Status Resize(int32_t capacity) override; + Status Resize(int64_t capacity) override; protected: std::shared_ptr data_; @@ -318,8 +318,8 @@ class ARROW_EXPORT ListBuilder : public ArrayBuilder { ListBuilder( MemoryPool* pool, std::shared_ptr values, const TypePtr& type = nullptr); - Status Init(int32_t elements) override; - Status Resize(int32_t capacity) override; + Status Init(int64_t elements) override; + Status Resize(int64_t capacity) override; Status Finish(std::shared_ptr* out) override; /// Vector append @@ -327,7 +327,7 @@ class ARROW_EXPORT ListBuilder : public ArrayBuilder { /// If passed, valid_bytes is of equal length to values, and any zero byte /// will be considered as a null for that slot Status Append( - const int32_t* offsets, int32_t length, const uint8_t* valid_bytes = nullptr) { + const int32_t* offsets, int64_t length, const uint8_t* valid_bytes = nullptr) { RETURN_NOT_OK(Reserve(length)); UnsafeAppendToBitmap(valid_bytes, length); offset_builder_.UnsafeAppend(offsets, length); @@ -341,7 +341,8 @@ class ARROW_EXPORT ListBuilder : public ArrayBuilder { Status Append(bool is_valid = true) { RETURN_NOT_OK(Reserve(1)); UnsafeAppendToBitmap(is_valid); - RETURN_NOT_OK(offset_builder_.Append(value_builder_->length())); + RETURN_NOT_OK( + offset_builder_.Append(static_cast(value_builder_->length()))); return Status::OK(); } @@ -375,7 +376,9 @@ class ARROW_EXPORT BinaryBuilder : public ListBuilder { return Append(reinterpret_cast(value), length); } - Status Append(const std::string& value) { return Append(value.c_str(), value.size()); } + Status Append(const std::string& value) { + return Append(value.c_str(), 
static_cast(value.size()));
+  }
   Status Finish(std::shared_ptr* out) override;
@@ -417,7 +420,7 @@ class ARROW_EXPORT StructBuilder : public ArrayBuilder {
   /// will be considered as a null for that field, but users must using app-
   /// end methods or advance methods of the child builders' independently to
   /// insert data.
-  Status Append(int32_t length, const uint8_t* valid_bytes) {
+  Status Append(int64_t length, const uint8_t* valid_bytes) {
     RETURN_NOT_OK(Reserve(length));
     UnsafeAppendToBitmap(valid_bytes, length);
     return Status::OK();
http://git-wip-us.apache.org/repos/asf/arrow/blob/01a67f3f/cpp/src/arrow/column-benchmark.cc
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/column-benchmark.cc b/cpp/src/arrow/column-benchmark.cc
index 1bab5a8..13076a4 100644
--- a/cpp/src/arrow/column-benchmark.cc
+++ b/cpp/src/arrow/column-benchmark.cc
@@ -24,7 +24,7 @@ namespace arrow {
 namespace {
 template
-std::shared_ptr MakePrimitive(int32_t length, int32_t null_count = 0) {
+std::shared_ptr MakePrimitive(int64_t length, int64_t null_count = 0) {
   auto pool = default_memory_pool();
   auto data = std::make_shared(pool);
   auto null_bitmap = std::make_shared(pool);
http://git-wip-us.apache.org/repos/asf/arrow/blob/01a67f3f/cpp/src/arrow/column.cc
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/column.cc b/cpp/src/arrow/column.cc
index 1376f65..1822870 100644
--- a/cpp/src/arrow/column.cc
+++ b/cpp/src/arrow/column.cc
@@ -42,15 +42,15 @@ bool ChunkedArray::Equals(const ChunkedArray& other) const {
   // Check contents of the underlying arrays. This checks for equality of
   // the underlying data independently of the chunk size.
   int this_chunk_idx = 0;
-  int32_t this_start_idx = 0;
+  int64_t this_start_idx = 0;
   int other_chunk_idx = 0;
-  int32_t other_start_idx = 0;
+  int64_t other_start_idx = 0;
   int64_t elements_compared = 0;
   while (elements_compared < length_) {
     const std::shared_ptr this_array = chunks_[this_chunk_idx];
     const std::shared_ptr other_array = other.chunk(other_chunk_idx);
-    int32_t common_length = std::min(
+    int64_t common_length = std::min(
         this_array->length() - this_start_idx, other_array->length() - other_start_idx);
     if (!this_array->RangeEquals(this_start_idx, this_start_idx + common_length,
             other_start_idx, other_array)) {
http://git-wip-us.apache.org/repos/asf/arrow/blob/01a67f3f/cpp/src/arrow/column.h
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/column.h b/cpp/src/arrow/column.h
index a28b266..93a34c7 100644
--- a/cpp/src/arrow/column.h
+++ b/cpp/src/arrow/column.h
@@ -44,7 +44,7 @@ class ARROW_EXPORT ChunkedArray {
   int64_t null_count() const { return null_count_; }
-  int num_chunks() const { return chunks_.size(); }
+  int num_chunks() const { return static_cast(chunks_.size()); }
   std::shared_ptr chunk(int i) const { return chunks_[i]; }
http://git-wip-us.apache.org/repos/asf/arrow/blob/01a67f3f/cpp/src/arrow/compare.cc
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/compare.cc b/cpp/src/arrow/compare.cc
index ff3c59f..e94fa74 100644
--- a/cpp/src/arrow/compare.cc
+++ b/cpp/src/arrow/compare.cc
@@ -37,8 +37,8 @@ namespace arrow {
 class RangeEqualsVisitor : public ArrayVisitor {
  public:
-  RangeEqualsVisitor(const Array& right, int32_t left_start_idx, int32_t left_end_idx,
-      int32_t right_start_idx)
+  RangeEqualsVisitor(const Array& right, int64_t left_start_idx, int64_t left_end_idx,
+      int64_t right_start_idx)
       : right_(right),
         left_start_idx_(left_start_idx),
         left_end_idx_(left_end_idx),
@@ -55,7 +55,7 @@ class RangeEqualsVisitor : public ArrayVisitor {
   inline
Status CompareValues(const ArrayType& left) {
     const auto& right = static_cast(right_);
-    for (int32_t i = left_start_idx_, o_i = right_start_idx_; i < left_end_idx_;
+    for (int64_t i = left_start_idx_, o_i = right_start_idx_; i < left_end_idx_;
          ++i, ++o_i) {
       const bool is_null = left.IsNull(i);
       if (is_null != right.IsNull(o_i) ||
@@ -71,7 +71,7 @@ class RangeEqualsVisitor : public ArrayVisitor {
   bool CompareBinaryRange(const BinaryArray& left) const {
     const auto& right = static_cast(right_);
-    for (int32_t i = left_start_idx_, o_i = right_start_idx_; i < left_end_idx_;
+    for (int64_t i = left_start_idx_, o_i = right_start_idx_; i < left_end_idx_;
          ++i, ++o_i) {
       const bool is_null = left.IsNull(i);
       if (is_null != right.IsNull(o_i)) { return false; }
@@ -164,7 +164,7 @@ class RangeEqualsVisitor : public ArrayVisitor {
     const std::shared_ptr& left_values = left.values();
     const std::shared_ptr& right_values = right.values();
-    for (int32_t i = left_start_idx_, o_i = right_start_idx_; i < left_end_idx_;
+    for (int64_t i = left_start_idx_, o_i = right_start_idx_; i < left_end_idx_;
          ++i, ++o_i) {
       const bool is_null = left.IsNull(i);
       if (is_null != right.IsNull(o_i)) { return false; }
@@ -193,15 +193,15 @@ class RangeEqualsVisitor : public ArrayVisitor {
   bool CompareStructs(const StructArray& left) {
     const auto& right = static_cast(right_);
     bool equal_fields = true;
-    for (int32_t i = left_start_idx_, o_i = right_start_idx_; i < left_end_idx_;
+    for (int64_t i = left_start_idx_, o_i = right_start_idx_; i < left_end_idx_;
          ++i, ++o_i) {
       if (left.IsNull(i) != right.IsNull(o_i)) { return false; }
       if (left.IsNull(i)) continue;
-      for (size_t j = 0; j < left.fields().size(); ++j) {
+      for (int j = 0; j < static_cast(left.fields().size()); ++j) {
         // TODO: really we should be comparing stretches of non-null data rather
         // than looking at one value at a time.
-        const int left_abs_index = i + left.offset();
-        const int right_abs_index = o_i + right.offset();
+        const int64_t left_abs_index = i + left.offset();
+        const int64_t right_abs_index = o_i + right.offset();
         equal_fields = left.field(j)->RangeEquals(
             left_abs_index, left_abs_index + 1, right_abs_index, right.field(j));
@@ -243,7 +243,7 @@ class RangeEqualsVisitor : public ArrayVisitor {
     const uint8_t* right_ids = right.raw_type_ids();
     uint8_t id, child_num;
-    for (int32_t i = left_start_idx_, o_i = right_start_idx_; i < left_end_idx_;
+    for (int64_t i = left_start_idx_, o_i = right_start_idx_; i < left_end_idx_;
          ++i, ++o_i) {
       if (left.IsNull(i) != right.IsNull(o_i)) { return false; }
       if (left.IsNull(i)) continue;
@@ -252,8 +252,8 @@ class RangeEqualsVisitor : public ArrayVisitor {
       id = left_ids[i];
       child_num = type_id_to_child_num[id];
-      const int left_abs_index = i + left.offset();
-      const int right_abs_index = o_i + right.offset();
+      const int64_t left_abs_index = i + left.offset();
+      const int64_t right_abs_index = o_i + right.offset();
       // TODO(wesm): really we should be comparing stretches of non-null data
       // rather than looking at one value at a time.
@@ -294,9 +294,9 @@ class RangeEqualsVisitor : public ArrayVisitor {
  protected:
   const Array& right_;
-  int32_t left_start_idx_;
-  int32_t left_end_idx_;
-  int32_t right_start_idx_;
+  int64_t left_start_idx_;
+  int64_t left_end_idx_;
+  int64_t right_start_idx_;
   bool result_;
 };
@@ -314,7 +314,7 @@ class ArrayEqualsVisitor : public RangeEqualsVisitor {
       const uint8_t* left_data = left.data()->data();
       const uint8_t* right_data = right.data()->data();
-      for (int i = 0; i < left.length(); ++i) {
+      for (int64_t i = 0; i < left.length(); ++i) {
         if (!left.IsNull(i) &&
             BitUtil::GetBit(left_data, i) != BitUtil::GetBit(right_data, i)) {
           result_ = false;
@@ -339,7 +339,7 @@ class ArrayEqualsVisitor : public RangeEqualsVisitor {
     const uint8_t* right_data = right.data()->data() + right.offset() * value_byte_size;
     if (left.null_count() > 0) {
-      for (int i = 0; i < left.length(); ++i) {
+      for (int64_t i = 0; i < left.length(); ++i) {
         if (!left.IsNull(i) && memcmp(left_data, right_data, value_byte_size)) {
           return false;
         }
@@ -401,7 +401,7 @@ class ArrayEqualsVisitor : public RangeEqualsVisitor {
         reinterpret_cast(right.value_offsets()->data()) + right.offset();
-    for (int32_t i = 0; i < left.length() + 1; ++i) {
+    for (int64_t i = 0; i < left.length() + 1; ++i) {
       if (left_offsets[i] - left_offsets[0] != right_offsets[i] - right_offsets[0]) {
         return false;
       }
@@ -437,7 +437,7 @@ class ArrayEqualsVisitor : public RangeEqualsVisitor {
       // ARROW-537: Only compare data in non-null slots
       const int32_t* left_offsets = left.raw_value_offsets();
       const int32_t* right_offsets = right.raw_value_offsets();
-      for (int32_t i = 0; i < left.length(); ++i) {
+      for (int64_t i = 0; i < left.length(); ++i) {
         if (left.IsNull(i)) { continue; }
         if (std::memcmp(left_data + left_offsets[i], right_data + right_offsets[i],
                 left.value_length(i))) {
@@ -496,15 +496,15 @@ inline bool FloatingApproxEquals(
   const T* left_data = left.raw_data();
   const T* right_data = right.raw_data();
-  static constexpr T EPSILON =
1E-5;
+  static constexpr T EPSILON = static_cast(1E-5);
   if (left.null_count() > 0) {
-    for (int32_t i = 0; i < left.length(); ++i) {
+    for (int64_t i = 0; i < left.length(); ++i) {
       if (left.IsNull(i)) continue;
       if (fabs(left_data[i] - right_data[i]) > EPSILON) { return false; }
     }
   } else {
-    for (int32_t i = 0; i < left.length(); ++i) {
+    for (int64_t i = 0; i < left.length(); ++i) {
       if (fabs(left_data[i] - right_data[i]) > EPSILON) { return false; }
     }
   }
@@ -556,8 +556,8 @@ Status ArrayEquals(const Array& left, const Array& right, bool* are_equal) {
   return Status::OK();
 }
-Status ArrayRangeEquals(const Array& left, const Array& right, int32_t left_start_idx,
-    int32_t left_end_idx, int32_t right_start_idx, bool* are_equal) {
+Status ArrayRangeEquals(const Array& left, const Array& right, int64_t left_start_idx,
+    int64_t left_end_idx, int64_t right_start_idx, bool* are_equal) {
   if (&left == &right) {
     *are_equal = true;
   } else if (left.type_enum() != right.type_enum()) {
http://git-wip-us.apache.org/repos/asf/arrow/blob/01a67f3f/cpp/src/arrow/compare.h
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/compare.h b/cpp/src/arrow/compare.h
index 6a71f9f..1ddf049 100644
--- a/cpp/src/arrow/compare.h
+++ b/cpp/src/arrow/compare.h
@@ -40,7 +40,7 @@ Status ARROW_EXPORT ArrayApproxEquals(
 /// Returns true if indicated equal-length segment of arrays is exactly equal
 Status ARROW_EXPORT ArrayRangeEquals(const Array& left, const Array& right,
-    int32_t start_idx, int32_t end_idx, int32_t other_start_idx, bool* are_equal);
+    int64_t start_idx, int64_t end_idx, int64_t other_start_idx, bool* are_equal);
 /// Returns true if the type metadata are exactly equal
 Status ARROW_EXPORT TypeEquals(
http://git-wip-us.apache.org/repos/asf/arrow/blob/01a67f3f/cpp/src/arrow/io/file.cc
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/io/file.cc b/cpp/src/arrow/io/file.cc
index c1f0ea4..230c7fe
100644
--- a/cpp/src/arrow/io/file.cc
+++ b/cpp/src/arrow/io/file.cc
@@ -263,9 +263,9 @@ static inline Status FileWrite(int fd, const uint8_t* buffer, int64_t nbytes) {
   if (nbytes > INT32_MAX) {
     return Status::IOError("Unable to write > 2GB blocks to file yet");
   }
-  ret = _write(fd, buffer, static_cast(nbytes));
+  ret = static_cast(_write(fd, buffer, static_cast(nbytes)));
 #else
-  ret = write(fd, buffer, nbytes);
+  ret = static_cast(write(fd, buffer, nbytes));
 #endif
   if (ret == -1) {
@@ -303,9 +303,9 @@ static inline Status FileClose(int fd) {
   int ret;
 #if defined(_MSC_VER)
-  ret = _close(fd);
+  ret = static_cast(_close(fd));
 #else
-  ret = close(fd);
+  ret = static_cast(close(fd));
 #endif
   if (ret == -1) { return Status::IOError("error closing file"); }
http://git-wip-us.apache.org/repos/asf/arrow/blob/01a67f3f/cpp/src/arrow/io/hdfs.cc
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/io/hdfs.cc b/cpp/src/arrow/io/hdfs.cc
index 5682f44..408b85f 100644
--- a/cpp/src/arrow/io/hdfs.cc
+++ b/cpp/src/arrow/io/hdfs.cc
@@ -114,7 +114,7 @@ class HdfsReadableFile::HdfsReadableFileImpl : public HdfsAnyFileImpl {
     tSize ret;
     if (driver_->HasPread()) {
       ret = driver_->Pread(fs_, file_, static_cast(position),
-          reinterpret_cast(buffer), nbytes);
+          reinterpret_cast(buffer), static_cast(nbytes));
     } else {
       RETURN_NOT_OK(Seek(position));
       return Read(nbytes, bytes_read, buffer);
@@ -141,7 +141,7 @@ class HdfsReadableFile::HdfsReadableFileImpl : public HdfsAnyFileImpl {
     int64_t total_bytes = 0;
     while (total_bytes < nbytes) {
       tSize ret = driver_->Read(fs_, file_, reinterpret_cast(buffer + total_bytes),
-          std::min(buffer_size_, nbytes - total_bytes));
+          static_cast(std::min(buffer_size_, nbytes - total_bytes)));
       RETURN_NOT_OK(CheckReadResult(ret));
       total_bytes += ret;
       if (ret == 0) { break; }
@@ -253,7 +253,8 @@ class HdfsOutputStream::HdfsOutputStreamImpl : public HdfsAnyFileImpl {
   }
   Status Write(const uint8_t* buffer, int64_t nbytes,
int64_t* bytes_written) {
-    tSize ret = driver_->Write(fs_, file_, reinterpret_cast(buffer), nbytes);
+    tSize ret = driver_->Write(
+        fs_, file_, reinterpret_cast(buffer), static_cast(nbytes));
     CHECK_FAILURE(ret, "Write");
     *bytes_written = ret;
     return Status::OK();
@@ -328,7 +329,7 @@ class HdfsClient::HdfsClientImpl {
     if (!config->host.empty()) {
       driver_->BuilderSetNameNode(builder, config->host.c_str());
     }
-    driver_->BuilderSetNameNodePort(builder, config->port);
+    driver_->BuilderSetNameNodePort(builder, static_cast(config->port));
     if (!config->user.empty()) {
       driver_->BuilderSetUserName(builder, config->user.c_str());
     }
@@ -411,7 +412,7 @@ class HdfsClient::HdfsClientImpl {
     // Allocate additional space for elements
-    int vec_offset = listing->size();
+    int vec_offset = static_cast(listing->size());
     listing->resize(vec_offset + num_entries);
     for (int i = 0; i < num_entries; ++i) {
@@ -449,8 +450,8 @@ class HdfsClient::HdfsClientImpl {
     int flags = O_WRONLY;
     if (append) flags |= O_APPEND;
-    hdfsFile handle = driver_->OpenFile(
-        fs_, path.c_str(), flags, buffer_size, replication, default_block_size);
+    hdfsFile handle = driver_->OpenFile(fs_, path.c_str(), flags, buffer_size,
+        replication, static_cast(default_block_size));
     if (handle == nullptr) {
       // TODO(wesm): determine cause of failure
http://git-wip-us.apache.org/repos/asf/arrow/blob/01a67f3f/cpp/src/arrow/io/io-hdfs-test.cc
----------------------------------------------------------------------
diff --git a/cpp/src/arrow/io/io-hdfs-test.cc b/cpp/src/arrow/io/io-hdfs-test.cc
index f0e5a28..648d4ba 100644
--- a/cpp/src/arrow/io/io-hdfs-test.cc
+++ b/cpp/src/arrow/io/io-hdfs-test.cc
@@ -49,7 +49,7 @@ class TestHdfsClient : public ::testing::Test {
   }
   Status WriteDummyFile(const std::string& path, const uint8_t* buffer, int64_t size,
-      bool append = false, int buffer_size = 0, int replication = 0,
+      bool append = false, int buffer_size = 0, int16_t replication = 0,
       int default_block_size = 0) {
     std::shared_ptr
file; RETURN_NOT_OK(client_->OpenWriteable(