parquet-commits mailing list archives

From w...@apache.org
Subject [parquet-cpp] branch master updated: PARQUET-1276: [C++] Reduce the amount of memory used for writing null decimal values
Date Mon, 20 Aug 2018 15:43:02 GMT
This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-cpp.git


The following commit(s) were added to refs/heads/master by this push:
     new da1f0a0  PARQUET-1276: [C++] Reduce the amount of memory used for writing null decimal values
da1f0a0 is described below

commit da1f0a0f11dc56cadd4b857d530e9b20fc19aa5a
Author: Antoine Pitrou <antoine@python.org>
AuthorDate: Mon Aug 20 11:42:54 2018 -0400

    PARQUET-1276: [C++] Reduce the amount of memory used for writing null decimal values
    
    Initial patch by @cpcloud
    
    Supersedes PR https://github.com/apache/parquet-cpp/pull/459
    
    Author: Antoine Pitrou <antoine@python.org>
    
    Closes #493 from pitrou/PARQUET-1276-arrow-decimal-memory-consumption and squashes the following commits:
    
    fd2df09 [Antoine Pitrou] PARQUET-1276: [C++] Reduce the amount of memory used for writing null decimal values
---
 src/parquet/arrow/writer.cc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/parquet/arrow/writer.cc b/src/parquet/arrow/writer.cc
index 412d4e7..b7d139e 100644
--- a/src/parquet/arrow/writer.cc
+++ b/src/parquet/arrow/writer.cc
@@ -824,8 +824,8 @@ Status ArrowColumnWriter::TypedWriteBatch<FLBAType, ::arrow::Decimal128Type>(
   const bool does_not_have_nulls =
       writer_->descr()->schema_node()->is_required() || data.null_count() == 0;
 
-  // TODO(phillipc): This is potentially very wasteful if we have a lot of nulls
-  std::vector<uint64_t> big_endian_values(static_cast<size_t>(length) * 2);
+  const auto valid_value_count = static_cast<size_t>(length - data.null_count()) * 2;
+  std::vector<uint64_t> big_endian_values(valid_value_count);
 
   // TODO(phillipc): Look into whether our compilers will perform loop unswitching so we
   // don't have to keep writing two loops to handle the case where we know there are no
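
The idea behind the change above can be illustrated outside of parquet-cpp. The following is only a minimal sketch, not the code in writer.cc: `Decimal128Words`, `ByteSwap64`, and `PackValidBigEndian` are hypothetical names introduced for illustration. It assumes a little-endian host and a validity vector with one entry per value. The scratch buffer holds two 64-bit words per non-null value, so null slots cost no memory.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a 128-bit decimal: low and high 64-bit words.
struct Decimal128Words {
  uint64_t low;
  uint64_t high;
};

// Portable 64-bit byte swap (no compiler intrinsics assumed).
inline uint64_t ByteSwap64(uint64_t v) {
  v = ((v & 0x00FF00FF00FF00FFULL) << 8) | ((v >> 8) & 0x00FF00FF00FF00FFULL);
  v = ((v & 0x0000FFFF0000FFFFULL) << 16) | ((v >> 16) & 0x0000FFFF0000FFFFULL);
  return (v << 32) | (v >> 32);
}

// Packs only the non-null values, high word first, each word byte-swapped.
// Assumes a little-endian host and values.size() == is_valid.size().
std::vector<uint64_t> PackValidBigEndian(const std::vector<Decimal128Words>& values,
                                         const std::vector<bool>& is_valid) {
  size_t null_count = 0;
  for (bool valid : is_valid) {
    if (!valid) ++null_count;
  }

  // Two 64-bit words per valid value; nulls take no space in the buffer.
  std::vector<uint64_t> out((values.size() - null_count) * 2);

  size_t j = 0;
  for (size_t i = 0; i < values.size(); ++i) {
    if (!is_valid[i]) continue;  // skip null slots entirely
    out[j++] = ByteSwap64(values[i].high);
    out[j++] = ByteSwap64(values[i].low);
  }
  return out;
}
```

With three input slots of which one is null, the returned buffer has four words rather than six. The saving grows with the proportion of nulls in the column.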

