arrow-dev mailing list archives

From Brian Bowman <>
Subject Re: Need 64-bit Integer length for Parquet ByteArray Type
Date Fri, 05 Apr 2019 17:43:23 GMT
Hello Ryan,

Looks like it's limited by both the Parquet implementation and the Thrift message methods. Am I missing anything?

From cpp/src/parquet/types.h 

struct ByteArray {
  ByteArray() : len(0), ptr(NULLPTR) {}  // NULLPTR is Arrow's macro for nullptr
  ByteArray(uint32_t len, const uint8_t* ptr) : len(len), ptr(ptr) {}
  uint32_t len;  // 32 bits: at most 2^32 - 1 bytes per value
  const uint8_t* ptr;
};
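To make the consequence of a uint32_t len concrete, here is a minimal C++17 sketch (not parquet-cpp code). It uses a simplified local copy of the struct with nullptr in place of NULLPTR; kMaxByteArrayLen, UncheckedLen, and MakeByteArrayChecked are hypothetical names for illustration. It shows that one value tops out at 2^32 - 1 bytes, and that a larger 64-bit size silently wraps modulo 2^32 when narrowed unless a guard rejects it first.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Simplified local copy of the struct quoted above (NULLPTR -> nullptr).
struct ByteArray {
  ByteArray() : len(0), ptr(nullptr) {}
  ByteArray(uint32_t len, const uint8_t* ptr) : len(len), ptr(ptr) {}
  uint32_t len;
  const uint8_t* ptr;
};

// Largest value length a uint32_t len can describe: 2^32 - 1 bytes.
constexpr uint64_t kMaxByteArrayLen = std::numeric_limits<uint32_t>::max();

// Hypothetical: what an unchecked narrowing does to a 64-bit size.
// The result is the size reduced modulo 2^32.
uint32_t UncheckedLen(uint64_t size) { return static_cast<uint32_t>(size); }

// Hypothetical guard: build a ByteArray only if the size fits in len.
std::optional<ByteArray> MakeByteArrayChecked(const uint8_t* data,
                                              uint64_t size) {
  if (size > kMaxByteArrayLen) {
    return std::nullopt;  // does not fit; caller must split or reject it
  }
  return ByteArray(static_cast<uint32_t>(size), data);
}
```

For example, a 5 GiB size (5 x 2^30 = 5368709120 bytes) comes back from UncheckedLen as 1073741824 (1 GiB), while MakeByteArrayChecked returns std::nullopt for it.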

From cpp/src/parquet/thrift.h

template <class T>
inline void DeserializeThriftMsg(const uint8_t* buf, uint32_t* len, T* deserialized_msg);  // body omitted
template <class T>
inline int64_t SerializeThriftMsg(T* obj, uint32_t len, OutputStream* out);  // body omitted
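The quoted Thrift helpers take the same 32-bit length (through a pointer in DeserializeThriftMsg, by value in SerializeThriftMsg), so a caller holding a wider size has to narrow it before calling them. A minimal sketch of such a call-site guard, assuming the caller keeps sizes as signed int64_t; ToThriftLen is a hypothetical helper, not part of parquet-cpp:

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical call-site guard: convert a signed 64-bit size into the
// uint32_t length the quoted Thrift helpers accept.
// Returns std::nullopt for negative sizes and for sizes above 2^32 - 1.
std::optional<uint32_t> ToThriftLen(int64_t size) {
  if (size < 0 ||
      static_cast<uint64_t>(size) > std::numeric_limits<uint32_t>::max()) {
    return std::nullopt;
  }
  return static_cast<uint32_t>(size);
}
```

ToThriftLen(4294967295) yields 4294967295, while ToThriftLen(4294967296) and ToThriftLen(-1) both yield std::nullopt.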


On 4/5/19, 1:32 PM, "Ryan Blue" <> wrote:

    Hi Brian,
    This seems like something we should allow. What imposes the current limit?
    Is it in the Thrift format, or just the implementations?
    On Fri, Apr 5, 2019 at 10:23 AM Brian Bowman <> wrote:
    > All,
    > SAS requires support for storing varying-length character and binary blobs
    > with a 2^64 max length in Parquet. Currently, the ByteArray len field is
    > a uint32_t. It looks like this will require incrementing the Parquet file
    > format version and changing ByteArray len to uint64_t.
    > Have there been any requests for this or other Parquet developments that
    > require file format versioning changes?
    > I realize this is a non-trivial ask. Thanks for considering it.
    > -Brian
    Ryan Blue
    Software Engineer
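On Ryan's question of format versus implementation: the Parquet encodings specification defines PLAIN encoding of a BYTE_ARRAY value as a 4-byte little-endian length followed by the bytes, so the 32-bit limit also shows up in the data pages on disk. That fits Brian's expectation that wider lengths need a file format version change. A minimal sketch of that layout for a single value; AppendPlainByteArray is a hypothetical name, and this is not parquet-cpp's encoder:

```cpp
#include <cstdint>
#include <vector>

// Sketch of PLAIN encoding for one BYTE_ARRAY value: a 4-byte little-endian
// length prefix followed by the raw bytes. The prefix width is what fixes
// the per-value limit at 2^32 - 1 bytes in the on-disk layout.
void AppendPlainByteArray(std::vector<uint8_t>* out, const uint8_t* data,
                          uint32_t len) {
  for (int shift = 0; shift < 32; shift += 8) {
    out->push_back(static_cast<uint8_t>(len >> shift));  // low byte first
  }
  out->insert(out->end(), data, data + len);  // value bytes, unmodified
}
```

Encoding the three bytes "abc" appends 03 00 00 00 61 62 63. Supporting 64-bit lengths would need a wider prefix, which readers expecting the 4-byte prefix could not parse.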
