arrow-dev mailing list archives

From Antoine Pitrou <anto...@python.org>
Subject Re: [VOTE] Add 64-bit offset list, binary, string (utf8) types to the Arrow columnar format
Date Thu, 25 Apr 2019 20:42:01 GMT

+1 (binding)

Regards

Antoine.


On 25/04/2019 at 22:33, Wes McKinney wrote:
> In a recent mailing list discussion [1] Micah Kornfield has proposed
> to add new list and variable-size binary and unicode types to the
> Arrow columnar format with 64-bit signed integer offsets, to be used
> in addition to the existing 32-bit offset varieties. These will be
> implemented as new types in the Type union in Schema.fbs (the
> particular names can be debated in the PR that implements them):
> 
> LargeList
> LargeBinary
> LargeString [UTF8]
> 
> While very large contiguous columns are not a principal use case for
> the columnar format, it has been observed empirically that there are
> applications that use the format to represent datasets where
> realizations of data can sometimes exceed the 2^31 - 1 "capacity" of a
> column and cannot be easily (or at all) split into smaller chunks.
> 
> Please vote whether to accept the changes. The vote will be open for at
> least 72 hours.
> 
> [ ] +1 Accept the additions to the columnar format
> [ ] +0
> [ ] -1 Do not accept the changes because...
> 
> Thanks,
> Wes
> 
> [1]: https://lists.apache.org/thread.html/8088eca21b53906315e2bbc35eb2d246acf10025b5457eccc7a0e8a3@%3Cdev.arrow.apache.org%3E
> 
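
For illustration, here is a minimal sketch of where the 2^31 - 1 limit
in the quoted proposal comes from. It assumes a variable-size column
stores one cumulative offset per value boundary, so the final offset
equals the total data length. The helper names (build_offsets,
offset_width_needed, pack_offsets) are hypothetical and are not part of
any Arrow implementation.

```python
import struct

INT32_MAX = 2**31 - 1  # largest value a signed 32-bit offset can hold


def build_offsets(lengths):
    """Return cumulative offsets for values with the given byte lengths."""
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return offsets


def offset_width_needed(lengths):
    """Return 32 or 64: the narrowest signed offset width that fits."""
    return 32 if build_offsets(lengths)[-1] <= INT32_MAX else 64


def pack_offsets(lengths):
    """Pack the offsets little-endian as int32 if they fit, else int64."""
    offsets = build_offsets(lengths)
    code = "i" if offsets[-1] <= INT32_MAX else "q"
    return struct.pack("<%d%s" % (len(offsets), code), *offsets)
```

Two values of 2^30 bytes each already push the last offset to 2^31,
one past what a 32-bit signed offset can represent, which is the case
the proposed 64-bit offset types are meant to cover.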
