arrow-dev mailing list archives

From Brian Bowman <Brian.Bow...@sas.com>
Subject Re: [VOTE] Add 64-bit offset list, binary, string (utf8) types to the Arrow columnar format
Date Fri, 26 Apr 2019 15:05:29 GMT
Can non-Arrow PMC members/committers vote?  

If so, +1 

-Brian

On 4/25/19, 4:34 PM, "Wes McKinney" <wesmckinn@gmail.com> wrote:

    In a recent mailing list discussion [1] Micah Kornfield has proposed
    to add new list and variable-size binary and unicode types to the
    Arrow columnar format with 64-bit signed integer offsets, to be used
    in addition to the existing 32-bit offset varieties. These will be
    implemented as new types in the Type union in Schema.fbs (the
    particular names can be debated in the PR that implements them):
    
    LargeList
    LargeBinary
    LargeString [UTF8]
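The proposed types keep the existing offsets-plus-data layout and change only the width of each offset. Below is a minimal pure-Python sketch of that layout. It assumes little-endian buffers and ignores the validity (null) bitmap, and `build_binary_array` and `get_value` are hypothetical helpers written for illustration, not Arrow library APIs.

```python
import struct

# Hypothetical helpers illustrating a variable-size binary column:
# an offsets buffer with n + 1 signed integers plus one contiguous data buffer.
OFFSET_FORMATS = {4: "<i", 8: "<q"}  # 4 bytes ~ Binary, 8 bytes ~ LargeBinary


def build_binary_array(values, offset_width):
    """Encode a list of bytes objects as (offset_buffer, data_buffer)."""
    fmt = OFFSET_FORMATS[offset_width]
    limit = 2 ** (8 * offset_width - 1) - 1  # largest signed offset value
    offsets = [0]
    data = bytearray()
    for v in values:
        data += v
        if len(data) > limit:
            raise OverflowError("data exceeds what the offset type can address")
        offsets.append(len(data))  # end of value i == start of value i + 1
    offset_buf = b"".join(struct.pack(fmt, o) for o in offsets)
    return offset_buf, bytes(data)


def get_value(offset_buf, data, i, offset_width):
    """Slice value i out of the data buffer using offsets[i] and offsets[i + 1]."""
    fmt = OFFSET_FORMATS[offset_width]
    start = struct.unpack_from(fmt, offset_buf, i * offset_width)[0]
    end = struct.unpack_from(fmt, offset_buf, (i + 1) * offset_width)[0]
    return data[start:end]
```

For `[b"arrow", b"", b"columnar"]` both widths produce the offsets 0, 5, 5, 13 over the data `b"arrowcolumnar"`; the 64-bit variant simply spends 4 more bytes per offset. A string column would use the same layout with UTF-8 data, while list offsets index into a child array instead of a byte buffer.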
    
    While very large contiguous columns are not a principal use case for
    the columnar format, it has been observed empirically that there are
    applications that use the format to represent datasets where
    realizations of data can sometimes exceed the 2^31 - 1 "capacity" of a
    column and cannot be easily (or at all) split into smaller chunks.
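The 2^31 - 1 limit comes from the last offset, which equals the sum of all value lengths (bytes for binary/string, child element count for lists) and must fit in a signed 32-bit integer. A small sketch of that arithmetic follows; `offset_type_for` is a hypothetical helper, not part of any Arrow API.

```python
INT32_MAX = 2**31 - 1  # largest offset a signed 32-bit integer can hold
INT64_MAX = 2**63 - 1  # largest offset a signed 64-bit integer can hold


def offset_type_for(lengths):
    """Hypothetical helper: pick the narrowest signed offset type that can
    index a column whose values have the given lengths."""
    last_offset = sum(lengths)  # final entry of the offsets buffer
    if last_offset <= INT32_MAX:
        return "int32"  # fits the existing List/Binary/String types
    if last_offset <= INT64_MAX:
        return "int64"  # needs LargeList/LargeBinary/LargeString
    raise OverflowError("column too large even for 64-bit offsets")
```

Two 1 GiB values (2^30 bytes each) already total 2^31 bytes, one more than a 32-bit offset can address, so such a column cannot be stored unchunked with the existing types.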
    
    Please vote whether to accept the changes. The vote will be open for at
    least 72 hours.
    
    [ ] +1 Accept the additions to the columnar format
    [ ] +0
    [ ] -1 Do not accept the changes because...
    
    Thanks,
    Wes
    
    [1]: https://lists.apache.org/thread.html/8088eca21b53906315e2bbc35eb2d246acf10025b5457eccc7a0e8a3@%3Cdev.arrow.apache.org%3E
    
