arrow-dev mailing list archives

From "Wes McKinney (JIRA)" <>
Subject [jira] [Commented] (ARROW-300) [Format] Add buffer compression option to IPC file format
Date Tue, 15 Nov 2016 18:33:58 GMT


Wes McKinney commented on ARROW-300:

One issue with compressing only at the transport level arises when people use the Arrow memory
layout and metadata to create file formats for storing larger amounts of data. For example,
I would like to deprecate the Feather metadata and use only the Arrow metadata. Unless
column/buffer-level compression is supported, it would be expensive to read only a subset of
the file, since data outside the wanted columns would have to be decompressed as well. You
could argue that such data should be stored as Parquet instead, but an Arrow-based file offers
flexibility that's really appealing (particularly since random access on memory-mapped
Arrow-like data would be possible).
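
To illustrate the point about reading a subset, here is a minimal sketch in Python. It is not
the Arrow IPC file format: the JSON footer, the trailing 4-byte length, and the names
write_columns and read_column are all hypothetical, chosen only to show that compressing each
buffer separately lets a reader decompress one column and skip the rest.

```python
import json
import struct
import zlib


def write_columns(columns):
    """Compress each column buffer on its own and append a JSON footer.

    Hypothetical layout (NOT the Arrow file format):
    [compressed buffers][JSON footer][4-byte little-endian footer length]
    """
    body = bytearray()
    index = {}
    for name, raw in columns.items():
        compressed = zlib.compress(raw)
        # Record where this buffer starts and how many compressed bytes it spans.
        index[name] = {"offset": len(body), "length": len(compressed)}
        body += compressed
    # A single global codec setting, stored once in the footer.
    footer = json.dumps({"codec": "zlib", "buffers": index}).encode("utf-8")
    return bytes(body) + footer + struct.pack("<I", len(footer))


def read_column(blob, name):
    """Decompress one column; the other buffers are never decompressed."""
    (footer_len,) = struct.unpack("<I", blob[-4:])
    footer = json.loads(blob[-4 - footer_len:-4].decode("utf-8"))
    entry = footer["buffers"][name]
    start = entry["offset"]
    return zlib.decompress(blob[start:start + entry["length"]])


columns = {
    "ids": struct.pack("<4q", 1, 2, 3, 4),
    "flags": bytes([1, 0, 1, 1]) * 1000,
}
blob = write_columns(columns)
flags = read_column(blob, "flags")  # only the "flags" buffer is decompressed
```

With compression applied to the whole stream instead, the same read would need to decompress
data belonging to other columns before reaching the requested one.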

> [Format] Add buffer compression option to IPC file format
> ---------------------------------------------------------
>                 Key: ARROW-300
>                 URL:
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Format
>            Reporter: Wes McKinney
> If data is to be sent over the wire, it may be useful to compress the data buffers themselves
> as they are being written in the file layout.
> I would propose that we keep this extremely simple with a global buffer compression setting
> in the file Footer. Probably the only two compressors worth supporting out of the box would
> be zlib (higher compression ratios) and lz4 (better performance).
> What does everyone think?

