arrow-user mailing list archives

From Wes McKinney <wesmck...@gmail.com>
Subject Re: write_feather, new_file, and compression
Date Thu, 08 Oct 2020 15:50:52 GMT
On Wed, Oct 7, 2020 at 9:33 PM Jonathan Yu <jonathan.i.yu@gmail.com> wrote:
>
> Hello there,
>
> I am using Arrow to store data on disk temporarily, so disk space is not a problem (I
> understand that Parquet is preferable for more efficient disk storage). It seems that Arrow's
> memory mapping/zero copy capabilities would provide better performance given this use case.
>
> Here are my questions:
>
> 1. For new applications, should we prefer the pa.ipc.new_file interface over write_feather?
> My understanding from reading [0] is that pa.feather.write_feather is an API provided for
> backward compatibility, and with compression disabled, it seems to produce files of the same
> size (the files appear to be identical) as the RecordBatchFileWriter.
>

You can use either; neither API is deprecated, and there are no plans to deprecate them.

> 2. Does compression affect the need to make copies? I imagine that compressing the file
> means that the code to use the file cannot be zero-copy anymore.
>

Right. With compression, zero copy is by definition not possible, because the data has to be
decompressed into newly allocated memory before it can be used.

> 3. When using pandas to analyze the data, is there a way to load the data using memory
> mapping, and if so, would this be expected to improve deserialization performance and memory
> utilization if multiple processes are reading the same table data simultaneously? Assume that
> I'm running on a modern server-class SSD.
>

No, pandas doesn't support memory mapping.

> Thank you!
>
> Jonathan
>
> [0] https://arrow.apache.org/faq/#what-about-the-feather-file-format
