arrow-issues mailing list archives

From "Alessandro Molina (Jira)" <>
Subject [jira] [Created] (ARROW-12650) [Python] Improve documentation regarding dealing with memory mapped files
Date Tue, 04 May 2021 15:09:00 GMT
Alessandro Molina created ARROW-12650:

             Summary: [Python] Improve documentation regarding dealing with memory mapped files
                 Key: ARROW-12650
             Project: Apache Arrow
          Issue Type: Improvement
            Reporter: Alessandro Molina

While one of Arrow's promises is that it makes it easy to read/write data bigger than memory,
it's not immediately obvious from the pyarrow documentation how to deal with memory mapped files.

We hint that you can open files as memory mapped ( [] )
but then we don't explain how to read/write Arrow Arrays or Tables from there.

While most high level functions to read/write formats (Parquet, Feather, ...) have an easy to
guess {{memory_map=True}} option, we don't have any example of how that is meant to work for
the Arrow format itself, for example how to do that using {{RecordBatchFile*}}.

An addition to the memory mapping section with a more meaningful example that reads/writes
actual Arrow data (instead of plain bytes) would probably be more helpful.

This message was sent by Atlassian Jira
