arrow-user mailing list archives

From: Daniel Nugent <nug...@gmail.com>
Subject: 'Plain' Dataset Python API doesn't memory map?
Date: Wed, 29 Apr 2020 23:58:28 GMT

Hi,

I'm trying to use the 0.17 dataset API to memory-map an Arrow table stored
in the uncompressed Feather format (ultimately hoping to work with data
larger than memory). However, it seems to read all the constituent files
into memory before creating the Arrow Table object.

When I use the FeatherDataset API, it does appear to memory-map the files,
and the Table is created from the mapped data.

Any hints at what I'm doing wrong? I didn't see any options relating to
memory mapping for the general dataset API.

Here's the code for the plain Dataset API call:

    from pyarrow.dataset import dataset as ds
    # Dataset exposes to_table(); read_table() is the FeatherDataset method
    t = ds('demo', format='feather').to_table()

Here's the code for reading with the FeatherDataset API:

    from pyarrow.feather import FeatherDataset as ds
    from pathlib import Path
    t = ds(list(Path('demo').iterdir())).read_table()

Thanks!

-Dan Nugent
