arrow-user mailing list archives

From Joris Van den Bossche <>
Subject Re: 'Plain' Dataset Python API doesn't memory map?
Date Thu, 30 Apr 2020 07:27:06 GMT
Hi Dan,

Currently, the memory mapping in the Datasets API is controlled by the
filesystem. So to enable memory mapping for feather, you can do:

import pyarrow.dataset as ds
from pyarrow.fs import LocalFileSystem

fs = LocalFileSystem(use_mmap=True)
t = ds.dataset('demo', format='feather', filesystem=fs).to_table()

Can you check whether that works for you?
We should document this better (and there is also some ongoing discussion
about the best API for exposing this).

On Thu, 30 Apr 2020 at 01:58, Daniel Nugent <> wrote:

> Hi,
> I'm trying to use the 0.17 Datasets API to memory-map an Arrow table in the
> uncompressed Feather format (ultimately hoping to work with data larger
> than memory). It seems to read all the constituent files into memory
> before creating the Arrow Table object, though.
> When I use the FeatherDataset API, it does appear to map the files,
> and the Table is created from the mapped data.
> Any hints at what I'm doing wrong? I didn't see any options relating to
> memory mapping for the general dataset API.
> Here's the code for the plain dataset api call:
>     from pyarrow.dataset import dataset as ds
>     t = ds('demo', format='feather').to_table()
> Here's the code for reading using the FeatherDataset api:
>     from pyarrow.feather import FeatherDataset as ds
>     from pathlib import Path
>     t = ds(list(Path('demo').iterdir())).read_table()
> Thanks!
> -Dan Nugent
