arrow-user mailing list archives

From Josh Mayer
Subject [Python] Filtering _metadata by file path
Date Sat, 06 Feb 2021 15:57:29 GMT
After writing a _metadata file as done here,
I'm wondering if it is possible to read that _metadata file (e.g. using
pyarrow.parquet.read_metadata), filter out some paths, and write it back to
disk. I can see that file path info is available, e.g.

import pyarrow.parquet as pq

meta = pq.read_metadata(...)
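
For illustration, a minimal sketch of how the file paths stored in such a metadata object might be inspected. The helper name row_groups_per_file and the path in the usage comment are hypothetical. It relies only on the num_row_groups, row_group(i), column(j) and file_path accessors of FileMetaData, RowGroupMetaData and ColumnChunkMetaData, and it assumes every column chunk in a row group records the same file path.

```python
from collections import Counter


def row_groups_per_file(meta):
    """Count row groups per file path recorded in a FileMetaData-like object."""
    counts = Counter()
    for i in range(meta.num_row_groups):
        row_group = meta.row_group(i)
        # Assumption: all column chunks of a row group share one file path,
        # so the first column chunk is taken as representative.
        counts[row_group.column(0).file_path] += 1
    return counts


# Hypothetical usage, with pyarrow installed and a real _metadata file:
#   import pyarrow.parquet as pq
#   print(row_groups_per_file(pq.read_metadata("dataset_root/_metadata")))
```

The function only reads the metadata; it does not produce a filtered FileMetaData object.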

But I cannot figure out how to filter or create a FileMetaData object
(since that is what the metadata_collector param of
pyarrow.parquet.write_metadata expects) from either a set of
RowGroupMetaData or ColumnChunkMetaData objects. Is this possible? I'm
trying to avoid needing to reread the FileMetaData from each file in the
dataset directly.
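
For comparison, a hedged sketch of the fallback the question hopes to avoid: rereading the footer of each file that should be kept and merging those footers into a new _metadata file. The function name rebuild_metadata, its parameters, and the read_footer hook are hypothetical; the sketch assumes the FileMetaData methods set_file_path, append_row_groups and write_metadata_file, and that all kept files share one schema. It does not answer whether an existing FileMetaData can be filtered in place.

```python
import os


def rebuild_metadata(root_path, keep_paths, read_footer=None):
    """Write root_path/_metadata from the footers of the files in keep_paths.

    keep_paths is a list of file paths relative to root_path.
    """
    if not keep_paths:
        raise ValueError("keep_paths must name at least one file")
    if read_footer is None:
        # Imported lazily so the helper can also be exercised with stubs.
        import pyarrow.parquet as pq
        read_footer = pq.read_metadata

    merged = None
    for rel_path in keep_paths:
        footer = read_footer(os.path.join(root_path, rel_path))
        # Record the path relative to the dataset root in every column chunk.
        footer.set_file_path(rel_path)
        if merged is None:
            merged = footer
        else:
            # Assumption: schemas match, otherwise this call is expected to fail.
            merged.append_row_groups(footer)

    where = os.path.join(root_path, "_metadata")
    merged.write_metadata_file(where)
    return where
```

Each call touches only the footers of the kept files, not their data pages, but it is still one read per file.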
