arrow-user mailing list archives

From Josh Mayer <joshuaama...@gmail.com>
Subject [Python] Filtering _metadata by file path
Date Sat, 06 Feb 2021 15:57:29 GMT
After writing a _metadata file as described here
https://arrow.apache.org/docs/python/parquet.html?highlight=write_metadata#writing-metadata-and-common-medata-files,
I'm wondering whether it is possible to read that _metadata file (e.g. with
pyarrow.parquet.read_metadata), drop the row groups that belong to some file
paths, and write the result back to disk. I can see that the file path info
is available, e.g.

import pyarrow.parquet as pq

meta = pq.read_metadata(...)
meta.row_group(0).column(0).file_path

But I cannot figure out how to filter a FileMetaData object, or how to
build one from a set of RowGroupMetaData or ColumnChunkMetaData objects
(a list of FileMetaData objects is what the metadata_collector param of
pyarrow.parquet.write_metadata expects). Is this possible? I'm trying to
avoid having to reread the FileMetaData from each file in the dataset.
