After writing a _metadata file as described here: https://arrow.apache.org/docs/python/parquet.html?highlight=write_metadata#writing-metadata-and-common-medata-files, I'm wondering whether it is possible to read that _metadata file back (e.g. using pyarrow.parquet.read_metadata), filter out some file paths, and write the result back to disk. I can see that file path information is available on the loaded metadata (each ColumnChunkMetaData exposes a file_path), e.g.
import pyarrow.parquet as pq
meta = pq.read_metadata(...)  # path to the _metadata file
But I cannot figure out how to filter a FileMetaData object, or how to create a new one from a set of RowGroupMetaData or ColumnChunkMetaData objects (FileMetaData is what the metadata_collector parameter of pyarrow.parquet.write_metadata expects). Is this possible? I'm trying to avoid having to reread the FileMetaData directly from each file in the dataset.