Here is some example code that produces a similar result:
```python
from itertools import chain
from typing import Any, Iterator, Tuple

import pyarrow.parquet as pq


def iter_parquet(
    parquet_file: pq.ParquetFile,
    columns=None,
    batch_size=1_000,
) -> Iterator[Tuple[Any, ...]]:
    record_batches = parquet_file.iter_batches(batch_size=batch_size, columns=columns)
    # convert from pyarrow's columnar format to a row format of python objects,
    # yielding one tuple per row
    yield from chain.from_iterable(
        zip(*(col.to_pylist() for col in batch.columns)) for batch in record_batches
    )
```
I realize Arrow is a columnar format, but I wonder whether exposing buffered row reading as a lazy iterator is a common enough use case to warrant first-class support, given how often Parquet plus object storage is used as a database alternative.