arrow-user mailing list archives

From filippo medri <>
Subject Reading large csv file with pyarrow
Date Fri, 14 Feb 2020 21:16:08 GMT
While experimenting with Arrow's read_csv function to convert a csv file into
parquet, I found that it reads all the data into memory.
The ReadOptions class does allow specifying a block_size parameter to
limit how many bytes to process at a time, but judging by the memory
usage my understanding is that the underlying Table is still filled with all the data.
Is there a way to at least specify a parameter that limits the read to a batch
of rows? I see that I can skip rows from the beginning, but I am not
finding a way to limit how many rows are read.
What is the intended way to read a csv file that does not fit into memory?
Thanks in advance,
Filippo Medri
