arrow-user mailing list archives

From Andrew Lamb <al...@influxdata.com>
Subject Re: [RUST] Reading parquet
Date Sun, 24 Jan 2021 12:01:20 GMT
Hi Fernando,

Keeping the data in memory as `RecordBatch`es sounds like the way to go if
you want it all to be in memory.

Another way to work in Rust with data from parquet files is to use the
`DataFusion` library. Depending on your needs, it might save you some time
building up your analytics (e.g. it has aggregations, filtering, and sorting
built in).

Here are some examples of how to use DataFusion with a parquet file (with
the dataframe and the SQL api):
https://github.com/apache/arrow/blob/master/rust/datafusion/examples/dataframe.rs
https://github.com/apache/arrow/blob/master/rust/datafusion/examples/parquet_sql.rs

If you already have RecordBatches you can register an in memory table as
well.

Hope that helps,
Andrew


On Sat, Jan 23, 2021 at 7:33 AM Fernando Herrera <
fernando.j.herrera@gmail.com> wrote:

> Hi all,
>
> A quick question regarding reading a parquet file. What is the best way to
> read a parquet file and keep it in memory to do data analysis?
>
> What I'm doing now is using the record reader from the
> ParquetFileArrowReader and then I read all the record batches from the
> file. I keep the batches in memory in a vector of record batches. This way
> I have access to them to do some aggregations I need from the file.
>
> Is there another way to do this?
>
> Thanks,
> Fernando
>
