arrow-user mailing list archives

From Fernando Herrera <>
Subject Re: [RUST] Reading parquet
Date Sun, 24 Jan 2021 12:40:56 GMT
Thanks Andrew,

I did read the examples that you mentioned and I don't think they will help
me with what I want to do. I need to create two hash maps from the parquet
file to do further comparisons on those maps. In both cases I need to
create a set of unique ngrams from strings stored in the parquet file.
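
A minimal sketch of that ngram step, assuming character-level ngrams and plain `&str` inputs standing in for the string values read from the parquet file. The helper name `unique_ngrams` is hypothetical and not part of arrow or parquet:

```rust
use std::collections::HashSet;

/// Collects the unique character ngrams of length `n` across all input strings.
/// Hypothetical helper: in practice the strings would come from a string
/// column of each record batch rather than from literals.
fn unique_ngrams<'a, I>(strings: I, n: usize) -> HashSet<String>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut set = HashSet::new();
    // `windows(0)` panics, so treat n == 0 as "no ngrams".
    if n == 0 {
        return set;
    }
    for s in strings {
        // Work on chars rather than bytes so multi-byte UTF-8 text is split safely.
        let chars: Vec<char> = s.chars().collect();
        // Strings shorter than `n` yield no windows and contribute nothing.
        for window in chars.windows(n) {
            set.insert(window.iter().collect::<String>());
        }
    }
    set
}
```

Calling `unique_ngrams(vec!["arrow", "parquet"], 2)` would return one set covering both strings, with the shared bigram `ar` stored only once. Word-level ngrams would need a tokenizing step in place of `chars()`.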

By the way, would it make sense to create a struct Table similar to the one
in pyarrow to collect several Record Batches?

Also, how can an object that implements Array (a `dyn Array`) be downcast to
a concrete array type? I'm currently doing it with `as_any` and then
`downcast_ref` to the type I want. But I have to write the type out in the
code, and I would like to find a way for it to be done automatically.
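
To make the question concrete, here is a sketch of the pattern using only the standard library. `Column`, `Kind`, `IntColumn`, `StrColumn` and `describe` are hypothetical stand-ins, not arrow types: `Column` plays the role of arrow's `Array` trait and `Kind` the role of the value returned by `Array::data_type()`. The concrete type still has to be named somewhere, but matching on a runtime tag confines that to one dispatch function:

```rust
use std::any::Any;

// Stand-in for arrow's `DataType`; only two variants for illustration.
#[derive(Clone, Copy)]
enum Kind {
    Int32,
    Utf8,
}

// Stand-in for arrow's `Array` trait: type-erased, with a runtime type tag.
trait Column {
    fn as_any(&self) -> &dyn Any;
    fn kind(&self) -> Kind;
}

struct IntColumn(Vec<i32>);
struct StrColumn(Vec<String>);

impl Column for IntColumn {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn kind(&self) -> Kind {
        Kind::Int32
    }
}

impl Column for StrColumn {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn kind(&self) -> Kind {
        Kind::Utf8
    }
}

/// Dispatches on the runtime tag, then downcasts to the matching concrete type.
fn describe(col: &dyn Column) -> String {
    match col.kind() {
        Kind::Int32 => {
            let c = col
                .as_any()
                .downcast_ref::<IntColumn>()
                .expect("tag says Int32");
            format!("int column, sum = {}", c.0.iter().sum::<i32>())
        }
        Kind::Utf8 => {
            let c = col
                .as_any()
                .downcast_ref::<StrColumn>()
                .expect("tag says Utf8");
            format!("string column, {} values", c.0.len())
        }
    }
}
```

With arrow itself, the same shape would presumably be a `match` on `array.data_type()` with one `downcast_ref` per arm, so callers never spell out the concrete type themselves.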


On Sun, 24 Jan 2021, 12:01 Andrew Lamb, <> wrote:

> Hi Fernando,
> Keeping the data in memory as `RecordBatch`es sounds like the way to go if
> you want it all to be in memory.
> Another way to work in Rust with data from parquet files is to use the
> `DataFusion` library. Depending on your needs, it might save you some time
> building up your analytics (e.g. it has aggregations, filtering, and sorting
> built in).
> Here are some examples of how to use DataFusion with a parquet file (with
> the dataframe and the SQL api):
> If you already have RecordBatches you can register an in memory table as
> well.
> Hope that helps,
> Andrew
> On Sat, Jan 23, 2021 at 7:33 AM Fernando Herrera <> wrote:
>> Hi all,
>> A quick question regarding reading a parquet file. What is the best way
>> to read a parquet file and keep it in memory to do data analysis?
>> What I'm doing now is using the record reader from the
>> ParquetFileArrowReader and then I read all the record batches from the
>> file. I keep the batches in memory in a vector of record batches. This way
>> I have access to them to do some aggregations I need from the file.
>> Is there another way to do this?
>> Thanks,
>> Fernando
