The Buzz project is one example I know of that reads Parquet files from S3 using the Rust implementation.

The SerializedFileReader [1] from the Rust parquet crate, despite its somewhat misleading name, doesn't have to read from files; instead, it reads from anything that implements the ChunkReader [2] trait. I am not sure how well this matches what you are looking for.
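To illustrate why a reader needs only positioned byte-range access and not a real file, here is a minimal sketch. The `RangeSource` trait, its methods, and `read_tail` are hypothetical names chosen for illustration. They are not the parquet crate's actual ChunkReader signature.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};

/// Hypothetical trait (NOT the parquet crate's ChunkReader): a reader only
/// needs the total size plus the ability to fetch an arbitrary byte range.
pub trait RangeSource {
    /// Total size of the object in bytes.
    fn total_len(&self) -> io::Result<u64>;
    /// Return exactly `length` bytes starting at offset `start`.
    fn read_range(&self, start: u64, length: usize) -> io::Result<Vec<u8>>;
}

/// In-memory bytes, e.g. an object that was already downloaded.
impl RangeSource for Vec<u8> {
    fn total_len(&self) -> io::Result<u64> {
        Ok(self.len() as u64)
    }

    fn read_range(&self, start: u64, length: usize) -> io::Result<Vec<u8>> {
        let start = start as usize; // assumes a 64-bit target
        let end = start
            .checked_add(length)
            .filter(|&end| end <= self.len())
            .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "range out of bounds"))?;
        Ok(self[start..end].to_vec())
    }
}

/// Local file: seek to the offset, then read exactly `length` bytes.
impl RangeSource for File {
    fn total_len(&self) -> io::Result<u64> {
        Ok(self.metadata()?.len())
    }

    fn read_range(&self, start: u64, length: usize) -> io::Result<Vec<u8>> {
        // `&File` implements Read + Seek, so `&mut self` is not required.
        let mut handle: &File = self;
        handle.seek(SeekFrom::Start(start))?;
        let mut buf = vec![0u8; length];
        handle.read_exact(&mut buf)?;
        Ok(buf)
    }
}

/// Generic code written once against the trait works for every backend.
pub fn read_tail<S: RangeSource>(src: &S, n: usize) -> io::Result<Vec<u8>> {
    let len = src.total_len()?;
    let start = len
        .checked_sub(n as u64)
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "object shorter than n"))?;
    src.read_range(start, n)
}
```

An object-store backend would presumably map each `read_range` call to an HTTP range request, which is where the concern about blocking on every operation comes from.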

Hope that helps,


On Sat, Feb 13, 2021 at 10:17 AM Steve Kim <> wrote:
> Currently, it only supports local disk files. Potentially, this could be done using the rusoto crate, which provides an S3 client. What would be a good way to do this?
> 1. create a remote parquet reader (potentially duplicating a lot of code)
> 2. create an interface to abstract away reading from local/remote files (not sure about performance if the reader blocks on every operation)

This is a great question.

I think that approach (2) is superior, although it requires more work
than approach (1) to design an interface that works well across
multiple file stores that have different performance characteristics.
To accommodate storage-specific performance optimizations, I expect
that the common interface will have to be more elaborate than the
current reader API.
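One example of a storage-specific optimization that a richer interface could expose: on a high-latency store such as S3, fetching a few extra bytes can be cheaper than issuing another request, so nearby byte ranges (for example, the column chunks of the selected columns) can be merged before reading. This is a minimal sketch; `coalesce_ranges` is a hypothetical helper and the `max_gap` threshold is an assumption each backend would tune.

```rust
use std::ops::Range;

/// Hypothetical helper: merge byte ranges that overlap, touch, or are
/// separated by at most `max_gap` bytes, so that a high-latency store can
/// serve them with fewer requests. Empty ranges are dropped.
pub fn coalesce_ranges(ranges: &[Range<u64>], max_gap: u64) -> Vec<Range<u64>> {
    let mut sorted: Vec<Range<u64>> = ranges
        .iter()
        .filter(|r| r.start < r.end)
        .cloned()
        .collect();
    sorted.sort_by_key(|r| r.start);

    let mut merged: Vec<Range<u64>> = Vec::with_capacity(sorted.len());
    for r in sorted {
        if let Some(last) = merged.last_mut() {
            // Close enough to the previous range: extend it instead of
            // starting a new request.
            if r.start <= last.end.saturating_add(max_gap) {
                last.end = last.end.max(r.end);
                continue;
            }
        }
        merged.push(r);
    }
    merged
}
```

A local-disk backend would likely pick a small or zero gap, while an object-store backend might accept a much larger one; that per-store choice is the kind of knob the current reader API has no place for.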

Is it possible for the Rust reader to use the C++ implementation?
If such reuse is feasible, then we could focus our efforts on
improving the C++ implementation and get the benefits in Python,
Rust, etc.

In the Java ecosystem, the (non-Arrow, row-wise) Parquet reader uses
the Hadoop FileSystem abstraction. This abstraction is complex, leaky,
and not well specialized for read patterns that are typical for
Parquet files. We can learn from these mistakes to create a superior
reader interface in the Arrow/Parquet project.
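As an example of a read pattern specific to Parquet: a file ends with a 4-byte little-endian length of the Thrift-encoded footer metadata, followed by the magic bytes `PAR1`. A reader therefore typically fetches the tail first and then the metadata. An interface aware of this could speculatively fetch a larger tail in one request and issue a second request only if the metadata did not fit. `locate_footer` and `FooterLocation` are hypothetical names; the sketch only locates the metadata and does not decode it.

```rust
/// Fixed Parquet trailer: 4-byte little-endian metadata length + b"PAR1".
const TRAILER_LEN: usize = 8;
const MAGIC: &[u8; 4] = b"PAR1";

/// Where the footer metadata lies relative to a speculatively fetched tail.
#[derive(Debug, PartialEq)]
pub enum FooterLocation {
    /// Metadata is entirely inside the tail buffer, at this range.
    InTail(std::ops::Range<usize>),
    /// Metadata starts `missing` bytes before the tail; fetch more.
    NeedMore { missing: usize },
}

/// Hypothetical helper: `tail` must be the last bytes of the file.
pub fn locate_footer(tail: &[u8]) -> Result<FooterLocation, String> {
    if tail.len() < TRAILER_LEN {
        return Err("tail shorter than the 8-byte trailer".to_string());
    }
    let trailer = &tail[tail.len() - TRAILER_LEN..];
    if trailer[4..] != MAGIC[..] {
        return Err("missing PAR1 magic".to_string());
    }
    let meta_len =
        u32::from_le_bytes([trailer[0], trailer[1], trailer[2], trailer[3]]) as usize;
    let meta_end = tail.len() - TRAILER_LEN;
    match meta_end.checked_sub(meta_len) {
        Some(start) => Ok(FooterLocation::InTail(start..meta_end)),
        None => Ok(FooterLocation::NeedMore { missing: meta_len - meta_end }),
    }
}
```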