arrow-dev mailing list archives

From Kevin Moore <>
Subject Implementing (ARROW-1119) [Python] Enable reading Parquet data sets from Amazon S3
Date Thu, 22 Jun 2017 04:54:20 GMT
Has anyone started looking into how to read Parquet data sets from S3? I've
begun investigating it and wondered whether anyone already has a design in mind.

We could implement an S3FileSystem class in pyarrow/. The filesystem
components could probably be written against the AWS Python SDK.
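For illustration, here is a minimal sketch of what a pure-Python S3FileSystem might look like. It assumes a boto3-style client object (only `list_objects_v2` and `get_object` are called). `S3FileSystem`, its `ls`/`open` methods, `split_s3_path`, and `FakeS3Client` are hypothetical names, not pyarrow's API. The fake client exists only so the sketch runs without AWS credentials.

```python
import io
from typing import Dict, Iterator, List, Tuple


def split_s3_path(path: str) -> Tuple[str, str]:
    """Split 's3://bucket/some/key' (or 'bucket/some/key') into (bucket, key)."""
    if path.startswith("s3://"):
        path = path[len("s3://"):]
    bucket, _, key = path.partition("/")
    if not bucket:
        raise ValueError(f"no bucket in S3 path: {path!r}")
    return bucket, key


class S3FileSystem:
    """Hypothetical read-only S3 filesystem (illustration, not pyarrow's API).

    `client` is assumed to behave like boto3.client("s3"); only its
    list_objects_v2() and get_object() methods are called.
    """

    def __init__(self, client):
        self.client = client

    def _iter_keys(self, bucket: str, prefix: str) -> Iterator[str]:
        # S3 listings are paginated: follow NextContinuationToken until
        # the response is no longer truncated.
        kwargs = {"Bucket": bucket, "Prefix": prefix}
        while True:
            resp = self.client.list_objects_v2(**kwargs)
            for obj in resp.get("Contents", []):
                yield obj["Key"]
            if not resp.get("IsTruncated"):
                return
            kwargs["ContinuationToken"] = resp["NextContinuationToken"]

    def ls(self, path: str) -> List[str]:
        """List every object under `path` (recursively) as s3:// paths."""
        bucket, key = split_s3_path(path)
        # Treat `path` as a directory: 'dataset' -> prefix 'dataset/'.
        prefix = key.rstrip("/") + "/" if key else ""
        return [f"s3://{bucket}/{k}" for k in self._iter_keys(bucket, prefix)]

    def open(self, path: str, mode: str = "rb") -> io.BytesIO:
        """Download the whole object and return it as a seekable buffer."""
        if mode != "rb":
            raise ValueError("this sketch only supports mode='rb'")
        bucket, key = split_s3_path(path)
        body = self.client.get_object(Bucket=bucket, Key=key)["Body"]
        return io.BytesIO(body.read())


class FakeS3Client:
    """In-memory stand-in for a boto3 S3 client, so the sketch runs offline."""

    def __init__(self, objects: Dict[Tuple[str, str], bytes]):
        self.objects = objects  # maps (bucket, key) -> object bytes

    def list_objects_v2(self, Bucket: str, Prefix: str = "", **kwargs):
        keys = sorted(k for (b, k) in self.objects
                      if b == Bucket and k.startswith(Prefix))
        return {"Contents": [{"Key": k} for k in keys], "IsTruncated": False}

    def get_object(self, Bucket: str, Key: str):
        return {"Body": io.BytesIO(self.objects[(Bucket, Key)])}


# Usage: list the part files of a (fake) data set and read each one.
fs = S3FileSystem(FakeS3Client({
    ("my-bucket", "dataset/part-0.parquet"): b"placeholder bytes 0",
    ("my-bucket", "dataset/part-1.parquet"): b"placeholder bytes 1",
    ("my-bucket", "other/notes.txt"): b"not part of the data set",
}))
for p in fs.ls("s3://my-bucket/dataset"):
    print(p, len(fs.open(p).read()))
```

Buffering whole objects in memory is the simplest way to get the seekable file that Parquet readers need, because Parquet stores its metadata in a footer at the end of the file. A production version would more likely issue ranged reads (the `Range` parameter of `get_object`) instead. With pyarrow installed, a file returned by `open()` could be passed to `pyarrow.parquet.read_table`, which accepts file-like objects.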

The HDFS file system and file classes, however, are implemented at least
partially in Cython and C++. Is there an advantage to doing the same for S3?

Kevin Moore
CEO, Quilt Data, Inc.
(415) 497-7895

Data packages for fast, reproducible data science
