arrow-dev mailing list archives

From Kevin Moore <ke...@quiltdata.io>
Subject Implementing (ARROW-1119) [Python] Enable reading Parquet data sets from Amazon S3
Date Thu, 22 Jun 2017 04:54:20 GMT
Has anyone started looking into how to read Parquet data sets from S3? I have
started looking into it and am wondering whether anyone has a design in mind.

We could implement an S3FileSystem class in pyarrow/filesystem.py. The
filesystem components could probably be written against the AWS Python SDK.
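
To make the proposal concrete, here is a rough sketch of what such a class might look like. It is only an illustration under stated assumptions: the class name `S3FileSystem` comes from the proposal above, but the method names (`ls`, `open`), the `client` parameter, and the "bucket/key" path convention are hypothetical and are not an existing pyarrow API. The only SDK calls used are boto3's `list_objects_v2` and `get_object`.

```python
import io


class S3FileSystem:
    """Hypothetical sketch of an S3-backed filesystem; not a pyarrow class."""

    def __init__(self, client=None):
        # Assumption: any object exposing the boto3 S3 client methods
        # list_objects_v2 and get_object can be passed in (handy for tests).
        if client is None:
            import boto3  # AWS SDK for Python, imported lazily
            client = boto3.client("s3")
        self._client = client

    @staticmethod
    def _split(path):
        # "s3://bucket/some/key" or "bucket/some/key" -> ("bucket", "some/key")
        if path.startswith("s3://"):
            path = path[len("s3://"):]
        bucket, _, key = path.strip("/").partition("/")
        return bucket, key

    def ls(self, path):
        """Return 'bucket/key' paths for every object under the given prefix."""
        bucket, prefix = self._split(path)
        if prefix:
            prefix += "/"
        kwargs = {"Bucket": bucket, "Prefix": prefix}
        results = []
        while True:
            resp = self._client.list_objects_v2(**kwargs)
            for obj in resp.get("Contents", []):
                results.append(bucket + "/" + obj["Key"])
            # S3 returns listings in pages; follow the continuation token.
            if not resp.get("IsTruncated"):
                return results
            kwargs["ContinuationToken"] = resp["NextContinuationToken"]

    def open(self, path):
        """Download the whole object and return it as a seekable file."""
        bucket, key = self._split(path)
        body = self._client.get_object(Bucket=bucket, Key=key)["Body"]
        return io.BytesIO(body.read())
```

A pure-Python version like this buffers each object fully in memory, which keeps the file seekable (Parquet readers need to seek to the footer) at the cost of downloading everything up front. The file object returned by `open` could presumably be handed to `pyarrow.parquet.read_table`, which accepts file-like sources.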

The HDFS file system and file classes, however, are implemented at least
partially in Cython & C++. Is there an advantage to doing that for S3 too?

Thanks,

Kevin

----
Kevin Moore
CEO, Quilt Data, Inc.
kevin@quiltdata.io | LinkedIn <https://www.linkedin.com/in/kevinemoore/>
(415) 497-7895


Data packages for fast, reproducible data science
quiltdata.com
