From: wesm@apache.org
To: commits@arrow.apache.org
Subject: arrow git commit: ARROW-1417: [Python] Allow more generic filesystem objects to be passed to ParquetDataset
Date: Mon, 4 Sep 2017 23:38:13 +0000 (UTC)

Repository: arrow
Updated Branches:
  refs/heads/master b1e56a2f5 -> ec32013fd


ARROW-1417: [Python] Allow more generic filesystem objects to be passed to ParquetDataset

This way, the `ParquetDataset` accepts both `S3FileSystem` and `LocalFileSystem` objects as they are used in `dask`.
By using `issubclass`, external libraries may write their own FS wrappers by inheriting from the arrow FS. I tested the integration with dask and this will fix the issue blocking https://github.com/dask/dask/pull/2527

Author: fjetter

Closes #1032 from fjetter/ARROW-1417 and squashes the following commits:

75f18a5 [fjetter] Remove isinstance check in _ensure_filesystem
302b644 [fjetter] Perform check for type object before issubclass
ed111c9 [fjetter] Allow more generic filesystems to be passed


Project: http://git-wip-us.apache.org/repos/asf/arrow/repo
Commit: http://git-wip-us.apache.org/repos/asf/arrow/commit/ec32013f
Tree: http://git-wip-us.apache.org/repos/asf/arrow/tree/ec32013f
Diff: http://git-wip-us.apache.org/repos/asf/arrow/diff/ec32013f

Branch: refs/heads/master
Commit: ec32013fd6df35b051173f0e9aa8aa8833f1c819
Parents: b1e56a2
Author: fjetter
Authored: Mon Sep 4 19:38:07 2017 -0400
Committer: Wes McKinney
Committed: Mon Sep 4 19:38:07 2017 -0400

----------------------------------------------------------------------
 python/pyarrow/parquet.py | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/arrow/blob/ec32013f/python/pyarrow/parquet.py
----------------------------------------------------------------------
diff --git a/python/pyarrow/parquet.py b/python/pyarrow/parquet.py
index 4bc56eb..5dabca9 100644
--- a/python/pyarrow/parquet.py
+++ b/python/pyarrow/parquet.py
@@ -16,13 +16,14 @@
 # under the License.
 
 import os
+import inspect
 import json
 
 import six
 
 import numpy as np
 
-from pyarrow.filesystem import FileSystem, LocalFileSystem
+from pyarrow.filesystem import FileSystem, LocalFileSystem, S3FSWrapper
 from pyarrow._parquet import (ParquetReader, FileMetaData,  # noqa
                               RowGroupMetaData, ParquetSchema,
                               ParquetWriter)
@@ -645,13 +646,18 @@ class ParquetDataset(object):
 
 
 def _ensure_filesystem(fs):
-    if not isinstance(fs, FileSystem):
-        if type(fs).__name__ == 'S3FileSystem':
-            from pyarrow.filesystem import S3FSWrapper
-            return S3FSWrapper(fs)
-        else:
-            raise IOError('Unrecognized filesystem: {0}'
-                          .format(type(fs)))
+    fs_type = type(fs)
+
+    # If the arrow filesystem was subclassed, assume it supports the full interface and return it
+    if not issubclass(fs_type, FileSystem):
+        for mro in inspect.getmro(fs_type):
+            if mro.__name__ is 'S3FileSystem':
+                return S3FSWrapper(fs)
+            # In case its a simple LocalFileSystem (e.g. dask) use native arrow FS
+            elif mro.__name__ is 'LocalFileSystem':
+                return LocalFileSystem.get_instance()
+
+        raise IOError('Unrecognized filesystem: {0}'.format(fs_type))
     else:
         return fs
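
For readers who want to try the dispatch logic outside of pyarrow, here is a minimal standalone sketch of the same idea: return the object unchanged when its type subclasses the arrow base class, otherwise walk the MRO and match foreign classes by name. It is an illustration under assumptions, not the committed code. The `FileSystem`, `LocalFileSystem` and `S3FSWrapper` classes below are hypothetical stubs that only borrow the names used in the diff, and `ensure_filesystem`, `ForeignS3`, `ForeignLocal` and `CustomS3` are made-up names for the demo. The sketch compares class names with `==` rather than `is`, because identity comparison of strings depends on interning and is not guaranteed to behave like an equality check.

```python
import inspect


class FileSystem(object):
    """Hypothetical stub standing in for the arrow filesystem base class."""


class LocalFileSystem(FileSystem):
    """Hypothetical stub standing in for the native local filesystem."""

    _instance = None

    @classmethod
    def get_instance(cls):
        # Lazily create one shared instance and reuse it afterwards.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance


class S3FSWrapper(FileSystem):
    """Hypothetical stub wrapping a foreign S3 filesystem object."""

    def __init__(self, fs):
        self.fs = fs


def ensure_filesystem(fs):
    fs_type = type(fs)

    # Subclasses of the base class are assumed to support the full interface.
    if issubclass(fs_type, FileSystem):
        return fs

    # Otherwise match foreign classes by name anywhere in the MRO, so that
    # subclasses of a foreign S3FileSystem or LocalFileSystem are recognized too.
    for klass in inspect.getmro(fs_type):
        if klass.__name__ == 'S3FileSystem':
            return S3FSWrapper(fs)
        elif klass.__name__ == 'LocalFileSystem':
            return LocalFileSystem.get_instance()

    raise IOError('Unrecognized filesystem: {0}'.format(fs_type))


# Foreign classes that share only a name with the known filesystems,
# the way objects coming from another library would.
ForeignS3 = type('S3FileSystem', (object,), {})
ForeignLocal = type('LocalFileSystem', (object,), {})


class CustomS3(ForeignS3):
    """Subclass of the foreign S3 class; found through the MRO walk."""
```

With these stubs, `ensure_filesystem(CustomS3())` yields an `S3FSWrapper`, `ensure_filesystem(ForeignLocal())` yields the shared `LocalFileSystem` instance, and an arbitrary object such as `object()` raises `IOError`.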