arrow-user mailing list archives

From Joris Van den Bossche <jorisvandenboss...@gmail.com>
Subject Re: Questions | filesystem legacy
Date Tue, 04 Aug 2020 12:17:59 GMT
Hi Akbar,

The documentation regarding the legacy and new file system interface is
indeed somewhat lacking. So in general, we have the older, and now legacy,
filesystems in pyarrow.filesystem (
https://arrow.apache.org/docs/python/filesystems_deprecated.html) and a new
implementation in pyarrow.fs (
https://arrow.apache.org/docs/python/filesystems.html,
https://arrow.apache.org/docs/python/api/filesystems.html).
We need to document this better (and formally deprecate the old module), but
the long-term goal is certainly to remove pyarrow.filesystem eventually.

So regarding your specific HDFS related questions:

- There is also a HadoopFileSystem in the new interface (
https://arrow.apache.org/docs/python/generated/pyarrow.fs.HadoopFileSystem.html),
so in general, support for HDFS is not limited to the deprecated API.
- The available methods on the new interface are different though, and
there is no "download" method anymore. However (although I am not fully
familiar with this), I think you can achieve more or less the same with the
"open_input_file" method of the new interface (which returns a NativeFile
object, which has a download method).
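
As a rough illustration of the "open_input_file" approach described above, here
is a minimal sketch that copies a single file to local disk. The helper name
download_file, the chunk size, and the host, port, and paths in the usage
comment are assumptions made for illustration and are not part of pyarrow. The
helper only relies on the filesystem object having an open_input_file method
that returns a readable file.

```python
import shutil


def download_file(fs, remote_path, local_path, chunk_size=1024 * 1024):
    """Copy one file from the filesystem `fs` to the local disk in chunks.

    `download_file` is a hypothetical helper, not a pyarrow function.
    """
    # open_input_file is the pyarrow.fs method mentioned above; for pyarrow
    # filesystems it returns a NativeFile, which supports read(nbytes).
    with fs.open_input_file(remote_path) as src, open(local_path, "wb") as dst:
        # Stream in fixed-size chunks so large files are not held in memory.
        shutil.copyfileobj(src, dst, chunk_size)


# Possible usage against HDFS (assumed host, port, and paths; requires pyarrow
# and a working libhdfs / Hadoop client setup):
#
#   from pyarrow import fs
#   hdfs = fs.HadoopFileSystem("namenode-host", 8020)
#   download_file(hdfs, "/data/example.parquet", "example.parquet")
```

For a whole folder, one option would be to list its contents with
get_file_info and a recursive pyarrow.fs.FileSelector and call the helper once
per file, but I have not tested that against HDFS.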

Best,
Joris

On Tue, 4 Aug 2020 at 04:19, Akbar <ed.akbar@gmail.com> wrote:

> Apologies - I made a mess of that email. Let me try again.
>
>
>    - *Question 1* - pyarrow.HadoopFileSystem.download
>    <https://arrow.apache.org/docs/python/generated/pyarrow.HadoopFileSystem.download.html>
>    - is listed under Filesystem Interface (Legacy) (and so are all the HDFS
>    APIs) - does this mean support for this is limited?
>    - *Question 2* - is there an equivalent to
>    pyarrow.HadoopFileSystem.download in the newer
>    pyarrow.fs.HadoopFileSystem
>    <https://arrow.apache.org/docs/python/generated/pyarrow.fs.HadoopFileSystem.html#pyarrow.fs.HadoopFileSystem>?
>
>
> I want to be able to fetch/download an entire HDFS file or folder to the
> host OS filesystem. Let me know if you have any guidance.
>
>
> On Fri, 31 Jul 2020 at 23:02, Akbar <ed.akbar@gmail.com> wrote:
>
>>
>> Hello,
>>
>> The documentation on the legacy file system interface is not quite clear.
>> I’m not sure if the HDFS API layers are still relevant and supported. I
>> know the HDFS APIs are operational.
>>
>> The two questions I have:
>> 1. Is there an equivalent to HadoopFileSystem.download in the new file
>> system interface (pyarrow.fs.HadoopFileSystem)?
>> 2. Will support for the HDFS API be removed? I ask this based on the legacy
>> tag on the documentation.
>>
>> Sent from my iPhone
>
>
