hadoop-user mailing list archives

From Takenori Sato <ts...@cloudian.com>
Subject Re: Re: Regarding HDFS and YARN support for S3
Date Mon, 29 Sep 2014 14:57:12 GMT
Hi Naga,

> But what I don't understand is why there are two interfaces (maybe I am a
> novice in HDFS and hence not able to fully correlate them with the JIRAs you
> gave).

A client program is encouraged to use the FileContext API instead of the
FileSystem API. Here's why <http://www.slideshare.net/hadoopusergroup/file-context>.
The whole discussion is in HADOOP-6223 (New improved FileSystem
interface for those implementing new files systems.).
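A toy model may help show how the two layers relate. The sketch below is plain Java with no Hadoop dependency, and every name in it (`ToyFs`, `InMemoryFs`, `ToyContext`, the `mem` scheme) is hypothetical. `ToyFs` plays the role of AbstractFileSystem, the class a file system implementer extends; `ToyContext` plays the role of FileContext, the API a client program calls, which picks an implementation by URI scheme and delegates to it. In the older API, the single FileSystem class plays both roles at once.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for AbstractFileSystem: the implementer-facing layer.
abstract class ToyFs {
    abstract boolean exists(String path);
}

// Hypothetical in-memory implementation, registered under one URI scheme.
class InMemoryFs extends ToyFs {
    private final Set<String> paths;

    InMemoryFs(Set<String> paths) { this.paths = paths; }

    @Override
    boolean exists(String path) { return paths.contains(path); }
}

// Hypothetical stand-in for FileContext: the client-facing layer.
// It holds no storage logic; it maps a URI scheme to a ToyFs and delegates.
class ToyContext {
    private final Map<String, ToyFs> byScheme = new HashMap<>();

    void register(String scheme, ToyFs fs) { byScheme.put(scheme, fs); }

    boolean exists(URI uri) {
        ToyFs fs = byScheme.get(uri.getScheme());
        if (fs == null) {
            // No implementation bound to this scheme: the lookup itself fails.
            throw new UnsupportedOperationException("No implementation for scheme " + uri.getScheme());
        }
        return fs.exists(uri.getPath());
    }
}

class LayeringDemo {
    public static void main(String[] args) {
        ToyContext ctx = new ToyContext();
        ctx.register("mem", new InMemoryFs(Set.of("/data/a.txt")));
        System.out.println(ctx.exists(URI.create("mem:///data/a.txt"))); // true
        try {
            ctx.exists(URI.create("s3://bucket/key"));
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage()); // No implementation for scheme s3
        }
    }
}
```

A scheme with nothing registered fails at lookup time, which is loosely analogous to the s3 failure discussed in this thread. The real FileContext does considerably more, such as path qualification and symlink resolution.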

Thanks,
Takenori

On Mon, Sep 29, 2014 at 11:27 PM, Naganarasimha G R (Naga) <
garlanaganarasimha@huawei.com> wrote:

>  Hi Takenori,
> Thanks for replying, but I still don't seem to get some concepts.
> I understand that we need to set "fs.AbstractFileSystem.s3.impl" if
> we want to submit a job using "./yarn jar" with S3 configured as an HCFS.
> But what I don't understand is why there are two interfaces (maybe I am a
> novice in HDFS and hence not able to fully correlate them with the JIRAs you
> gave).
> If you could briefly explain the differences between FileSystem and
> AbstractFileSystem, it would be helpful.
>
>    Regards,
>
> Naga
>
>
>
> Huawei Technologies Co., Ltd.
> Phone:
> Fax:
> Mobile:  +91 9980040283
> Email: naganarasimhagr@huawei.com
> Huawei Technologies Co., Ltd.
> Bantian, Longgang District,Shenzhen 518129, P.R.China
> http://www.huawei.com
>
>
>  *From:* Takenori Sato [tsato@cloudian.com]
> *Sent:* Monday, September 29, 2014 07:29
> *To:* user@hadoop.apache.org
> *Subject:* Re: Re: Regarding HDFS and YARN support for S3
>
>   Hi,
>
>  You may want to check HADOOP-10400
> <https://issues.apache.org/jira/browse/HADOOP-10400> for the overhaul of
> the S3 file system, fixed in 2.6.
>
>  An AbstractFileSystem subclass for S3 was filed as HADOOP-10643
> <https://issues.apache.org/jira/browse/HADOOP-10643>, but it was not
> included in HADOOP-10400, even though I made a comment
> <https://issues.apache.org/jira/browse/HADOOP-10400?focusedCommentId=14104967&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14104967>.
>
>  I suggest not using S3 as the defaultFS, as explained in "Why you cannot use
> S3 as a replacement for HDFS <https://wiki.apache.org/hadoop/AmazonS3>",
> to avoid all these sorts of issues.
>
>  The best practice is to use S3 as a supplement to Hadoop, to bring in
> lifecycle management (expiration and tiering) and to act as a data
> source/destination over the internet.
>
>  Thanks,
> Takenori
>
>
> On Sun, Sep 28, 2014 at 5:23 PM, Naganarasimha G R (Naga) <
> garlanaganarasimha@huawei.com> wrote:
>
>>  Hi Jay,
>> Thanks a lot for replying; it clarifies most of it, but some parts are
>> still not clear.
>> Some clarifications from my side:
>> *| When you say "HDFS does not support fs.AbstractFileSystem.s3.impl"....
>> That is true.  If your file system is configured using HDFS, then s3 urls
>> will not be used, ever.*
>> :) I don't think I am making that basic mistake. What we have done is
>> configure "viewfs://nsX" as "fs.defaultFS", and one of the mounts is S3,
>> i.e. "fs.viewfs.mounttable.nsX.link./uds" is set to "s3://hadoop/test1/".
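For readers unfamiliar with ViewFs mount tables, a simplified model of how a path under the `/uds` mount point maps to its target may help. This is an assumption-laden sketch in plain Java, not ViewFs's actual resolution code: a `Map` stands in for the configuration, the longest mount point is matched on whole path components, and fallback links and other ViewFs features are ignored. `MountTableModel` and `resolve` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class MountTableModel {
    // Returns the target URI for a path in the given mount table,
    // or empty if no mount point covers it. Simplified: no fallback, no root mount.
    static Optional<String> resolve(Map<String, String> conf, String table, String path) {
        String prefix = "fs.viewfs.mounttable." + table + ".link.";
        String bestMount = null;
        for (String key : conf.keySet()) {
            if (!key.startsWith(prefix)) continue;
            String mount = key.substring(prefix.length()); // e.g. "/uds"
            // Match whole components: "/uds" covers "/uds" and "/uds/x", not "/udsx".
            boolean covers = path.equals(mount) || path.startsWith(mount + "/");
            if (covers && (bestMount == null || mount.length() > bestMount.length())) {
                bestMount = mount;
            }
        }
        if (bestMount == null) return Optional.empty();
        String target = conf.get(prefix + bestMount);
        String rest = path.substring(bestMount.length()); // "" or "/..."
        // Avoid a double slash when the target already ends with "/".
        if (target.endsWith("/") && rest.startsWith("/")) rest = rest.substring(1);
        return Optional.of(target + rest);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.defaultFS", "viewfs://nsX");
        conf.put("fs.viewfs.mounttable.nsX.link./uds", "s3://hadoop/test1/");
        System.out.println(resolve(conf, "nsX", "/uds/part-00000")); // Optional[s3://hadoop/test1/part-00000]
    }
}
```

The point relevant to this thread is that the resolved target still carries the `s3` scheme, so an implementation for that scheme must be available before the path can actually be accessed.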
>> So running "./yarn jar" fails even to create the YARNRunner instance, as
>> there is no mapping for "fs.AbstractFileSystem.s3.impl". And per the code,
>> even if "fs.defaultFS" is set to s3, it will not work, as there is no S3
>> implementation of the AbstractFileSystem interface.
>>
>>  These are my further questions:
>>
>>    1. What's the purpose of the AbstractFileSystem and FileSystem
>>    interfaces?
>>    2. Does the default HDFS package (code) support configuring S3? I
>>    see an S3 implementation of the FileSystem interface
>>    (org.apache.hadoop.fs.s3.S3FileSystem) but not of AbstractFileSystem!
>>    So I presume it doesn't support S3 completely. What's the reason
>>    for not supporting both?
>>    3. Suppose I need to support Amazon S3: do I need to extend and
>>    implement AbstractFileSystem and configure "fs.AbstractFileSystem.s3.impl",
>>    or is there something more I need to take care of?
>>
>>    Regards,
>>
>> Naga
>>
>>
>>
>>
>>
>>    ------------------------------
>> *From:* jay vyas [jayunit100.apache@gmail.com]
>> *Sent:* Saturday, September 27, 2014 02:41
>> *To:* common-user@hadoop.apache.org
>> *Subject:* Re:
>>
>>      See https://wiki.apache.org/hadoop/HCFS/
>>
>> Yes, YARN is written to the FileSystem interface. It works on
>> S3FileSystem, GlusterFileSystem, and any other HCFS.
>>
>>  We have run, and continue to run, the many tests in Apache Bigtop's
>> test suite against our Hadoop clusters running on alternative file system
>> implementations, and it works.
>>
>>  When you say "HDFS does not support fs.AbstractFileSystem.s3.impl"...
>> That is true. If your file system is configured using HDFS, then s3 URLs
>> will not be used, ever.
>>
>>  When you create a FileSystem object in Hadoop, it reads the URI (e.g.
>> "glusterfs:///") and then finds the file system binding in your
>> core-site.xml (e.g. fs.AbstractFileSystem.glusterfs.impl).
>>
>>  So the URI must have a corresponding entry in the core-site.xml.
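The scheme-to-binding lookup can be sketched in plain Java. This is a simplified model, not Hadoop's lookup code: a `Map` stands in for the loaded core-site.xml, class names are plain strings, and `SchemeBindingModel` with its methods is hypothetical. It shows the two key families involved: `fs.<scheme>.impl`, consulted when creating a FileSystem, and `fs.AbstractFileSystem.<scheme>.impl`, consulted when creating an AbstractFileSystem through FileContext. Hadoop 2.x can also discover FileSystem implementations through `ServiceLoader`, which this sketch ignores.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

class SchemeBindingModel {
    // Key consulted when creating a FileSystem for this URI's scheme.
    static String fileSystemKey(URI uri) {
        return "fs." + uri.getScheme() + ".impl";
    }

    // Key consulted when creating an AbstractFileSystem (i.e. via FileContext).
    static String abstractFileSystemKey(URI uri) {
        return "fs.AbstractFileSystem." + uri.getScheme() + ".impl";
    }

    // Returns the configured class name, or throws if the key has no binding.
    static String lookup(Map<String, String> conf, String key) {
        String impl = conf.get(key);
        if (impl == null) {
            throw new IllegalStateException("No implementation configured for key " + key);
        }
        return impl;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Only the FileSystem binding is present, as in the S3 case in this thread.
        conf.put("fs.s3.impl", "org.apache.hadoop.fs.s3.S3FileSystem");

        URI uri = URI.create("s3://hadoop/test1/");
        System.out.println(lookup(conf, fileSystemKey(uri))); // org.apache.hadoop.fs.s3.S3FileSystem
        try {
            lookup(conf, abstractFileSystemKey(uri));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // No implementation configured for key fs.AbstractFileSystem.s3.impl
        }
    }
}
```

Under this model, a configuration that binds only the FileSystem key is enough for FileSystem-based code but not for anything that goes through FileContext.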
>>
>>  As a reference implementation, you can see
>> https://github.com/gluster/glusterfs-hadoop/blob/master/conf/core-site.xml
>>
>>
>>
>>
>> On Fri, Sep 26, 2014 at 10:10 AM, Naganarasimha G R (Naga) <
>> garlanaganarasimha@huawei.com> wrote:
>>
>>>   Hi All,
>>>
>>>  I have the following questions about pluggable FileSystems and YARN:
>>> 1. If all implementations should extend FileSystem, then why is there
>>> a parallel class, AbstractFileSystem, which ViewFS extends?
>>> 2. Is YARN supposed to run on any pluggable
>>> org.apache.hadoop.fs.FileSystem, such as S3?
>>> If so, then when a job is submitted on the client side,
>>> YARNRunner calls FileContext.getFileContext(this.conf),
>>> which in turn calls FileContext.getAbstractFileSystem(), which
>>> throws an exception for S3.
>>> So I am not able to run a YARN job on ViewFS with S3 as a mount. And
>>> based on the code, even if I configure only S3, it is still going to fail.
>>> 3. Does HDFS not provide a default class for "fs.AbstractFileSystem.s3.impl",
>>> similar to org.apache.hadoop.fs.s3.S3FileSystem?
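The failure path in question 2 can be modeled in a few lines. This is a hypothetical sketch in plain Java, not YARNRunner or FileContext code: `GetContextModel.getContext` stands in for FileContext.getFileContext(conf), and it assumes, consistent with the behaviour reported in this thread, that a viewfs default file system instantiates the AbstractFileSystem of every mount target up front. Under that assumption, a single s3 mount without an `fs.AbstractFileSystem.s3.impl` binding makes the whole call fail. The `/user` mount and the `nn1` NameNode host are hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GetContextModel {
    static String afsKey(String scheme) {
        return "fs.AbstractFileSystem." + scheme + ".impl";
    }

    static void requireBinding(Map<String, String> conf, String scheme) {
        if (!conf.containsKey(afsKey(scheme))) {
            throw new IllegalStateException("No AbstractFileSystem for scheme: " + scheme);
        }
    }

    // Hypothetical stand-in for FileContext.getFileContext(conf): returns the
    // schemes it had to bind, or throws on the first scheme with no binding.
    static List<String> getContext(Map<String, String> conf) {
        URI defaultFs = URI.create(conf.get("fs.defaultFS"));
        List<String> bound = new ArrayList<>();
        requireBinding(conf, defaultFs.getScheme());
        bound.add(defaultFs.getScheme());
        if ("viewfs".equals(defaultFs.getScheme())) {
            // Assumption: every mount target is instantiated eagerly.
            String prefix = "fs.viewfs.mounttable." + defaultFs.getAuthority() + ".link.";
            for (Map.Entry<String, String> e : conf.entrySet()) {
                if (e.getKey().startsWith(prefix)) {
                    String targetScheme = URI.create(e.getValue()).getScheme();
                    requireBinding(conf, targetScheme);
                    bound.add(targetScheme);
                }
            }
        }
        return bound;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.defaultFS", "viewfs://nsX");
        conf.put("fs.AbstractFileSystem.viewfs.impl", "org.apache.hadoop.fs.viewfs.ViewFs");
        conf.put("fs.AbstractFileSystem.hdfs.impl", "org.apache.hadoop.fs.Hdfs");
        conf.put("fs.viewfs.mounttable.nsX.link./user", "hdfs://nn1/user");
        conf.put("fs.viewfs.mounttable.nsX.link./uds", "s3://hadoop/test1/");
        try {
            getContext(conf);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // No AbstractFileSystem for scheme: s3
        }
    }
}
```

In this model, removing the `/uds` link or adding an `fs.AbstractFileSystem.s3.impl` entry lets the call succeed.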
>>>
>>>    Regards,
>>>
>>> Naga
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> jay vyas
>>
>
>
