hadoop-common-issues mailing list archives

From "Siddharth Seth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14138) Remove S3A ref from META-INF service discovery, rely on existing core-default entry
Date Fri, 07 Apr 2017 01:43:41 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960141#comment-15960141 ]

Siddharth Seth commented on HADOOP-14138:

[~steve_l] - I understand the mechanics behind *-default.xml and *-site.xml. When I said "If
someone wants to use s3a, I'd expect them to explicitly set it up in their Configuration" -
their own Configuration could well be core-site.xml, which will then be loaded by all Hadoop
processes.

What I'm asking is why s3a gets special treatment and an entry in core-default.xml. Along
with that, why do the 5+ additional s3a settings need to be defined in core-default.xml?
It should be possible to have the default values in code. These could live in a separate
template, which users can include to get all relevant settings (if custom settings are
required). Without custom settings, the service loader approach is sufficient to get s3a
functional, as long as the jar is available.
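For illustration, the "users explicitly set it up" option would amount to something like the
following fragment in a user's core-site.xml (a sketch of the property discussed above, not a
recommendation to add it anywhere by default):

```xml
<!-- Sketch: explicitly wiring up s3a in a user's own core-site.xml,
     instead of relying on a core-default.xml entry or service discovery. -->
<configuration>
  <property>
    <name>fs.s3a.impl</name>
    <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
  </property>
</configuration>
```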

HDFS does not have an entry in core-default.xml, and relies upon the ServiceLoader approach.
(fs.hdfs.impl does not exist; fs.AbstractFileSystem.hdfs.impl exists - I don't know what it
is used for.)
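For reference, the ServiceLoader registration in question is just a one-line class name in a
resource file on the classpath (the hadoop-aws path is quoted in the issue description below):

```
# META-INF/services/org.apache.hadoop.fs.FileSystem
org.apache.hadoop.fs.s3a.S3AFileSystem
```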

core-default.xml, to me at least, serves more as documentation of defaults. The files can
go out of sync with the default values defined in code - YarnConfiguration, for example. It
takes additional effort to keep the files in sync. There are JIRAs to remove all the
*-default.xml files in favor of code defaults (I don't expect these to be fixed soon, since
such changes would be incompatible). For most parameters in these files, the code has default
values (all the IPC defaults, for instance).
I suspect nothing has broken so far because the defaults exist in code.
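The "defaults in code" pattern referred to above can be sketched as follows - a minimal,
hypothetical stand-in for Hadoop's Configuration (not the real class), where the default is
supplied at the get() call site so no *-default.xml entry is required:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the "defaults live in code" pattern. This is a
// hypothetical stand-in, not Hadoop's actual Configuration class.
public class CodeDefaults {
    private final Map<String, String> props = new HashMap<>();

    public void set(String key, String value) {
        props.put(key, value);
    }

    // Returns the site-provided value if present, else the in-code
    // default passed by the caller - no XML default entry needed.
    public int getInt(String key, int defaultValue) {
        String v = props.get(key);
        return v == null ? defaultValue : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        CodeDefaults conf = new CodeDefaults();
        // No site entry: the in-code default wins.
        System.out.println(conf.getInt("ipc.client.connect.max.retries", 10));
        // A site override takes precedence over the code default.
        conf.set("ipc.client.connect.max.retries", "3");
        System.out.println(conf.getInt("ipc.client.connect.max.retries", 10));
    }
}
```

With this pattern, the XML file becomes pure documentation: deleting an entry changes nothing,
which is exactly why it silently drifts out of sync with the code.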

In terms of the s3a and service loader problems, HADOOP-14132 sounds like a very good fix
to have. If I'm understanding it correctly, general FS operations will be faster if we don't
load all filesystems on the classpath. I'm worried that we're introducing a new dependency
on core-default.xml by making this change, while I think we should be going in the opposite
direction and getting rid of dependencies on these files.

> Remove S3A ref from META-INF service discovery, rely on existing core-default entry
> -----------------------------------------------------------------------------------
>                 Key: HADOOP-14138
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14138
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.9.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Critical
>             Fix For: 2.8.0, 2.7.4, 3.0.0-alpha3
>         Attachments: HADOOP-14138.001.patch, HADOOP-14138-branch-2-001.patch
> As discussed in HADOOP-14132, the shaded AWS library is killing the startup performance
of all Hadoop operations, due to classloading during FS service discovery.
> This is despite the fact that there is an entry for fs.s3a.impl in core-default.xml;
*we don't need service discovery here*.
> Proposed:
> # cut the entry from {{/hadoop-aws/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem}}
> # when HADOOP-14132 is in, move to that, including declaring an XML file exclusively
for s3a entries
> I want this one in first as it's a major performance regression, and one we could actually
backport to 2.7.x, just to improve load time slightly there too.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
