hadoop-common-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13321) Deprecate FileSystem APIs that promote inefficient call patterns.
Date Fri, 19 Aug 2016 19:11:20 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15428670#comment-15428670 ]

Chris Nauroth commented on HADOOP-13321:

[~stevel@apache.org], are you suggesting that we don't in fact deprecate the APIs?  I'd be
comfortable with that if we mitigate in other ways, such as clear warnings in JavaDocs about
the potential inefficiencies.

bq. If the FS client does some very short term caching, even a fraction of a second, the penalty
of two back-to-back getFileStatus() checks would become zero ... that may be the way to go.

Linking to HADOOP-12876, which tracks an implementation of this idea in Azure Data Lake. 
I'd like to explore refactoring that out to Hadoop Common for any file system to use.
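
To make the short-term caching idea concrete, here is a minimal sketch of a time-bounded status cache. It is only an illustration under stated assumptions: {{ShortTermStatusCache}}, its constructor parameters, and the string path keys are hypothetical names, not Hadoop classes, and nothing here reflects how HADOOP-12876 actually implements the idea. The loader function stands in for a real {{getFileStatus}} call, and the clock is injected so the expiry can be exercised without sleeping.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical helper, not part of Hadoop: remembers the result of a status
// lookup for a short time so back-to-back checks on the same path cost one call.
final class ShortTermStatusCache<S> {

    // A cached status plus the clock reading at which it was loaded.
    private static final class Entry<S> {
        final S status;
        final long loadedAtNanos;

        Entry(S status, long loadedAtNanos) {
            this.status = status;
            this.loadedAtNanos = loadedAtNanos;
        }
    }

    private final Map<String, Entry<S>> entries = new ConcurrentHashMap<>();
    private final Function<String, S> loader;  // stands in for getFileStatus
    private final long ttlNanos;               // how long an entry stays valid
    private final LongSupplier clock;          // e.g. System::nanoTime

    ShortTermStatusCache(Function<String, S> loader, long ttlNanos, LongSupplier clock) {
        this.loader = loader;
        this.ttlNanos = ttlNanos;
        this.clock = clock;
    }

    // Returns the cached status if it is younger than the TTL, else reloads it.
    S get(String path) {
        long now = clock.getAsLong();
        Entry<S> cached = entries.get(path);
        if (cached != null && now - cached.loadedAtNanos < ttlNanos) {
            return cached.status;
        }
        S fresh = loader.apply(path);
        entries.put(path, new Entry<>(fresh, now));
        return fresh;
    }

    // Callers that mutate a path (create, delete, rename) should drop its entry.
    void invalidate(String path) {
        entries.remove(path);
    }
}
```

The sketch never evicts expired entries, and a stale entry can hide a change made by another client until the TTL elapses, so a real version would need a size bound and careful invalidation on every mutating operation.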

> Deprecate FileSystem APIs that promote inefficient call patterns.
> -----------------------------------------------------------------
>                 Key: HADOOP-13321
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13321
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs
>            Reporter: Chris Nauroth
>            Assignee: Mingliang Liu
>         Attachments: HADOOP-13321.000.patch
> {{FileSystem}} contains several methods that act as convenience wrappers over calling
> {{getFileStatus}} and retrieving a single property of the returned {{FileStatus}}.  These
> methods have a habit of fostering inefficient call patterns in applications, resulting in
> multiple redundant {{getFileStatus}} calls.  For HDFS, this translates into wasteful NameNode
> RPC traffic.  For file systems backed by cloud object stores, this translates into wasteful
> HTTP traffic.  This issue proposes to deprecate these methods and instead encourage applications
> to call {{getFileStatus}} and then reuse the same {{FileStatus}} instance as needed.
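
The call pattern described in the issue can be sketched as follows. To keep the example self-contained, {{FileSystemSketch}} and {{StatusSketch}} are hypothetical stand-ins for Hadoop's {{FileSystem}} and {{FileStatus}}, reduced to a lookup and two wrappers, and {{statusCalls}} is an invented counter that simulates one RPC or HTTP round trip per lookup. Error handling is simplified compared with the real classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for FileStatus: only the two properties used below.
final class StatusSketch {
    final boolean directory;
    final long length;

    StatusSketch(boolean directory, long length) {
        this.directory = directory;
        this.length = length;
    }
}

// Hypothetical stand-in for FileSystem, backed by an in-memory map.
class FileSystemSketch {
    private final Map<String, StatusSketch> entries = new HashMap<>();
    int statusCalls = 0;  // simulated round trips to the NameNode or object store

    void put(String path, StatusSketch status) {
        entries.put(path, status);
    }

    // Stands in for getFileStatus: every call counts as one round trip.
    StatusSketch getFileStatus(String path) {
        statusCalls++;
        StatusSketch status = entries.get(path);
        if (status == null) {
            throw new IllegalArgumentException("No such path: " + path);
        }
        return status;
    }

    // Convenience wrappers of the kind the issue describes: each one performs
    // a full lookup and returns a single property of the result.
    boolean isDirectory(String path) {
        return getFileStatus(path).directory;
    }

    long getLength(String path) {
        return getFileStatus(path).length;
    }
}

class CallPatterns {
    // Wasteful: two wrappers on the same path, so two round trips.
    static long sizeIfFileWasteful(FileSystemSketch fs, String path) {
        if (fs.isDirectory(path)) {
            return -1;
        }
        return fs.getLength(path);
    }

    // Preferred: one lookup, then reuse the same status instance.
    static long sizeIfFileReusing(FileSystemSketch fs, String path) {
        StatusSketch status = fs.getFileStatus(path);
        return status.directory ? -1 : status.length;
    }
}
```

Both methods return the same answer; the difference is that the first issues two lookups per path and the second issues one, which is the saving the issue is after.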

