hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7694) FSDataInputStream should support "unbuffer"
Date Fri, 06 Feb 2015 03:54:36 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308568#comment-14308568
] 

Colin Patrick McCabe commented on HDFS-7694:
--------------------------------------------

bq. One question, in what cases, user needs to unbuffer instead of closing the stream?

Good question.  The main answer is that re-opening a stream will cause a getBlockLocations
RPC to the NameNode.  Some applications cache a lot of open streams in order to avoid generating
a lot of NameNode traffic.  HBase is one, Impala is another.  With unbuffer, those applications
can keep their streams open while releasing the readahead buffers and sockets, which is a really
easy way to save memory without generating a lot of RPC load on the NameNode.
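
To illustrate the trade-off, here is a minimal, self-contained sketch of a stream cache that unbuffers idle streams instead of closing them. Every name in it (UnbufferableStream, FakeHdfsStream, StreamCache) is hypothetical and is not taken from the patch; the open counter only stands in for the getBlockLocations RPC that a real re-open would trigger.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a stream that can drop its buffers yet stay open. */
interface UnbufferableStream {
    void unbuffer();        // release buffers; the stream remains usable
    boolean isBuffered();
}

/** Toy stream: allocates its buffer lazily on read and frees it on unbuffer. */
final class FakeHdfsStream implements UnbufferableStream {
    private byte[] buffer;

    int read() {
        if (buffer == null) {
            buffer = new byte[4096];   // re-buffer on demand, no re-open needed
        }
        return buffer[0];
    }

    @Override
    public void unbuffer() {
        buffer = null;
    }

    @Override
    public boolean isBuffered() {
        return buffer != null;
    }
}

/** Keeps streams open across uses; counts opens as a proxy for NameNode RPCs. */
final class StreamCache {
    private final Map<String, FakeHdfsStream> open = new LinkedHashMap<>();
    private int openCount = 0;

    FakeHdfsStream get(String path) {
        return open.computeIfAbsent(path, p -> {
            openCount++;               // a real open would contact the NameNode here
            return new FakeHdfsStream();
        });
    }

    /** Frees memory held by every cached stream without closing any of them. */
    void releaseIdle() {
        open.values().forEach(FakeHdfsStream::unbuffer);
    }

    int openCount() {
        return openCount;
    }
}
```

In this sketch, reading from a path again after releaseIdle() reuses the cached stream, so the open count stays at one while the buffer memory was reclaimed in between.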

> FSDataInputStream should support "unbuffer"
> -------------------------------------------
>
>                 Key: HDFS-7694
>                 URL: https://issues.apache.org/jira/browse/HDFS-7694
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.7.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7694.001.patch
>
>
> For applications that have many open HDFS (or other Hadoop filesystem) files, it would
> be useful to have an API to clear readahead buffers and sockets.  This could be added to the
> existing APIs as an optional interface, in much the same way as we added setReadahead / setDropBehind
> / etc.
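
A minimal sketch of the "optional interface" idea described above, using only the Java standard library. The names CanUnbufferSketch, WrapperStream, and BufferedSource are hypothetical and not taken from the patch. Throwing UnsupportedOperationException when the wrapped stream lacks the capability is an assumption made for this example; the description does not say how unsupported streams should behave.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.InputStream;

/** Hypothetical optional capability: streams implement it only if they can unbuffer. */
interface CanUnbufferSketch {
    void unbuffer();
}

/** Wrapper that forwards unbuffer() to the wrapped stream when it supports it. */
final class WrapperStream extends FilterInputStream implements CanUnbufferSketch {
    WrapperStream(InputStream in) {
        super(in);
    }

    @Override
    public void unbuffer() {
        if (in instanceof CanUnbufferSketch) {
            ((CanUnbufferSketch) in).unbuffer();
        } else {
            // Assumption for this sketch: unsupported streams are reported, not ignored.
            throw new UnsupportedOperationException(
                "wrapped stream does not support unbuffer");
        }
    }
}

/** Toy stream that supports the capability and records that it was called. */
final class BufferedSource extends ByteArrayInputStream implements CanUnbufferSketch {
    boolean unbuffered = false;

    BufferedSource(byte[] data) {
        super(data);
    }

    @Override
    public void unbuffer() {
        unbuffered = true;   // a real stream would free buffers and sockets here
    }
}
```

Because the capability lives in its own interface, filesystems that cannot unbuffer need no changes, and callers find out at the call site whether the underlying stream supports it.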



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
