hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4949) Centralized cache management in HDFS
Date Tue, 16 Jul 2013 19:10:53 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710097#comment-13710097 ]

Colin Patrick McCabe commented on HDFS-4949:
--------------------------------------------

For most of the applications we're considering here, compressing cached data would not be a win, because compression is CPU-intensive.  It would also involve copying the data in memory, which is one of the things we're trying to avoid.  I think it will be more effective to handle compression at the application or file-format layer, using something like CompressionCodec, ORC, Parquet, RCFile, etc.
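
To make the distinction concrete, here is a minimal sketch of application-layer compression through Hadoop's CompressionCodec API; the output path and the choice of gzip are illustrative assumptions, not part of this issue.  The point is that the bytes HDFS stores (and would cache) are already compressed by the writer, so the cache layer itself never has to spend CPU or copy buffers:

{code:java}
// A minimal sketch of compressing at the application layer with Hadoop's
// CompressionCodec API. The output path and the .gz extension are
// illustrative assumptions, not part of this issue.
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class CodecWriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // CompressionCodecFactory picks a codec from the file extension (.gz -> gzip).
    Path out = new Path("/tmp/part-00000.gz");
    CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(out);

    // The writer compresses on the way out; HDFS stores (and could cache)
    // the already-compressed bytes with no extra work in the cache path.
    try (OutputStream os = codec.createOutputStream(fs.create(out))) {
      os.write("example record\n".getBytes(StandardCharsets.UTF_8));
    }
  }
}
{code}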
                
> Centralized cache management in HDFS
> ------------------------------------
>
>                 Key: HDFS-4949
>                 URL: https://issues.apache.org/jira/browse/HDFS-4949
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: 3.0.0, 2.2.0
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: caching-design-doc-2013-07-02.pdf
>
>
> HDFS currently has no support for managing or exposing in-memory caches at datanodes.
> This makes it harder for higher level application frameworks like Hive, Pig, and Impala to
> effectively use cluster memory, because they cannot explicitly cache important datasets or
> place their tasks for memory locality.
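
For a concrete picture of the "explicitly cache important datasets" part of the quoted description, here is a short sketch using the API this feature eventually shipped with (CachePoolInfo / CacheDirectiveInfo on DistributedFileSystem); at the time of this comment the interface was still under design, and the pool name and path below are hypothetical:

{code:java}
// A sketch of explicit, application-driven caching as this issue proposes.
// The API below is the one that eventually shipped with this feature; the
// pool name and dataset path are hypothetical.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.CacheDirectiveInfo;
import org.apache.hadoop.hdfs.protocol.CachePoolInfo;

public class CacheDirectiveExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);

    // Create a cache pool, then ask the NameNode to pin a hot dataset into
    // DataNode memory so frameworks like Hive or Impala can count on it.
    dfs.addCachePool(new CachePoolInfo("hot-tables"));
    dfs.addCacheDirective(new CacheDirectiveInfo.Builder()
        .setPath(new Path("/warehouse/fact_table"))  // hypothetical path
        .setPool("hot-tables")
        .setReplication((short) 1)                   // cached replicas
        .build());
  }
}
{code}

The shipped feature also exposes cached replica locations to clients (BlockLocation#getCachedHosts), which is what enables the memory-locality task placement the description mentions.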

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
