hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-4949) Centralized cache management in HDFS
Date Tue, 06 Aug 2013 19:25:56 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13731143#comment-13731143 ]

Suresh Srinivas edited comment on HDFS-4949 at 8/6/13 7:24 PM:
---------------------------------------------------------------

My notes from the meeting:
Enabling this feature on the Windows platform requires the following:
# Need an equivalent of Unix domain sockets.
# mmap and munmap are done through Java and should not require any Windows-specific changes (see the sketch after this list).
# mlock: is there a Windows equivalent?
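
As context for item 2, a minimal sketch of how the mapping can be done portably from Java via FileChannel.map; the block-file path here is made up for illustration:

{code:java}
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical path to a finalized block file on a datanode disk.
        try (RandomAccessFile raf =
                 new RandomAccessFile("/data/dn/current/blk_1073741825", "r");
             FileChannel channel = raf.getChannel()) {
            // The JVM issues the platform-specific mapping call for us
            // (mmap on POSIX, CreateFileMapping/MapViewOfFile on Windows).
            MappedByteBuffer buf =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            System.out.println("first byte: " + buf.get(0));
        }
        // There is no public munmap in Java: the mapping is released when the
        // buffer is garbage-collected, so eager unmapping needs an internal
        // cleaner hook. mlock has no Java API at all and would need JNI on
        // POSIX, hence the open question in item 3.
    }
}
{code}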

Quota for the datanode cache is counted against the pool (a sketch of this accounting follows).
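
Purely as illustration of that accounting (none of these class or field names are from the design doc), each cached replica's bytes could be charged against its pool's limit:

{code:java}
// Illustrative only: pool-level accounting of datanode cache quota.
class CachePool {
    final String name;
    final long limitBytes;   // cache quota assigned to this pool
    private long usedBytes;  // bytes currently charged to this pool

    CachePool(String name, long limitBytes) {
        this.name = name;
        this.limitBytes = limitBytes;
    }

    // Charge a newly cached replica to this pool; reject if over quota.
    synchronized boolean tryCharge(long blockBytes) {
        if (usedBytes + blockBytes > limitBytes) {
            return false;  // caching request denied: quota exhausted
        }
        usedBytes += blockBytes;
        return true;
    }

    // Release the charge when a replica is uncached.
    synchronized void release(long blockBytes) {
        usedBytes = Math.max(0, usedBytes - blockBytes);
    }
}
{code}

Note that this naive version would charge a file cached by two pools twice, which is exactly the first open scenario below.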

The design needs to cover the following scenarios in more detail:
# Two pools caching the same file, and how quota is counted in that case.
# Resource failures and how they affect the existing caches of the pools. Perhaps pools should have priorities (see the sketch after this list).
#* Scenario 1 - a resource failure takes down cached data. In the first cut, no new cached replicas will be created.
#* Scenario 2 - resources have failed and cluster capacity is low; then the application, even if it has higher priority, will not get cache quota.
# Caching is supported for whole files only, for now.
# Only completed blocks will be cached; this is the relevant case for files that are still being written.
# Symlink paths will not be cached.
# Need to add more detail on enabling caching for a directory and how newly created files (on completion of the write) will be added to the cache. This also has quota implications, and it requires handling failures from either reaching the quota or the unavailability of resources needed for such automatic caching to work.
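
On the priority idea in item 2, a hypothetical sketch of admission when capacity shrinks after a failure (all names are invented; the design doc defines none of this):

{code:java}
import java.util.Comparator;
import java.util.List;

// Illustrative only: after a resource failure shrinks cache capacity,
// re-admit pending caching requests from the highest-priority pool first.
class CacheAdmission {
    static class Request {
        final String path;
        final long bytes;
        final int poolPriority;  // higher value = more important pool
        Request(String path, long bytes, int poolPriority) {
            this.path = path;
            this.bytes = bytes;
            this.poolPriority = poolPriority;
        }
    }

    static void admit(List<Request> pending, long capacityBytes) {
        pending.sort(Comparator.comparingInt((Request r) -> r.poolPriority).reversed());
        long remaining = capacityBytes;
        for (Request r : pending) {
            if (r.bytes <= remaining) {
                remaining -= r.bytes;
                System.out.println("cache " + r.path);
            } else {
                // Scenario 2: when capacity is too low, even a
                // high-priority request gets no cache quota.
                System.out.println("skip " + r.path + " (insufficient capacity)");
            }
        }
    }
}
{code}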

We should add a TTL to caching requests and expire the corresponding cache entries (a sketch follows).
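
A minimal sketch of a TTL on a caching request, again with invented names:

{code:java}
// Illustrative only: a caching directive that expires after a TTL.
class CacheDirective {
    final String path;
    final long expiryMillis;  // absolute expiration time

    CacheDirective(String path, long ttlMillis) {
        this.path = path;
        this.expiryMillis = System.currentTimeMillis() + ttlMillis;
    }

    boolean isExpired(long nowMillis) {
        return nowMillis >= expiryMillis;
    }
}
{code}

A periodic scan (for example in a NameNode-side cache manager) could then drop expired directives, e.g. directives.removeIf(d -> d.isExpired(System.currentTimeMillis())), letting the datanodes uncache the corresponding blocks.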

Let's refresh the design document based on the discussion at the meeting.

                
> Centralized cache management in HDFS
> ------------------------------------
>
>                 Key: HDFS-4949
>                 URL: https://issues.apache.org/jira/browse/HDFS-4949
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>    Affects Versions: 3.0.0, 2.3.0
>            Reporter: Andrew Wang
>            Assignee: Andrew Wang
>         Attachments: caching-design-doc-2013-07-02.pdf
>
>
> HDFS currently has no support for managing or exposing in-memory caches at datanodes.
> This makes it harder for higher level application frameworks like Hive, Pig, and Impala
> to effectively use cluster memory, because they cannot explicitly cache important
> datasets or place their tasks for memory locality.
