hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8401) Memfs - a layered file system for in-memory storage in HDFS
Date Thu, 28 May 2015 18:17:21 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563399#comment-14563399 ]

Colin Patrick McCabe commented on HDFS-8401:

bq. Allow using memory features without calling HDFS-specific APIs. This also isolates applications
from evolving APIs. Applications currently use shims and reflection tricks to work with different
versions of HDFS.

HDFS-4949 didn't require applications to call any HDFS-specific APIs.  The administrator simply
set a list of files and directories to be cached.  When applications read those files or directories,
they were retrieved from the cache.

We could do something similar here by enabling opportunistic caching on a certain directory
subtree.  For example, we could set a 2Q eviction policy on that subtree and have the
NameNode manage it.  [~andrew.wang] and I discussed doing that for HDFS-4949, but we simply
didn't have time.
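
To make the 2Q idea concrete, here is a minimal sketch of a simplified two-queue policy over file paths. It is an illustration only, not HDFS code: the class name TwoQueuePathCache, its methods, and the capacities are all hypothetical, and it omits the ghost queue (A1out) of the full 2Q algorithm. A path enters a FIFO probation queue on first access and is promoted to an LRU "hot" set only when it is accessed again, so a one-off scan of a large directory cannot push out frequently read files.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashSet;

/** Simplified 2Q bookkeeping for cached paths. Hypothetical sketch, not an HDFS API. */
class TwoQueuePathCache {
    private final int probationCapacity;                          // max entries in the FIFO queue
    private final int hotCapacity;                                // max entries in the LRU set
    private final Deque<String> probation = new ArrayDeque<>();   // paths seen once
    private final LinkedHashSet<String> hot = new LinkedHashSet<>(); // paths seen again, LRU order

    TwoQueuePathCache(int probationCapacity, int hotCapacity) {
        if (probationCapacity < 1 || hotCapacity < 1) {
            throw new IllegalArgumentException("capacities must be at least 1");
        }
        this.probationCapacity = probationCapacity;
        this.hotCapacity = hotCapacity;
    }

    /** Records an access to a path; returns true if the path was already cached. */
    boolean access(String path) {
        if (hot.remove(path)) {
            hot.add(path);                  // re-insert to mark as most recently used
            return true;
        }
        if (probation.remove(path)) {
            if (hot.size() >= hotCapacity) {
                Iterator<String> it = hot.iterator();
                it.next();
                it.remove();                // evict the least recently used hot path
            }
            hot.add(path);                  // second access: promote to the hot set
            return true;
        }
        if (probation.size() >= probationCapacity) {
            probation.pollFirst();          // evict the oldest once-seen path
        }
        probation.addLast(path);            // first access: admit on probation
        return false;
    }

    /** Returns true if the path is currently tracked in either queue. */
    boolean isCached(String path) {
        return hot.contains(path) || probation.contains(path);
    }
}
```

In a real design the NameNode would presumably translate admissions and evictions into cache and uncache commands for DataNodes; that part is not shown here.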

bq. Once applications start using memfs someone could write a memfs layer over another HCFS
e.g. Amazon S3.

That does raise the question of why this belongs in HDFS, though.  If we just want a generic
FS caching layer in Hadoop, we could do that in hadoop-common.

> Memfs - a layered file system for in-memory storage in HDFS
> -----------------------------------------------------------
>                 Key: HDFS-8401
>                 URL: https://issues.apache.org/jira/browse/HDFS-8401
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
> We propose creating a layered filesystem that can provide in-memory storage using existing
> features within HDFS. memfs will use lazy persist writes introduced by HDFS-6581. For reads,
> memfs can use the Centralized Cache Management feature introduced in HDFS-4949 to load hot
> data to memory.
> Paths in memfs and hdfs will correspond 1:1, so memfs will require no additional metadata
> and it can be implemented entirely as a client-side library.
> The advantage of a layered file system is that it requires little or no change to existing
> applications. For example, applications can use something like {{memfs://}} instead of {{hdfs://}}
> for files targeted to memory storage.
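
The 1:1 path correspondence described in the proposal can be sketched as a pure URI rewrite. This is a minimal illustration under stated assumptions, not the proposed implementation: the class MemfsPaths, the method toHdfs, and the host nn1:8020 used below are hypothetical, and a real layered file system would also have to apply the lazy-persist and caching behavior on top of the rewritten path.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper: maps a memfs:// URI to the hdfs:// URI with the same path. */
class MemfsPaths {
    static URI toHdfs(URI memfsUri) {
        if (!"memfs".equals(memfsUri.getScheme())) {
            throw new IllegalArgumentException("not a memfs URI: " + memfsUri);
        }
        try {
            // Only the scheme changes; authority and path are carried over unchanged,
            // so no extra metadata is needed to find the underlying file.
            return new URI("hdfs", memfsUri.getAuthority(), memfsUri.getPath(),
                           memfsUri.getQuery(), memfsUri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot map to hdfs: " + memfsUri, e);
        }
    }
}
```

For instance, MemfsPaths.toHdfs(URI.create("memfs://nn1:8020/data/hot/part-0")) yields hdfs://nn1:8020/data/hot/part-0.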

This message was sent by Atlassian JIRA
