hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5851) Support memory as a storage medium
Date Thu, 24 Apr 2014 20:03:29 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13980183#comment-13980183 ]

Colin Patrick McCabe commented on HDFS-5851:

I took a quick look at the design doc.  I think the focus on "discardable" memory makes sense
in light of next-gen frameworks like Spark, Tez, etc.  One note: Tachyon, Spark's caching
layer, does not currently incorporate the concept of RDDs, although that support is planned,
as I understand it.  It's just caching (serialized) files at this point, and I think the semantics
match up pretty well with what we're talking about here.  The execution framework can re-generate
the data if needed... this re-generating support does not need to be included in HDFS.

I think that some HDFS applications will want the ability to treat multiple files as a single
eviction unit... i.e., if you evict one file, you evict them all.  (Things like Hive tables
are multiple files, but probably ought to be treated as a single unit for caching purposes.)
There are also some questions about when eviction can occur... it seems like it would be
very inconvenient to do it while the file was being read.  On the other hand, we probably
need a timeout to prevent a selfish process (or a process on a disconnected node) from pinning
something in the cache forever by keeping a file open.
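
To make the eviction-unit idea concrete, here is a minimal sketch in Python rather
than actual HDFS code.  The EvictionGroup class, its method names, and the
lease_timeout_s parameter are all hypothetical.  It assumes readers hold a lease that
they must renew: an active lease blocks eviction of the whole group, and an expired
lease (a selfish or disconnected reader) stops counting.

```python
import time


class EvictionGroup:
    """Hypothetical eviction unit: a set of files that is evicted all at once."""

    def __init__(self, paths, lease_timeout_s):
        self.paths = set(paths)
        self.lease_timeout_s = lease_timeout_s
        self.leases = {}  # reader id -> time of last open/renewal

    def open_for_read(self, reader, now=None):
        # Opening any file in the group pins the whole group.
        self.leases[reader] = time.monotonic() if now is None else now

    def renew(self, reader, now=None):
        # Only readers that still hold a lease can renew it.
        if reader in self.leases:
            self.leases[reader] = time.monotonic() if now is None else now

    def close(self, reader):
        self.leases.pop(reader, None)

    def try_evict(self, now=None):
        """Evict every file in the group, unless a live reader still holds a lease."""
        now = time.monotonic() if now is None else now
        # Drop leases that were not renewed within the timeout.
        self.leases = {
            reader: t
            for reader, t in self.leases.items()
            if now - t < self.lease_timeout_s
        }
        if self.leases:
            return []  # still being read: do not evict anything
        evicted = sorted(self.paths)
        self.paths.clear()
        return evicted
```

With a 30 second timeout, a reader that opened a file at t=0 blocks eviction at t=10,
but if it never renews, the whole group can be evicted at t=31.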

Clearly we want the ability to do things like skip checksums when reading the cached files.
This will reuse a lot of the HDFS-4949 code.  It's less clear what other aspects of the HDFS-4949
code we'll want to reuse.  I think cache pools might be one such thing.  There is a potential
to reuse some of the implementation as well, such as mlocking and so forth.  An mlocked file
in /dev/shm could be a good way to go here.
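
As a rough illustration of the /dev/shm approach (not how the DataNode would actually
do it), the Python sketch below writes a block to a file under /dev/shm, maps it, and
asks the kernel to mlock the mapping through ctypes.  The function name and file name
are hypothetical, the fallback to the system temp directory is only there so the
sketch runs on machines without /dev/shm, and mlock may be refused if RLIMIT_MEMLOCK
is too low, so the result is reported rather than assumed.

```python
import ctypes
import ctypes.util
import mmap
import os
import tempfile


def write_locked_block(name, payload):
    """Write a non-empty payload to a memory-backed file, try to mlock it, read it back.

    Returns (data_read_back, locked) where locked says whether mlock succeeded.
    """
    # Assumption: /dev/shm is a tmpfs mount; otherwise fall back to the temp dir.
    shm = "/dev/shm"
    usable = os.path.isdir(shm) and os.access(shm, os.W_OK)
    path = os.path.join(shm if usable else tempfile.gettempdir(), name)

    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

    size = len(payload)
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.ftruncate(fd, size)
        mm = mmap.mmap(fd, size)  # shared, read/write mapping of the file
        try:
            mm[:] = payload
            # Obtain the mapping's address so it can be passed to mlock(2).
            view = (ctypes.c_char * size).from_buffer(mm)
            addr = ctypes.addressof(view)
            locked = libc.mlock(addr, size) == 0
            data = mm[:]
            if locked:
                libc.munlock(addr, size)
            del view  # release the exported buffer before closing the map
        finally:
            mm.close()
    finally:
        os.close(fd)
        os.unlink(path)
    return data, locked
```

A caller would check the returned flag and decide whether an unlocked (swappable)
copy is still acceptable.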

I am free all of next week, except for Friday.  Let's schedule a webex so we can figure this
stuff out.

> Support memory as a storage medium
> ----------------------------------
>                 Key: HDFS-5851
>                 URL: https://issues.apache.org/jira/browse/HDFS-5851
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>    Affects Versions: 3.0.0
>            Reporter: Arpit Agarwal
>            Assignee: Arpit Agarwal
>         Attachments: SupportingMemoryStorageinHDFSPersistentandDiscardableMemory.pdf
> Memory can be used as a storage medium for smaller/transient files for fast write throughput.
> More information/design will be added later.

This message was sent by Atlassian JIRA