crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-505) Store intermediate data in memory only using Tachyon
Date Tue, 31 Mar 2015 13:58:53 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388554#comment-14388554 ]

Micah Whitacre commented on CRUNCH-505:
---------------------------------------

I haven't looked much into Tachyon, but does it provide an implementation of the Hadoop
FileSystem API, or does it mostly focus on the Java File API?  From a cursory glance [1] I'm
not seeing one.  If it does provide one, that'd be pretty easy to support.  If it has a
different API, the challenge would be supporting Tachyon without requiring it for all
consumers.

[1] - http://tachyon-project.org/Running-Hadoop-MapReduce-on-Tachyon.html
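
To illustrate the "not requiring it for all consumers" concern, here is a minimal sketch of an
optional-dependency check done by reflection, so nothing links against Tachyon at compile time.
It is not Crunch code. The class name tachyon.hadoop.TFS, the tachyon:// URI, and port 19998 are
assumptions based on the Tachyon documentation of that period, and OptionalFsSupport and its
methods are hypothetical names introduced only for this example.

```java
public class OptionalFsSupport {
    // Assumed name of Tachyon's Hadoop FileSystem implementation; verify against
    // the Tachyon release actually in use.
    public static final String TACHYON_FS_CLASS = "tachyon.hadoop.TFS";

    /** Returns true if the named class can be loaded, without initializing it. */
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, OptionalFsSupport.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class (or one of its dependencies) is absent: treat as "not available".
            return false;
        }
    }

    /**
     * Picks the directory for intermediate data: the preferred (in-memory) location
     * when the optional FileSystem class is present, otherwise the fallback.
     */
    public static String chooseTempDir(String fsClass, String preferred, String fallback) {
        return isOnClasspath(fsClass) ? preferred : fallback;
    }

    public static void main(String[] args) {
        String dir = chooseTempDir(
            TACHYON_FS_CLASS,
            "tachyon://localhost:19998/tmp/crunch", // hypothetical Tachyon master address
            "hdfs:///tmp/crunch");                  // hypothetical HDFS fallback
        // The chosen value could then be supplied as Crunch's temp directory setting.
        System.out.println("intermediate data dir: " + dir);
    }
}
```

With a check like this, consumers that do not ship the Tachyon jar would simply fall back to
the HDFS path, and nothing in the core would need a hard dependency on Tachyon.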

> Store intermediate data in memory only using Tachyon
> ----------------------------------------------------
>
>                 Key: CRUNCH-505
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-505
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.12.0
>            Reporter: Ioannis Kerkinos
>            Assignee: Josh Wills
>
> Tachyon is a memory-centric distributed storage system that enables reliable data sharing
> at memory-speed. If used as the storage for intermediate data (between MR jobs) it should
> improve performance as you won't have to go to HDFS. In order to do so, the MUST_CACHE write
> type of Tachyon can be used. This will enable data to be persisted in memory only without
> going to HDFS. So the intermediate data will be read/written at memory-speed and only the
> final result will be written in HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
