crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-505) Store intermediate data in memory only using Tachyon
Date Wed, 01 Apr 2015 02:14:53 GMT


Micah Whitacre commented on CRUNCH-505:

>>Would it be ok if I were to start working on it?

Totally feel free.  

>> If so, do you maybe have some tips on where to start? 

The first step, to me, would be to validate that everything in Crunch works if someone uses
Tachyon as the default FS: not just for the intermediate state, but for that plus persistence
at the beginning and end of a pipeline. Then make sure we can tweak the Tachyon write type
for targets so that their output goes to HDFS.

Like I said, the goal would be for Tachyon to be optional and not a required part of Crunch.
I haven't dug in, so I'm not sure how much that will help or hinder your original vision for this.
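A minimal sketch of that first experiment, assuming plain Hadoop-style configuration overrides and keeping Tachyon strictly opt-in. `fs.defaultFS` is the standard Hadoop key and `tachyon.hadoop.TFS` is the Hadoop client class Tachyon documented for its `tachyon://` scheme; the write-type key `tachyon.user.file.writetype.default`, the master address `tachyon://tachyon-master:19998`, and the `TachyonOptIn` / `overrides` names are assumptions for illustration, not part of Crunch, and should be checked against the Tachyon version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TachyonOptIn {
    // Hypothetical master address; 19998 is Tachyon's usual master port.
    static final String TACHYON_MASTER = "tachyon://tachyon-master:19998";

    /**
     * Returns Hadoop configuration overrides for a Crunch pipeline.
     * When useTachyon is false the map is empty, so the pipeline keeps
     * whatever default FS (e.g. HDFS) the cluster already uses.
     */
    static Map<String, String> overrides(boolean useTachyon) {
        Map<String, String> conf = new LinkedHashMap<>();
        if (!useTachyon) {
            return conf;
        }
        // Standard Hadoop key: make Tachyon the default FS so sources,
        // intermediate output and targets all resolve against it.
        conf.put("fs.defaultFS", TACHYON_MASTER);
        // Hadoop FileSystem implementation for the tachyon:// scheme.
        conf.put("fs.tachyon.impl", "tachyon.hadoop.TFS");
        // ASSUMED Tachyon client key: CACHE_THROUGH also persists files
        // to the under-filesystem (HDFS), which is what targets need.
        conf.put("tachyon.user.file.writetype.default", "CACHE_THROUGH");
        return conf;
    }

    public static void main(String[] args) {
        System.out.println(overrides(false)); // {}
        overrides(true).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The returned map would be copied into the pipeline's Hadoop `Configuration` before it runs; with the flag off nothing changes, which is what keeps Tachyon optional rather than required.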

> Store intermediate data in memory only using Tachyon
> ----------------------------------------------------
>                 Key: CRUNCH-505
>                 URL:
>             Project: Crunch
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.12.0
>            Reporter: Ioannis Kerkinos
>            Assignee: Josh Wills
> Tachyon is a memory-centric distributed storage system that enables reliable data sharing
> at memory speed. If it is used as the storage for intermediate data (between MR jobs), it
> should improve performance, since that data no longer has to go to HDFS. To do so, Tachyon's
> MUST_CACHE write type can be used, which keeps data in memory only without persisting it to
> HDFS. The intermediate data is then read and written at memory speed, and only the final
> result is written to HDFS.
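The per-path choice this description implies can be sketched as follows. The `WriteType` enum is a local stand-in that reuses Tachyon's names for two of its write types, not Tachyon's own class, and treating everything under the pipeline's temp directory (here the hypothetical `/tmp/crunch-12345`) as intermediate output is an assumption about how such data would be recognized; `WriteTypeChooser` and `choose` are hypothetical names.

```java
import java.util.List;

public class WriteTypeChooser {
    // Local mirror of two Tachyon write-type names; NOT Tachyon's own enum.
    enum WriteType { MUST_CACHE, CACHE_THROUGH }

    /**
     * Hypothetical policy: files under the pipeline's temp directory are
     * intermediate (between MR jobs) and stay in memory only; anything
     * else is treated as a final target and is written through to HDFS.
     */
    static WriteType choose(String path, String tmpDir) {
        // Normalize so "/tmp/crunch-1" does not also match "/tmp/crunch-10".
        String prefix = tmpDir.endsWith("/") ? tmpDir : tmpDir + "/";
        return path.startsWith(prefix) ? WriteType.MUST_CACHE : WriteType.CACHE_THROUGH;
    }

    public static void main(String[] args) {
        String tmpDir = "/tmp/crunch-12345"; // hypothetical temp dir
        for (String p : List.of("/tmp/crunch-12345/p1/part-m-00000",
                                "/user/out/part-r-00000")) {
            System.out.println(p + " -> " + choose(p, tmpDir));
        }
    }
}
```

One trade-off worth noting: MUST_CACHE data lives only in Tachyon's memory and is never written to HDFS, so if it is evicted or a worker is lost, it cannot be re-read from HDFS and the job that produced it would likely have to run again.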

This message was sent by Atlassian JIRA
