reef-dev mailing list archives

From "Andrew Chung (JIRA)" <>
Subject [jira] [Commented] (REEF-1339) Adding IInputPartition.Cache() for data download and cache
Date Fri, 22 Apr 2016 21:25:12 GMT


Andrew Chung commented on REEF-1339:

Agreed about explicit paths, but if a collection of file system paths is passed in, we don't
need to care about (and may not even be able to find out) whether the underlying storage
is an HDD or an SSD. This also doesn't explain the need for "at most" or "at least" in terms of
storage efficiency, since there is no constraint to satisfy.
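A minimal sketch of the path-based approach, written in Java for illustration only (REEF.NET itself is C#). It assumes the caller supplies an ordered list of candidate directories, and the cache simply uses the first one that exists and is writable, so it never needs to know whether that directory sits on an HDD or an SSD. `PathListCache`, `chooseCacheDir`, and `cache` are hypothetical names, not part of REEF.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: caches files into the first usable directory from a caller-supplied list. */
final class PathListCache {
    private final List<Path> candidateDirs;

    PathListCache(List<Path> candidateDirs) {
        // Order expresses the caller's preference; the storage medium behind each path is irrelevant.
        this.candidateDirs = List.copyOf(candidateDirs);
    }

    /** Returns the first candidate that is an existing, writable directory, if any. */
    Optional<Path> chooseCacheDir() {
        for (Path dir : candidateDirs) {
            if (Files.isDirectory(dir) && Files.isWritable(dir)) {
                return Optional.of(dir);
            }
        }
        return Optional.empty();
    }

    /** Copies each source file into the chosen directory and returns the cached copies' paths. */
    List<Path> cache(List<Path> sources) throws IOException {
        Path target = chooseCacheDir()
                .orElseThrow(() -> new IOException("no writable cache directory"));
        List<Path> cached = new ArrayList<>();
        for (Path src : sources) {
            Path dst = target.resolve(src.getFileName());
            Files.copy(src, dst, StandardCopyOption.REPLACE_EXISTING);
            cached.add(dst);
        }
        return cached;
    }
}
```

Falling through an ordered list keeps the policy in the caller's hands: putting a fast local directory first and a larger one second expresses a preference without the partition reasoning about device types.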

[~dkm2110] would you happen to know what part of the Spark codebase to look in?

> Adding IInputPartition.Cache() for data download and cache
> ----------------------------------------------------------
>                 Key: REEF-1339
>                 URL:
>             Project: REEF
>          Issue Type: Task
>            Reporter: Julia
>            Assignee: Andrew Chung
>              Labels: FT
> Currently, in FileSystemInputPartition, data downloading is implemented in Initialize()
> and called from GetPartitionHandle(). This doesn't give the client the flexibility to decide
> when to download data. Besides, if the client wants to cache data in advance, it needs to
> call GetPartitionHandle() and iterate over the data.
> We would like to expose a new API, Cache(), in IInputPartition that downloads the data to
> RAM, SSD, HDD, etc., based on the client's configuration.
> The method should be called in ContextStartHandler in IMRU scenarios.
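A minimal Java sketch of the proposed split between an explicit cache step and data access. REEF.NET's actual IInputPartition is C#, so `InputPartition`, `cache()`, `getPartitionHandle()`, and `LazyCachedPartition` here are hypothetical analogues. It assumes the download is supplied as a function, makes cache() idempotent, and keeps the current behavior of downloading on demand if cache() was never called.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical Java analogue of IInputPartition with an explicit cache() step. */
interface InputPartition<T> {
    /** Downloads the partition's data ahead of use; safe to call more than once. */
    void cache();

    /** Returns the partition's data, downloading it first if cache() has not run yet. */
    Iterable<T> getPartitionHandle();
}

/** Sketch: the caller-supplied download runs at most once per partition. */
final class LazyCachedPartition<T> implements InputPartition<T> {
    private final Supplier<List<T>> downloader;
    private List<T> cached;     // null until the first download completes
    private int downloadCount;  // for illustration and testing only

    LazyCachedPartition(Supplier<List<T>> downloader) {
        this.downloader = downloader;
    }

    @Override
    public synchronized void cache() {
        if (cached == null) {
            cached = Collections.unmodifiableList(new ArrayList<>(downloader.get()));
            downloadCount++;
        }
    }

    @Override
    public synchronized Iterable<T> getPartitionHandle() {
        cache(); // on-demand fallback, matching today's download-on-access behavior
        return cached;
    }

    synchronized int downloadCount() {
        return downloadCount;
    }
}
```

In the IMRU scenario described above, a context start handler would call `partition.cache()` so that the task later finds the data already local, while code that skips that step still works through the lazy path in getPartitionHandle().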

This message was sent by Atlassian JIRA
