reef-dev mailing list archives

From "Julia (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (REEF-1339) Adding IInputPartition.Cache() for data download and cache
Date Mon, 25 Apr 2016 21:22:13 GMT

    [ https://issues.apache.org/jira/browse/REEF-1339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257081#comment-15257081 ]

Julia commented on REEF-1339:
-----------------------------

There are two layers:
1. Downloading the data file from remote to local storage. This is currently configurable.
2. Caching the data: get the IEnumerable and iterate it so that the data is cached.

Just to clarify: are we talking about the second one only, or both? Or do we want to combine
the two?

For the second one, what does caching the data to HDD mean? Iterating over the data rows and
saving them to HDD? In what format?
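
To make the distinction between the two layers concrete, here is a minimal sketch. It is written in Python rather than C# purely for brevity, and every name in it (download, read_rows, cache_rows, demo) is hypothetical; none of it is REEF code. Layer 1 only copies a file to local storage, while layer 2 iterates the lazily produced rows so that they are materialized.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterator, List


def download(remote: Path, local_dir: Path) -> Path:
    """Layer 1 (hypothetical): copy the data file from a 'remote' path to local storage."""
    local = local_dir / remote.name
    shutil.copyfile(remote, local)
    return local


def read_rows(local: Path) -> Iterator[str]:
    """Lazily yield rows; nothing is read until the caller iterates."""
    with local.open() as f:
        for line in f:
            yield line.rstrip("\n")


def cache_rows(rows: Iterator[str]) -> List[str]:
    """Layer 2 (hypothetical): iterate the lazy sequence once so the rows are held in memory."""
    return list(rows)


def demo() -> List[str]:
    """Run both layers against a throwaway directory standing in for remote storage."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        remote_dir, local_dir = root / "remote", root / "local"
        remote_dir.mkdir()
        local_dir.mkdir()
        remote = remote_dir / "part-0.txt"
        remote.write_text("a\nb\nc\n")
        local = download(remote, local_dir)
        return cache_rows(read_rows(local))
```

In this sketch the downloaded file on disk and the materialized rows are separate artifacts, which is why "caching to HDD" needs a defined on-disk format if it is to mean anything beyond layer 1.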

> Adding IInputPartition.Cache() for data download and cache
> ----------------------------------------------------------
>
>                 Key: REEF-1339
>                 URL: https://issues.apache.org/jira/browse/REEF-1339
>             Project: REEF
>          Issue Type: Task
>            Reporter: Julia
>            Assignee: Andrew Chung
>              Labels: FT
>
> Currently, in FileSystemInputPartition, data downloading is implemented in Initialize()
> and called from GetPartitionHandle(). This gives the client no flexibility to decide when
> to download the data. Besides, if the client wants to cache data in advance, it needs to
> call GetPartitionHandle() and iterate the data.
> We would like to expose a new API, Cache(), in IInputPartition that downloads data to RAM,
> SSD, HDD, etc. based on the client's configuration.
> The method should be called in ContextStartHandler in IMRU scenarios.
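
One possible shape for the proposed Cache() call is sketched below, under stated assumptions. It is written in Python rather than C# for brevity, and CacheLevel, SketchPartition, fetch_count and close() are all hypothetical names, not part of REEF. The sketch assumes a client-supplied fetch callable stands in for the remote download, that rows are plain strings without embedded newlines, and that the configuration selects between memory and a local temp file.

```python
import os
import tempfile
from enum import Enum
from typing import Callable, Iterable, Iterator, List, Optional


class CacheLevel(Enum):
    """Hypothetical configuration value: where cache() should put the data."""
    MEMORY = "memory"
    DISK = "disk"


class SketchPartition:
    """Hypothetical partition exposing an explicit cache() next to the handle getter."""

    def __init__(self, fetch: Callable[[], Iterable[str]], level: CacheLevel):
        self._fetch = fetch              # stands in for the remote download
        self._level = level
        self._rows: Optional[List[str]] = None
        self._path: Optional[str] = None
        self.fetch_count = 0             # exposed only so the sketch can be checked

    def cache(self) -> None:
        """Download once and store according to the configured level; later calls are no-ops."""
        if self._rows is not None or self._path is not None:
            return
        self.fetch_count += 1
        rows = list(self._fetch())
        if self._level is CacheLevel.MEMORY:
            self._rows = rows
        else:
            # Assumed on-disk format: one row per line in a temp file.
            fd, self._path = tempfile.mkstemp(suffix=".rows")
            with os.fdopen(fd, "w") as f:
                f.writelines(row + "\n" for row in rows)

    def get_partition_handle(self) -> Iterator[str]:
        """Return the rows, caching first if the client has not done so already."""
        self.cache()
        if self._rows is not None:
            return iter(self._rows)
        return self._read_disk()

    def _read_disk(self) -> Iterator[str]:
        with open(self._path) as f:
            for line in f:
                yield line.rstrip("\n")

    def close(self) -> None:
        """Remove the temp file, if one was written."""
        if self._path and os.path.exists(self._path):
            os.remove(self._path)
```

With this shape, a client could call cache() early (for example from a context start handler) and later call get_partition_handle() without triggering a second download; a client that never calls cache() still gets the current lazy behavior.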



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
