reef-dev mailing list archives

From "Markus Weimer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (REEF-1143) Adding API to allow deserialize data from remote files directly
Date Wed, 10 Feb 2016 01:31:18 GMT

    [ https://issues.apache.org/jira/browse/REEF-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140170#comment-15140170
] 

Markus Weimer commented on REEF-1143:
-------------------------------------

[this|http://stackoverflow.com/questions/3879152/how-do-i-concatenate-two-system-io-stream-instances-into-one]
illustrates what I mean.
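
A minimal sketch of the idea behind that link, assuming the goal is to present several file streams to a deserializer as one forward-only stream. ConcatenatedStream is a hypothetical helper name, not an existing REEF or .NET class, and how the individual streams are opened is left to the caller.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of REEF): exposes several streams as one
// read-only, forward-only stream, consuming them in the order given.
public sealed class ConcatenatedStream : Stream
{
    private readonly Queue<Stream> _streams;

    public ConcatenatedStream(IEnumerable<Stream> streams)
    {
        _streams = new Queue<Stream>(streams);
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }

    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count == 0)
        {
            return 0;
        }

        while (_streams.Count > 0)
        {
            int read = _streams.Peek().Read(buffer, offset, count);
            if (read > 0)
            {
                return read;
            }

            // The current stream is exhausted: dispose it and try the next one.
            _streams.Dequeue().Dispose();
        }

        return 0; // every underlying stream has been consumed
    }

    public override void Flush() { }

    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotSupportedException();
    }

    public override void SetLength(long value)
    {
        throw new NotSupportedException();
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        throw new NotSupportedException();
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            // Release any streams that were never read to the end.
            while (_streams.Count > 0)
            {
                _streams.Dequeue().Dispose();
            }
        }

        base.Dispose(disposing);
    }
}
```

Only one underlying stream is read at a time, so a caller can consume the data in chunks without first downloading every file.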

> Adding API to allow deserialize data from remote files directly
> ---------------------------------------------------------------
>
>                 Key: REEF-1143
>                 URL: https://issues.apache.org/jira/browse/REEF-1143
>             Project: REEF
>          Issue Type: New Feature
>          Components: REEF-IO
>            Reporter: Julia
>            Assignee: Julia
>
> Currently, Deserialize(string fileFolder) in IFileDeSerializer deserializes local files
in a given local folder. For a set of remote files, FileSystemInputPartition first downloads
the remote files to a local folder, then passes that folder to the Deserialize(string
fileFolder) method.
> For remote files, especially very large ones, we need to read and consume the data chunk
by chunk instead of downloading the entire file at once. The remote file paths are provided
as a set, and the folder that holds them is controlled by the caller and may contain other
files, so we cannot simply use the folder name; we need the individual remote file names
instead. Therefore the new API for remote file deserialization would be
>  T Deserialize(ISet<string> filePaths);
>  
> This would result in two methods in IFileDeSerializer<T>:
>  T Deserialize(string fileFolder);  -- for local files
>  T Deserialize(ISet<string> filePaths); 
> In fact, the implementation is up to whoever implements the interface. For the second
API, we can use the FileSystem injected into the Deserializer to determine whether it accesses
local files or remote files.
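
The two overloads described in the issue could be sketched as follows. The interface below only mirrors the signatures quoted above and is redeclared here so the example is self-contained. LineDeserializer and its Func<string, Stream> constructor argument are hypothetical stand-ins for a real implementation and for the injected file system, so this shows the shape of the API rather than REEF's actual code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Mirrors the two method signatures quoted in the issue description.
public interface IFileDeSerializer<T>
{
    T Deserialize(string fileFolder);      // local folder
    T Deserialize(ISet<string> filePaths); // individual file paths
}

// Hypothetical implementation that yields the text lines of each file lazily.
public sealed class LineDeserializer : IFileDeSerializer<IEnumerable<string>>
{
    // Assumption: stands in for the injected file system. It opens a stream
    // for a path and decides whether that path is local or remote.
    private readonly Func<string, Stream> _open;

    public LineDeserializer(Func<string, Stream> open)
    {
        _open = open;
    }

    public IEnumerable<string> Deserialize(string fileFolder)
    {
        // Expand the local folder into its files and reuse the set-based overload.
        return Deserialize(new HashSet<string>(Directory.GetFiles(fileFolder)));
    }

    public IEnumerable<string> Deserialize(ISet<string> filePaths)
    {
        foreach (var path in filePaths)
        {
            // Each file is opened only when the caller reaches it, so the data
            // is consumed incrementally rather than downloaded up front.
            using (var reader = new StreamReader(_open(path)))
            {
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    yield return line;
                }
            }
        }
    }
}
```

Because the set-based overload is an iterator, no file is opened until the caller starts enumerating the result.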



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
