reef-dev mailing list archives

From "Julia (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (REEF-1143) Adding API to allow deserialize data from remote files directly
Date Wed, 20 Jan 2016 22:15:40 GMT

     [ https://issues.apache.org/jira/browse/REEF-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Julia reassigned REEF-1143:
---------------------------

    Assignee: Julia

> Adding API to allow deserialize data from remote files directly
> ---------------------------------------------------------------
>
>                 Key: REEF-1143
>                 URL: https://issues.apache.org/jira/browse/REEF-1143
>             Project: REEF
>          Issue Type: New Feature
>          Components: REEF-IO
>            Reporter: Julia
>            Assignee: Julia
>
> Currently, Deserialize(string fileFolder) in IFileDeSerializer is used to deserialize
local files in a given local file folder. For a set of remote files, FileSystemInputPartition
first downloads the remote files to a local folder, then passes the folder to the Deserialize(string
fileFolder) method.
> For remote files, especially when the files are huge, we need to read the file data
chunk by chunk and consume it as it arrives instead of downloading the entire file at once. The
remote file paths are provided as a set, and the folder holding the remote files is controlled
by the caller and may contain other files, so we cannot simply use the folder
name; we need the remote file names instead. Therefore the new API for remote file deserialization
would be
>  T Deserialize(ISet<string> filePaths);
>  
> This would result in two methods in IFileDeSerializer<T>:
>  T Deserialize(string fileFolder);  -- for local files
>  T Deserialize(ISet<string> filePaths); -- for remote files
> It is clean. The only issue is that the method names don't explain their usage.
Another option is to make the method names explicit:
> T DeserializeLocalFIles(string fileFolder);  -- for local files
> T DeserializeRemoteFIles(ISet<string> filePaths); -- for remote files
> For the second option, the original Deserialize() API would be renamed, which is a breaking change,
although I don't think anyone else is using it.
> Please comment.
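
For illustration only, the first option (two overloads on IFileDeSerializer<T>) could be sketched as below. This is not the REEF implementation: ByteCountDeserializer, its openRemote delegate, and the 4096-byte chunk size are hypothetical names and values chosen for the example, and "deserializing" is reduced to counting bytes so that the chunk-by-chunk read loop stays visible.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Shape proposed in the issue: one overload per kind of source.
public interface IFileDeSerializer<T>
{
    T Deserialize(string fileFolder);       // local files in a folder
    T Deserialize(ISet<string> filePaths);  // explicitly named remote files
}

// Hypothetical implementation: the "deserialized" value is just the
// total number of bytes read from all files.
public sealed class ByteCountDeserializer : IFileDeSerializer<long>
{
    private const int ChunkSize = 4096;  // assumed chunk size

    // Stand-in for whatever remote file system client the caller uses.
    private readonly Func<string, Stream> _openRemote;

    public ByteCountDeserializer(Func<string, Stream> openRemote)
    {
        _openRemote = openRemote;
    }

    // Local case: every file in the folder is read.
    public long Deserialize(string fileFolder)
    {
        return ReadInChunks(Directory.GetFiles(fileFolder), File.OpenRead);
    }

    // Remote case: only the named files are read, straight from the
    // remote stream, without first copying them to a local folder.
    public long Deserialize(ISet<string> filePaths)
    {
        return ReadInChunks(filePaths, _openRemote);
    }

    private static long ReadInChunks(IEnumerable<string> paths, Func<string, Stream> open)
    {
        long total = 0;
        var buffer = new byte[ChunkSize];
        foreach (var path in paths)
        {
            using (var stream = open(path))
            {
                int read;
                // Stream.Read returns 0 at end of stream.
                while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
                {
                    total += read;  // a real deserializer would consume the chunk here
                }
            }
        }
        return total;
    }
}
```

A caller would pass a HashSet<string> of remote paths to select the remote overload. Under the second option only the method names would change; the bodies could stay as sketched.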



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
