hama-dev mailing list archives

From "Edward J. Yoon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-940) Add StreamInputFormat
Date Mon, 02 May 2016 00:02:12 GMT

    [ https://issues.apache.org/jira/browse/HAMA-940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15266012#comment-15266012 ]

Edward J. Yoon commented on HAMA-940:

As I mentioned in the Description, we can simply check whether any records have been newly
appended to the input file, keeping track of the last read offset.
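
The offset-tracking idea can be sketched in plain Java, independent of Hama. This is only an illustration under stated assumptions: the class name AppendedRecordReader and its poll() method are hypothetical, records are assumed to be newline-terminated UTF-8 lines in a local file, and the newly appended bytes are assumed to fit in memory.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Hama): polls a file for newly appended lines. */
final class AppendedRecordReader {
  private final String path;
  // Byte position just past the last complete record returned so far.
  private long offset = 0L;

  AppendedRecordReader(String path) {
    this.path = path;
  }

  /** Reopens the file, jumps to the saved offset, and returns complete new lines. */
  List<String> poll() throws IOException {
    List<String> records = new ArrayList<>();
    try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
      long length = file.length();
      if (length <= offset) {
        return records; // nothing appended since the last poll
      }
      // Assumption: the appended region is small enough to buffer in memory.
      byte[] buf = new byte[(int) (length - offset)];
      file.seek(offset);
      file.readFully(buf);
      int start = 0;
      for (int i = 0; i < buf.length; i++) {
        if (buf[i] == '\n') {
          records.add(new String(buf, start, i - start, StandardCharsets.UTF_8));
          start = i + 1;
        }
      }
      // Advance only past complete lines; a trailing partial line is re-read next time.
      offset += start;
    }
    return records;
  }

  long getOffset() {
    return offset;
  }
}
```

Calling poll() repeatedly returns only the records appended since the previous call. Advancing the offset only past complete lines avoids handing out a record that the writer has not finished appending.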

To implement this, you should first look at the InputFormat interface. The tricky
issue is how to implement the getSplits() method when there are multiple tasks.

At the moment, my simple idea is that one BSP task acts as a "stream input queue", without
implementing StreamInputFormat or changing the framework core. For example, we set the file path
in the job configuration. The master task then acts as below:

if (isMaster(peer.me)) {
  while (true) {
    peer.reopen();     // reopen the input file
    peer.skip(offset); // jump to the last read offset
    if (peer.readNext()) {
      // do the load balancing here
      sendTo("send a newly appended record to free slave tasks");
    } else {
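
The load-balancing step in the loop above could look roughly like the following plain-Java sketch. It does not use Hama's API: StreamDispatcher, dispatch(), markFree(), and sentTo() are hypothetical names, and the in-memory outbox stands in for sending a message to a slave task. It assumes each slave processes one record at a time and reports back when it is free.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical stand-in for the master task's dispatch step; not Hama API. */
final class StreamDispatcher {
  // Workers that are currently idle, in the order they became free.
  private final Queue<String> freeWorkers = new ArrayDeque<>();
  // Records handed to each worker; stands in for real message sends.
  private final Map<String, List<String>> outbox = new LinkedHashMap<>();

  StreamDispatcher(List<String> workerNames) {
    for (String worker : workerNames) {
      freeWorkers.add(worker);
      outbox.put(worker, new ArrayList<>());
    }
  }

  /** Hands one record to the next free worker; returns false if none is free. */
  boolean dispatch(String record) {
    String worker = freeWorkers.poll();
    if (worker == null) {
      return false; // caller should keep the record and retry later
    }
    outbox.get(worker).add(record);
    return true;
  }

  /** Called when a worker reports that it has finished its record. */
  void markFree(String worker) {
    freeWorkers.add(worker);
  }

  /** Records sent to the given worker so far. */
  List<String> sentTo(String worker) {
    return outbox.get(worker);
  }
}
```

When dispatch() returns false, the master would not advance its read offset, so the same record is picked up again on the next pass.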

> Add StreamInputFormat
> ---------------------
>                 Key: HAMA-940
>                 URL: https://issues.apache.org/jira/browse/HAMA-940
>             Project: Hama
>          Issue Type: New Feature
>          Components: bsp core
>            Reporter: Edward J. Yoon
> Add StreamInputFormat that reads records newly appended since the previous superstep.
> I think this should be possible using the reopen() method and a file offset.

This message was sent by Atlassian JIRA
