incubator-hama-dev mailing list archives

From "Thomas Jungblut (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HAMA-258) Design a input and output system
Date Wed, 13 Apr 2011 06:22:05 GMT

    [ https://issues.apache.org/jira/browse/HAMA-258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019224#comment-13019224 ]

Thomas Jungblut commented on HAMA-258:
--------------------------------------

What is the actual difference between these two functions?
IMHO compute() is called once for each vertex, whereas bsp() is called once per task.
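
To make the distinction concrete, here is a minimal sketch. The interfaces VertexProgram and TaskProgram, the class name, and the log parameter are all hypothetical and are not Hama's actual API; they only show who drives the loop over the vertices.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hama's API: contrasts a per-vertex compute()
// with a per-task bsp().
class ComputeVsBspSketch {

    // Vertex-centric style: the framework calls compute() once per vertex.
    interface VertexProgram {
        void compute(String vertexId, List<String> log);
    }

    // Task-centric style: the framework calls bsp() once per task, and the
    // user code decides how to walk over the task's local vertices.
    interface TaskProgram {
        void bsp(List<String> localVertices, List<String> log);
    }

    static List<String> runVertexCentric(List<String> localVertices, VertexProgram program) {
        List<String> log = new ArrayList<>();
        for (String vertexId : localVertices) {
            program.compute(vertexId, log); // one call per vertex
        }
        return log;
    }

    static List<String> runTaskCentric(List<String> localVertices, TaskProgram program) {
        List<String> log = new ArrayList<>();
        program.bsp(localVertices, log); // one call for the whole task
        return log;
    }
}
```

With three local vertices, runVertexCentric invokes the user code three times, while runTaskCentric invokes it once and leaves the iteration to the user.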

We should provide a Reader class that reads SequenceFiles/TextFiles and HBase tables.

How should we do the partitioning?
* Block partitioning, as in Hadoop: this is not very flexible, because it depends on the
locality of the data.
* Key partitioning (as in my blog: you just send the message to the groom that holds the
vertex ID). This would be better for HBase input or for SequenceFiles.

Or we could do either of the two above purely via messaging, but this would be slower than
writing the data into an HDFS block.
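
A minimal sketch of the key partitioning idea, assuming vertex IDs are strings and that each groom/peer is addressed by an index from 0 to numPeers - 1. The class and method names are hypothetical, and hashing the ID is just one possible way to map a key to a peer:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: assigns each vertex ID to a peer by hashing the key.
class KeyPartitionSketch {

    // Maps a vertex ID to a peer index in [0, numPeers).
    static int partition(String vertexId, int numPeers) {
        // Mask the sign bit so negative hash codes still give a valid index.
        return (vertexId.hashCode() & Integer.MAX_VALUE) % numPeers;
    }

    // Groups vertex IDs into one bucket per peer.
    static List<List<String>> assign(List<String> vertexIds, int numPeers) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numPeers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String id : vertexIds) {
            buckets.get(partition(id, numPeers)).add(id);
        }
        return buckets;
    }
}
```

Because every peer can evaluate partition() locally, a message for a given vertex ID can be sent straight to the peer that owns it without a lookup table.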

How about a simple output system?
We can provide an output collector in the BSPPeer, and each peer has its own output file in
HDFS.
A user who wants to write output can simply do so without having to code a SequenceFile
writer.
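
A rough sketch of what such a collector could look like. Everything here is an assumption for illustration: the class name PeerOutputCollector, the Hadoop-style part-NNNNN file naming, and the tab-separated text format. It also writes to the local file system through java.nio as a stand-in for HDFS:

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: one collector, and therefore one output file, per peer.
class PeerOutputCollector<K, V> implements Closeable {
    private final Path file;
    private final BufferedWriter writer;

    PeerOutputCollector(Path outputDir, int peerIndex) throws IOException {
        Files.createDirectories(outputDir);
        // Assumed naming scheme: part-00000, part-00001, ...
        this.file = outputDir.resolve(String.format("part-%05d", peerIndex));
        this.writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
    }

    // Appends one key/value pair as a tab-separated line.
    void collect(K key, V value) throws IOException {
        writer.write(key + "\t" + value);
        writer.newLine();
    }

    Path file() {
        return file;
    }

    @Override
    public void close() throws IOException {
        writer.close();
    }
}
```

The user code would then only call collect(key, value); opening, naming, and closing the per-peer file stays inside the framework, for example in a try-with-resources block around the task.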

> Design a input and output system
> --------------------------------
>
>                 Key: HAMA-258
>                 URL: https://issues.apache.org/jira/browse/HAMA-258
>             Project: Hama
>          Issue Type: New Feature
>          Components: bsp
>    Affects Versions: 0.3.0
>            Reporter: Edward J. Yoon
>            Assignee: Edward J. Yoon
>             Fix For: 0.3.0
>
>
> This issue will handle the input and output system with data splitter.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
