incubator-hama-dev mailing list archives

From "Thomas Jungblut (JIRA)" <>
Subject [jira] [Commented] (HAMA-258) Design a input and output system
Date Wed, 13 Apr 2011 06:22:05 GMT


Thomas Jungblut commented on HAMA-258:

What is the actual difference between these two functions?
IMHO compute() is called once for each vertex, whereas we use bsp() once for a whole task.
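
To make that distinction concrete, here is a rough sketch of how I read it. All names here (ComputeVsBsp, VertexProgram) are made up for illustration and are not the Hama or Pregel API; vertices are reduced to plain ints to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: these names are hypothetical, not the Hama or Pregel API.
public class ComputeVsBsp {

    /** Vertex-centric hook: invoked once for every vertex. */
    public interface VertexProgram {
        int compute(int vertexValue);
    }

    /**
     * Task-centric hook: invoked once per task. The task owns a whole
     * partition and decides itself how to walk over its vertices.
     */
    public static List<Integer> bsp(List<Integer> localVertices, VertexProgram program) {
        List<Integer> updated = new ArrayList<>(localVertices.size());
        for (int value : localVertices) {
            // A vertex-centric layer could be built on top of bsp() like this.
            updated.add(program.compute(value));
        }
        return updated;
    }
}
```

Seen this way, a per-vertex compute() could be layered on top of a per-task bsp(), but not the other way round.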

We should provide a Reader class that reads SequenceFiles/TextFiles and HBase tables.

How should we do the partitioning?
* Block partitioning like Hadoop - this is not very flexible, depends on locality of the
* Key partitioning (like in my blog, you can just send the message to the groom that contains this vertexID) // this would be better for HBase input, or for SequenceFiles.

Or like the last two, just with messaging, but this would be slower than writing it into an
HDFS block.
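
A minimal sketch of the key partitioning idea, assuming vertex IDs are strings and peers (grooms) are numbered 0..numPeers-1. KeyPartitioner and its methods are hypothetical names, not existing Hama classes, and hashing the ID is just one possible way to pick the owning groom.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the Hama API.
public class KeyPartitioner {

    /** Returns the index of the peer that owns the given vertex ID. */
    public static int peerFor(String vertexId, int numPeers) {
        if (numPeers <= 0) {
            throw new IllegalArgumentException("numPeers must be positive");
        }
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(vertexId.hashCode(), numPeers);
    }

    /** Groups vertex IDs by owning peer; bucket i holds the IDs for peer i. */
    public static List<List<String>> partition(List<String> vertexIds, int numPeers) {
        List<List<String>> buckets = new ArrayList<>(numPeers);
        for (int i = 0; i < numPeers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String id : vertexIds) {
            buckets.get(peerFor(id, numPeers)).add(id);
        }
        return buckets;
    }
}
```

Because every peer can evaluate peerFor() locally, a message for a vertex can be sent straight to the owning groom without a lookup table.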

How about a simple output system?
We can provide an output collector in the BSPPeer, and each peer has its own output file in
If a user wants to write output, they can simply do so and don't have to code a SequenceFile writer.
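
Roughly what I have in mind, as a sketch only. PeerOutputCollector, collect() and the part-NNNNN file naming are assumptions for illustration, and the local filesystem stands in for wherever the real output files would live.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not an existing Hama class.
public class PeerOutputCollector implements AutoCloseable {
    private final Path file;
    private final BufferedWriter writer;

    /** Opens this peer's own output file, e.g. part-00003 for peer 3. */
    public PeerOutputCollector(Path outputDir, int peerIndex) throws IOException {
        Files.createDirectories(outputDir);
        this.file = outputDir.resolve(String.format("part-%05d", peerIndex));
        this.writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
    }

    /** Appends one tab-separated key/value record. */
    public void collect(String key, String value) throws IOException {
        writer.write(key);
        writer.write('\t');
        writer.write(value);
        writer.newLine();
    }

    public Path file() {
        return file;
    }

    @Override
    public void close() throws IOException {
        writer.close();
    }
}
```

User code would then only call collect(key, value) and never touch a writer directly.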

> Design a input and output system
> --------------------------------
>                 Key: HAMA-258
>                 URL:
>             Project: Hama
>          Issue Type: New Feature
>          Components: bsp
>    Affects Versions: 0.3.0
>            Reporter: Edward J. Yoon
>            Assignee: Edward J. Yoon
>             Fix For: 0.3.0
> This issue will handle the input and output system with data splitter.
