incubator-hama-dev mailing list archives

From "Edward J. Yoon (JIRA)" <j...@apache.org>
Subject [jira] Created: (HAMA-133) To reduce disk I/O operations, Remove 'reduce phase' from blocking_mapred
Date Thu, 11 Dec 2008 08:57:44 GMT
To reduce disk I/O operations, Remove 'reduce phase' from blocking_mapred
-------------------------------------------------------------------------

                 Key: HAMA-133
                 URL: https://issues.apache.org/jira/browse/HAMA-133
             Project: Hama
          Issue Type: Sub-task
          Components: implementation
    Affects Versions: 0.1.0
            Reporter: Edward J. Yoon
             Fix For: 0.1.0


> If we remove 'reduce phase', I guess we can reduce the disk I/O operations.


Yes.


>
>
> In the map, read { Constants.BLOCK_STARTROW, Constants.BLOCK_ENDROW,
> Constants.BLOCK_STARTCOLUMN, Constants.BLOCK_ENDCOLUMN } instead of {
> Constants.COLUMN }, and write the blocks directly.


Two methods to consider:
1) We need an InputFormat that partitions the matrix table according to the
row boundaries of the blocks.
   This must be done carefully, so that a single block is never divided
across two or more mappers.

2) As RandomMatrixMap does, we just tell the mappers the row/column
boundaries of the blocks of a matrix table.
   Each mapper then scans its own portion of the table.
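
To make the constraint in 1) concrete, here is a minimal sketch in plain Java
of how row ranges could be cut so that every split starts on a block row
boundary. It deliberately uses no Hadoop or HBase classes; the names
BlockAlignedSplits, RowRange and splits() are hypothetical and not part of
Hama. It also assumes blocks have a fixed height of blockRows rows, which the
thread does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hama: computes row ranges for mappers
// such that no block of rows is shared between two ranges.
public class BlockAlignedSplits {

    /** Half-open row range [startRow, endRow) handed to one mapper. */
    public static final class RowRange {
        public final int startRow;
        public final int endRow;

        public RowRange(int startRow, int endRow) {
            this.startRow = startRow;
            this.endRow = endRow;
        }

        @Override
        public String toString() {
            return "[" + startRow + ", " + endRow + ")";
        }
    }

    /**
     * Divides numRows rows into at most maxSplits ranges. Every range
     * starts at a multiple of blockRows (assumed fixed block height),
     * so a block is never divided across two ranges.
     */
    public static List<RowRange> splits(int numRows, int blockRows, int maxSplits) {
        if (numRows <= 0 || blockRows <= 0 || maxSplits <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        // Number of block rows, rounding up for a short last block.
        int numBlockRows = (numRows + blockRows - 1) / blockRows;
        int splitCount = Math.min(maxSplits, numBlockRows);

        int base = numBlockRows / splitCount;   // block rows per split
        int extra = numBlockRows % splitCount;  // first 'extra' splits get one more

        List<RowRange> result = new ArrayList<>();
        int blockIndex = 0;
        for (int i = 0; i < splitCount; i++) {
            int blocksHere = base + (i < extra ? 1 : 0);
            int start = blockIndex * blockRows;
            // Clamp the end because the last block may be shorter.
            int end = Math.min((blockIndex + blocksHere) * blockRows, numRows);
            result.add(new RowRange(start, end));
            blockIndex += blocksHere;
        }
        return result;
    }

    public static void main(String[] args) {
        // 10 rows, blocks of 3 rows, at most 2 mappers.
        for (RowRange r : splits(10, 3, 2)) {
            System.out.println(r);
        }
    }
}
```

A real InputFormat would additionally have to map these ranges onto the
table's regions and report their hosts; that part is not shown here.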

I think 1) may be better than 2).
An InputFormat can report the locality of a range of the table, which lets MR
move the computation close to the data.
With 2), if we do it like RandomMatrixMap, we may lose some of the table's
locality information, so the network transfer overhead may
increase.

This is just my guess.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

