incubator-hama-dev mailing list archives

From "Samuel Guo" <guosi...@gmail.com>
Subject Re: blocking_mapred() speed
Date Thu, 11 Dec 2008 08:50:45 GMT
Oh, that code is not necessary. The JVM sets every element of a new
double array to 0.0 by default. Sorry.
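
A minimal sketch of the same multiplication without the fill loop, relying only on Java's default initialization of array elements. It uses plain double[][] arguments instead of SubMatrix so that it stands alone; the class and method names here are hypothetical and not part of Hama.

```java
// Hypothetical stand-alone example; ZeroInitDemo is not a Hama class.
final class ZeroInitDemo {

  /** Multiplies a (n x m) by b (m x p) and returns the n x p product. */
  static double[][] multiply(double[][] a, double[][] b) {
    int rows = a.length;
    int inner = b.length;
    int cols = b[0].length;

    // The Java language guarantees that each element of a newly created
    // double array starts at 0.0, so no Arrays.fill(C[i], 0) is needed.
    double[][] c = new double[rows][cols];

    for (int i = 0; i < rows; i++) {
      for (int j = 0; j < cols; j++) {
        for (int k = 0; k < inner; k++) {
          c[i][j] += a[i][k] * b[k][j];
        }
      }
    }
    return c;
  }

  public static void main(String[] args) {
    double[][] fresh = new double[2][3];
    // Prints the default value of an untouched element.
    System.out.println(fresh[1][2]);
  }
}
```

The accumulation with += is safe here precisely because of that default value; only an array that is reused across calls would need to be cleared first.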

On Thu, Dec 11, 2008 at 4:37 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:

> Oh, one question.
>
> Why do we need to fill C with zeros?
>
>  public SubMatrix mult(SubMatrix b) {
>    double[][] C = new double[this.getRows()][b.getColumns()];
>    for (int i = 0; i < this.getRows(); i++) {
>      Arrays.fill(C[i], 0);
>    }
>
>    for (int i = 0; i < this.getRows(); i++) {
>      for (int j = 0; j < b.getColumns(); j++) {
>        for (int k = 0; k < this.getColumns(); k++) {
>          C[i][j] += this.get(i, k) * b.get(k, j);
>        }
>      }
>    }
>
>    return new SubMatrix(C);
>   }
>
>
> On Thu, Dec 11, 2008 at 5:30 PM, Samuel Guo <guosijie@gmail.com> wrote:
> > On Thu, Dec 11, 2008 at 2:36 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
> >
> >> If we remove 'reduce phase', I guess we can reduce the disk I/O
> >> operations.
> >
> >
> > Yes.
> >
> >
> >>
> >>
> >> In the map, read { Constants.BLOCK_STARTROW, Constants.BLOCK_ENDROW,
> >> Constants.BLOCK_STARTCOLUMN, Constants.BLOCK_ENDCOLUMN } instead of {
> >> Constants.COLUMN }, and write the blocks directly.
> >
> >
> > Two methods to consider:
> > 1) We need an InputFormat that partitions the matrix table according to
> > the row boundaries of the blocks.
> >    It must be written carefully to make sure a single block is not divided
> > between two or more mappers.
> >
> > 2) Like what RandomMatrixMap does, we just tell the mappers the row/column
> > boundaries of the blocks of a matrix table.
> >    Scanning that portion of the table is then done inside the mapper.
> >
> > I think 1) may be better than 2).
> > An InputFormat can get the locality of a range of the table to let MR know
> > how to move the MR computations close to it.
> > In 2), if we do it like RandomMatrixMap, we may lose some locality
> > information about the table, so the network transfer overhead may
> > increase.
> >
> > It is just my guess and thoughts.
> >
> >
> >>
> >>
> >> What do you think?
> >>
> >> --
> >> Best Regards, Edward J. Yoon @ NHN, corp.
> >> edwardyoon@apache.org
> >> http://blog.udanax.org
> >>
> >
>
>
>
> --
> Best Regards, Edward J. Yoon @ NHN, corp.
> edwardyoon@apache.org
> http://blog.udanax.org
>
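
For option 1) in the quoted discussion, the core requirement is that every split boundary falls on a block row boundary. A rough sketch of that arithmetic in plain Java follows; it does not use the Hadoop or HBase InputFormat APIs, and the class name, method name, and parameters are all hypothetical. A real InputFormat would also have to attach region locations to each range, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Hama, Hadoop, or HBase.
final class BlockAlignedSplits {

  /**
   * Divides rows [0, totalRows) into at most numSplits ranges.
   * Each range is {startRow, endRow} with endRow exclusive, and every
   * startRow is a multiple of blockRows, so no block spans two ranges.
   */
  static List<int[]> split(int totalRows, int blockRows, int numSplits) {
    if (totalRows <= 0 || blockRows <= 0 || numSplits <= 0) {
      throw new IllegalArgumentException("all arguments must be positive");
    }
    // Number of row blocks, rounding up for a short last block.
    int numBlocks = (totalRows + blockRows - 1) / blockRows;
    // Never create more splits than there are blocks.
    int splits = Math.min(numSplits, numBlocks);

    List<int[]> ranges = new ArrayList<int[]>();
    int firstBlock = 0;
    for (int s = 0; s < splits; s++) {
      // Spread blocks evenly; the first (numBlocks % splits) splits get one extra.
      int blocksHere = numBlocks / splits + (s < numBlocks % splits ? 1 : 0);
      int startRow = firstBlock * blockRows;
      int endRow = Math.min((firstBlock + blocksHere) * blockRows, totalRows);
      ranges.add(new int[] {startRow, endRow});
      firstBlock += blocksHere;
    }
    return ranges;
  }
}
```

Splitting on whole blocks rather than on raw row counts is what keeps a single block from being handed to two mappers.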
