From: Apache Wiki
To: corecommits@hadoop.apache.org
Date: Thu, 27 Mar 2008 01:30:54 +0000
Subject: [Hadoop Wiki] Trivial Update of "Hama" by udanax
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.
The following page has been changed by udanax:
http://wiki.apache.org/hadoop/Hama

[http://wiki.apache.org/hadoopdata/attachments/Hama/attachments/hamamedium.png]

 * I'm looking for a champion/mentor who can lead the proposal process.
 * http://wiki.apache.org/incubator/HamaProposal
== Introduction ==
'''Hama''' is a parallel matrix computation package based on Hadoop Map/Reduce. ''(Hama means 'hippo' in Korean.)'' It will be useful for massively large-scale ''Numerical Analysis'' and ''Data Mining'' tasks that depend on computationally intensive matrix operations such as inversion, e.g. linear regression, PCA and SVM. It will also be useful for many scientific applications, e.g. physics computations, linear algebra, computational fluid dynamics, statistics, graphics rendering and more.
Several shared-memory parallel matrix solutions already provide scalable, high-performance matrix operations, but the matrix resources themselves cannot scale with the complexity of the problem. The '''Hama''' approach uses the 2-dimensional Row and Column (Qualifier) space and the multi-dimensional Column families of Hbase, which can store large sparse matrices of various types (e.g. triangular matrices, 3D matrices, etc.). In addition, the auto-partitioned sparsity sub-structure will be managed and served efficiently by Hbase. Row and column operations can be done in linear time, and several algorithms, such as structured Gaussian elimination and iterative methods, run in O(nnz/mappers) time on Hadoop Map/Reduce, where nnz is the number of non-zero elements in the matrix and mappers is the number of mappers (processors/cores).
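To make the row/column layout concrete, here is a minimal in-memory sketch, assuming a matrix is laid out like an Hbase table: each row key maps to column qualifiers that hold only the non-zero values. The class `SparseRowMatrix` and its methods are hypothetical illustrations, not part of Hama or the Hbase client API. Because a row operation touches only the stored cells, its cost is linear in that row's non-zeros, and a pass over all rows split evenly among the mappers costs about O(nnz/mappers).

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of a sparse matrix laid out like an Hbase table:
// row key -> (column qualifier -> value). Zero entries are never stored.
class SparseRowMatrix {
    private final Map<Integer, TreeMap<Integer, Double>> rows = new TreeMap<>();

    void set(int row, int col, double value) {
        if (value == 0.0) {
            // Storing a zero would waste a cell, so drop any existing entry instead.
            TreeMap<Integer, Double> r = rows.get(row);
            if (r != null) {
                r.remove(col);
                if (r.isEmpty()) rows.remove(row);
            }
            return;
        }
        rows.computeIfAbsent(row, k -> new TreeMap<>()).put(col, value);
    }

    double get(int row, int col) {
        TreeMap<Integer, Double> r = rows.get(row);
        return r == null ? 0.0 : r.getOrDefault(col, 0.0);
    }

    // Number of stored (non-zero) cells, i.e. nnz.
    int nonZeros() {
        int count = 0;
        for (TreeMap<Integer, Double> r : rows.values()) count += r.size();
        return count;
    }

    // Row operation: multiply one row by a scalar. The cost is linear in the
    // number of non-zeros in that row, not in the matrix dimension.
    void scaleRow(int row, double factor) {
        TreeMap<Integer, Double> r = rows.get(row);
        if (r == null) return;
        if (factor == 0.0) {
            rows.remove(row);
            return;
        }
        r.replaceAll((col, v) -> v * factor);
    }

    public static void main(String[] args) {
        SparseRowMatrix m = new SparseRowMatrix();
        m.set(0, 0, 2.0);
        m.set(0, 999_999, 5.0); // a huge column index costs no extra storage
        m.set(7, 3, 1.5);
        m.scaleRow(0, 3.0);
        System.out.println(m.get(0, 999_999)); // 15.0
        System.out.println(m.nonZeros());      // 3
    }
}
```

A sorted map per row is chosen here because Hbase also keeps qualifiers in sorted order within a row; a hash map would work equally well for this sketch.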
=== Initial Contributors ===
 * Edward Yoon ([mailto:edward@udanax.org edward AT SPAMFREE udanax DOT org])
 * Chanwit Kaewkasi ([mailto:chanwit@gmail.com chanwit AT SPAMFREE gmail DOT com])
 * Min Cha ([mailto:minslovey@gmail.com minslovey AT SPAMFREE gmail DOT com])
=== Initial Source ===
 * http://code.google.com/p/hama/source/checkout
=== Dependencies ===
 * Hadoop (HDFS, Map/Reduce) License: Apache License, 2.0
 * Hbase (Sparse Matrix Table) License: Apache License, 2.0
 
== Components ==
=== MapReduce In/Out Formatter ===
 * Sparse Matrix
 * Fraction Matrix
 * Triangular Matrix
=== Basic Linear Algebra ===
 * Addition/Subtraction
 * Multiplication
 * Determinant
 * Cholesky Decomposition
 * Crout Decomposition
 * Doolittle Decomposition
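For reference, the Doolittle variant listed above factors a square matrix A as A = LU, where L is unit lower triangular and U is upper triangular; the determinant then follows as the product of U's diagonal. The following is a minimal sequential sketch without pivoting, so it assumes every pivot except possibly the last is non-zero. The class `Doolittle` is a hypothetical illustration and says nothing about how Hama distributes this work.

```java
// Hypothetical sequential sketch of Doolittle LU decomposition without
// pivoting: A = L * U, with L unit lower triangular and U upper triangular.
class Doolittle {
    final double[][] lower;
    final double[][] upper;

    Doolittle(double[][] a) {
        int n = a.length;
        lower = new double[n][n];
        upper = new double[n][n];
        for (int i = 0; i < n; i++) {
            // Row i of U: u[i][k] = a[i][k] - sum over j < i of l[i][j] * u[j][k]
            for (int k = i; k < n; k++) {
                double sum = 0.0;
                for (int j = 0; j < i; j++) sum += lower[i][j] * upper[j][k];
                upper[i][k] = a[i][k] - sum;
            }
            lower[i][i] = 1.0; // Doolittle fixes the diagonal of L at 1
            if (i == n - 1) break; // no column of L left to fill
            if (upper[i][i] == 0.0) {
                throw new ArithmeticException("zero pivot at row " + i + "; this sketch does not pivot");
            }
            // Column i of L below the diagonal.
            for (int k = i + 1; k < n; k++) {
                double sum = 0.0;
                for (int j = 0; j < i; j++) sum += lower[k][j] * upper[j][i];
                lower[k][i] = (a[k][i] - sum) / upper[i][i];
            }
        }
    }

    // det(A) = det(L) * det(U) = 1 * (product of U's diagonal).
    double determinant() {
        double det = 1.0;
        for (int i = 0; i < upper.length; i++) det *= upper[i][i];
        return det;
    }

    public static void main(String[] args) {
        double[][] a = { {4, 3}, {6, 3} };
        Doolittle lu = new Doolittle(a);
        System.out.println(lu.lower[1][0]);   // 1.5
        System.out.println(lu.upper[1][1]);   // -1.5
        System.out.println(lu.determinant()); // -6.0
    }
}
```

Computing the determinant through the factorization costs O(n^3^) instead of the factorial cost of cofactor expansion; Crout is the mirror-image variant that puts the unit diagonal on U instead of L.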
=== API & Groovy Support ===
The '''Hama''' project uses Groovy to simplify the computational language on top of a generalized matrix Java interface.
For example, a parallel matrix multiplication can be expressed as follows:
 {{{
 Java API:

 // two random 4 x 4 matrices, multiplied in parallel
 Matrix a = Matrix.random(conf, 4, 4);
 Matrix b = Matrix.random(conf, 4, 4);
 Matrix c = a.multiply(b);

 Groovy:

 // Groovy maps the * operator to a multiply() call
 def a = rand(10,10)
 def b = rand(10,10)
 def c = a * b
 }}}
 
== The parallel time complexity of Hama ==
=== Addition/Subtraction ===
 * Matrix addition/subtraction requires two full table scans.
 * For n×n matrices, the total time spent by these operations is O(n^2^/mappers).
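The O(n^2^/mappers) bound follows because each of the n^2^ cells is read once from each operand and combined once, and the rows can be split evenly among the map tasks with no communication between them. The sketch below imitates that with plain Java threads standing in for mappers; it is a local illustration of the cost model, assuming dense n×n arrays, and the class `ParallelAdd` is hypothetical rather than Hama's Map/Reduce code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Local stand-in for a Map/Reduce matrix addition: the n rows are split into
// contiguous blocks and each block is handled by one "mapper" thread.
// Threads never share rows, so no locking is needed and each thread does
// roughly n^2 / mappers additions.
class ParallelAdd {
    static double[][] add(double[][] a, double[][] b, int mappers) {
        if (mappers < 1) throw new IllegalArgumentException("need at least one mapper");
        int n = a.length;
        double[][] c = new double[n][];
        List<Thread> workers = new ArrayList<>();
        int rowsPerMapper = (n + mappers - 1) / mappers; // ceiling division
        for (int m = 0; m < mappers; m++) {
            final int start = m * rowsPerMapper;
            final int end = Math.min(n, start + rowsPerMapper);
            if (start >= end) break; // remaining mappers get no rows
            Thread t = new Thread(() -> {
                for (int i = start; i < end; i++) {
                    c[i] = new double[a[i].length];
                    for (int j = 0; j < a[i].length; j++) {
                        c[i][j] = a[i][j] + b[i][j]; // use '-' for subtraction
                    }
                }
            });
            workers.add(t);
            t.start();
        }
        // join() waits for each mapper and makes its writes visible to the caller.
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for mappers", e);
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = { {1, 2}, {3, 4}, {5, 6} };
        double[][] b = { {10, 20}, {30, 40}, {50, 60} };
        System.out.println(Arrays.deepToString(add(a, b, 2)));
        // prints [[11.0, 22.0], [33.0, 44.0], [55.0, 66.0]]
    }
}
```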
=== Multiplication ===
 * Multiplication requires (n + 1) full table scans, irrespective of the number of mappers.
 * Each mapper requires O(n^2^) time for communication and O(n^3^/mappers) time for computation.
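One way to read these figures: if the rows of A are split among the mappers and every mapper needs all of B, each mapper receives O(n^2^) values of B and performs (n/mappers)·n·n multiply-adds, i.e. O(n^3^/mappers). The sketch below is a hypothetical local illustration of that row-partitioned scheme, with a per-mapper operation counter; it does not reproduce Hama's (n + 1)-scan implementation, which is not described here, and `RowPartitionedMultiply` is an invented name.

```java
import java.util.Arrays;

// Hypothetical local sketch of row-partitioned multiplication C = A * B for
// n x n matrices. Mapper m owns a contiguous block of rows of A, needs a copy
// of all of B (the O(n^2) communication), and computes its rows of C.
class RowPartitionedMultiply {
    static final class Result {
        final double[][] product;
        final long[] work; // work[m] = multiply-add steps done by mapper m

        Result(double[][] product, long[] work) {
            this.product = product;
            this.work = work;
        }
    }

    static Result multiply(double[][] a, double[][] b, int mappers) {
        if (mappers < 1) throw new IllegalArgumentException("need at least one mapper");
        int n = a.length;
        double[][] c = new double[n][n];
        long[] work = new long[mappers];
        int rowsPerMapper = (n + mappers - 1) / mappers; // ceiling division
        for (int m = 0; m < mappers; m++) {
            int start = Math.min(n, m * rowsPerMapper);
            int end = Math.min(n, start + rowsPerMapper);
            // On a cluster each mapper would run this block concurrently;
            // here they run one after another to keep the cost model visible.
            for (int i = start; i < end; i++) {
                for (int j = 0; j < n; j++) {
                    double sum = 0.0;
                    for (int k = 0; k < n; k++) {
                        sum += a[i][k] * b[k][j];
                        work[m]++;
                    }
                    c[i][j] = sum;
                }
            }
        }
        return new Result(c, work);
    }

    public static void main(String[] args) {
        int n = 6;
        Result r = multiply(new double[n][n], new double[n][n], 3);
        // Each of the 3 mappers owns 2 rows: 2 * 6 * 6 = 72 = n^3 / mappers steps.
        System.out.println(Arrays.toString(r.work)); // [72, 72, 72]
    }
}
```

The counter makes the trade-off visible: adding mappers shrinks each mapper's O(n^3^/mappers) computation, but every mapper still needs the whole of B, so the O(n^2^) communication per mapper does not shrink.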
 
