Return-Path: Delivered-To: apmail-lucene-hadoop-commits-archive@locus.apache.org Received: (qmail 35113 invoked from network); 28 Aug 2007 06:03:01 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 28 Aug 2007 06:03:01 -0000 Received: (qmail 62922 invoked by uid 500); 28 Aug 2007 06:02:56 -0000 Delivered-To: apmail-lucene-hadoop-commits-archive@lucene.apache.org Received: (qmail 62900 invoked by uid 500); 28 Aug 2007 06:02:56 -0000 Mailing-List: contact hadoop-commits-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: hadoop-dev@lucene.apache.org Delivered-To: mailing list hadoop-commits@lucene.apache.org Received: (qmail 62891 invoked by uid 99); 28 Aug 2007 06:02:56 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 27 Aug 2007 23:02:56 -0700 X-ASF-Spam-Status: No, hits=-100.0 required=10.0 tests=ALL_TRUSTED X-Spam-Check-By: apache.org Received: from [140.211.11.130] (HELO eos.apache.org) (140.211.11.130) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 28 Aug 2007 06:03:40 +0000 Received: from eos.apache.org (localhost [127.0.0.1]) by eos.apache.org (Postfix) with ESMTP id 0347459A07 for ; Tue, 28 Aug 2007 06:02:25 +0000 (GMT) Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit From: Apache Wiki To: hadoop-commits@lucene.apache.org Date: Tue, 28 Aug 2007 06:02:25 -0000 Message-ID: <20070828060225.23757.38029@eos.apache.org> Subject: [Lucene-hadoop Wiki] Trivial Update of "Hbase/ShellPlans" by udanax X-Virus-Checked: Checked by ClamAV on apache.org Dear Wiki user, You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification. The following page has been changed by udanax: http://wiki.apache.org/lucene-hadoop/Hbase/ShellPlans ------------------------------------------------------------------------------ == Background == I expect Hadoop + Hbase to handle sparsity and data explosion very well in near future. Moreover, i believe the design of the multi-dimensional map structure and the 3d space model of the data are optimized for rapid ad-hoc information retrieval in any orientation, as well as for fast, flexible calculation and transformation of raw data based on formulaic relationships. It is advantageous with respect to Analysis Processing as it allows users to easily formulate complex queries, and filter or slice data into meaningful subsets, among other things. + + == Parallelization == + + Altools provides automatic parallelization of the most time-consuming relational/matrix/vector operations, and will ensure that the iterative solvers are scalable. + + * Parallel Processing of Relational Data + * Parallel Algorithms of Multi-Dimensional Matrix Operation + * Parallel Gaussian Elimination Algorithm ---- @@ -77, +85 @@ Hbase.altools > store B to table('result_table'); }}} + == Matrix Operators == + '''Note''' that matrix operations are the core of many linear systems. - == Matrix Arithmetic Operators == + === Arithmetic Operators === ||'''Operator''' ||'''Explanation''' || ||Addition ||<99%>'''Adding''' entries with the same indices. [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A + B; '''// c,,ij,, = a,,ij,, + b,,ij,, (i : row key, j : column key)''' ''-~ || ||Subtraction ||<99%>'''Subtracting''' entries with the same indices.[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A - B; '''// c,,ij,, = a,,ij,, - b,,ij,, (i : row key, j : column key)''' ''-~ || @@ -115, +125 @@ ||LU ||<99%>'''LU Decomposition'''[[BR]]A procedure for decomposing an N by N matrix A into a product of a lower triangular matrix L and an upper triangular matrix U, LU = A.[[BR]]'''Functions''' : ~-''getL(), getU(), isSingular(), getPivot()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = LUDecomposition(A);[[BR]]C = B.getU();[[BR]]D = B.getL();''-~|| ||QR ||<99%>'''QR Decomposition'''[[BR]]For an m-by-n matrix A with m >= n, the QR decomposition is an m-by-n orthogonal matrix Q and an n-by-n upper triangular matrix R so that A = Q*R.[[BR]]'''Functions''' : ~-''getH(), getQ(), getR()''-~[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = QRDecomposition(A);[[BR]]C = B.getH();''-~|| ||Cholesky ||<99%>'''Cholesky Decomposition'''[[BR]]It is a special case of LU decomposition applicable only if matrix to be decomposed is symmetric positive definite.[[BR]]'''Functions''' : ~-''getL(), getU(), isSPD()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = CholeskyDecomposition(A);[[BR]]C = B.getL();''-~|| - ||SVD ||<99%>'''SV(Singular Value) Decomposition'''[[BR]]For an m-by-n matrix A with m >= n, the singular value decomposition is an m-by-n orthogonal matrix U, an n-by-n diagonal matrix S, and an n-by-n orthogonal matrix V so that A = U*S*V'.[[BR]]'''Functions''' : ~-''getS(), getU(), getV()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = SVDecomposition(A);[[BR]]C = B.getU();''-~|| + ||SVD ||<99%>'''SV(Singular Value) Decomposition'''[[BR]]For an m-by-n matrix A with m >= n, the singular value decomposition is an m-by-n orthogonal matrix U, an n-by-n diagonal matrix S, and an n-by-n orthogonal matrix V so that A = U*S*V'.[[BR]]'''Functions''' : ~-''getS(), getU(), getV(), norm2(), rank()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = SVDecomposition(A);[[BR]]C = B.getU();''-~|| {{{ //Set up the matrix M from mapped matrix in hbase. @@ -170, +180 @@ * ~-''d'' : Column number of matrix X-~ * ~-''m'' : Rank of matrix X ( ≤ min(t,d) )-~ - ||Row Key ||<-6>Column Families || ||term ||<-2> corpus: ||<-2>link: ||<-2>.. || ||t01 ||corpus:c01 ||0 ||link:cnn.com ||1 ||.. ||.. || @@ -184, +193 @@ Hbase.altools > X = Matrix('t_table','corpus'); }}} + This example used tf*idf for more efficient and economical calculations. + * ~-''w,,ij,, = tf,,ij,, '''·''' log( N/df,,j,, )''-~ + * ~-''w'' : weighted value-~ + * ~-''tf'' : the number of times the word appears in a i-th document-~ + * ~-''N'' : the number of documents of an entire corpus-~ + * ~-''df'' : the number of documents containing the j-th term-~ || ||c01 ||c02 ||.. || ||t01 ||0.00 ||0.00 ||.. || ||t02 ||2.08 ||0.00 ||.. || ||.. ||.. ||.. ||.. || + Weighted Matrix W + {{{ //Diagonal eigenvalue S - Hbase.altools > M = X.SVDecomposition(); + Hbase.altools > M = W.SVDecomposition(); Hbase.altools > S = M.getS(); }}} @@ -207, +224 @@ Hbase.altools > D = M.getV(); }}} - Dimension reduction (k = 3) + SVD allows a simple strategy for optimal approximate fit using smaller matrices. If the singular + values in S0 are ordered by size, the first k largest values may be kept and the remaining smaller ones + set to zero. The product of the resulting matrices is a matrix X′ which is only approximately equal to X, + and is of rank k. Since zeros were introduced into S0, the representation can be simplified by deleting + the zero rows and columns of S0 to obtain a new diagonal matrix S, and deleting the corresponding + columns of T0 and D0 to obtain T and D respectively. The result is a reduced model : + + * ~-''X′ = TSDT ≈ X''-~ + + which is the rank-k model with the best possible least square fit to X. + + This example used (k = 3). = Papers = * [http://www.uib.no/People/nmabh/art/hpj.pdf High performance numerical libraries in Java]