From: Apache Wiki
To: hadoop-commits@lucene.apache.org
Date: Tue, 28 Aug 2007 06:38:59 -0000
Subject: [Lucene-hadoop Wiki] Trivial Update of "Hbase/ShellPlans" by udanax

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/Hbase/ShellPlans

------------------------------------------------------------------------------
  Altools provides automatic parallelization of the most time-consuming relational/matrix/vector operations, and will ensure that the iterative solvers are scalable.
  * Parallel Processing of Relational Data
- * Parallel Algorithms of Multi-Dimensional Matrix Operation
+ * Parallel Algorithms of Multi-Dimensional Matrix Operations
  * Parallel Gaussian Elimination Algorithm
  ----
@@ -28, +28 @@
  == Commands ==
  ||'''Command''' ||'''Explanation''' ||
- ||Table ||<99%>'''Table''' command loads specified table. [[BR]][[BR]]~-''Table('movieLog_table');''-~ ||
+ ||Table ||<99%>'''Table''' command loads specified table. [[BR]][[BR]]~-''Table('table_name');''-~ ||
  ||Matrix ||<99%>'''Matrix''' command constructs the configuration of the logic matrix.[[BR]]'''Options''' : features not yet. [[BR]][[BR]]~-''Matrix(table_name, columnfamily_name[, option]);''-~ ||
- ||Substitute ||<99%>'''Substitute''' expression to [A~Z][[BR]][[BR]]~-''A = Table('movieLog_table');''-~ ||
+ ||Substitute ||<99%>'''Substitute''' expression to [A~Z][[BR]][[BR]]~-''A = Table('table_name');''-~ ||
  ||IF...ELSE ||<99%>'''IF...ELSE''', Imposes conditions on the execution. [[BR]][[BR]]~-''IF ( boolean_expression )[[BR]]B = command_statements;[[BR]]ELSE[[BR]]B = command_statements;''-~||
- ||Store ||<99%>'''Store''' command will store results to specified table. [[BR]][[BR]]~-''A = Table('movieLog_table'); [[BR]]B = A.Selection(length > 100); [[BR]]Store B TO table('tmp_table')[or file('backup.dat')];''-~ ||
+ ||Store ||<99%>'''Store''' command will store results to specified table. [[BR]][[BR]]~-''A = Table('table_name'); [[BR]]B = A.Selection(condition_expression); [[BR]]Store B TO table(result_table)[or file('result_file_name')];''-~ ||
  '''Type''' 'help;' for Hbase altools usage.
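The Store example in the Commands table chains Table, Selection, and Store, and the relational operators on the same page (Projection, Selection, and so on) follow the same pattern. Below is a minimal in-memory Python sketch of those semantics, assuming a table is just a list of rows (dicts keyed by column name). The `catalog`, the sample rows, and the helper names `table`, `selection`, `projection`, and `store` are hypothetical illustrations, not the altools implementation, and nothing here touches HBase.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical in-memory stand-in for HBase tables (table name -> rows).
catalog: Dict[str, List[Row]] = {
    "movieLog_table": [
        {"title": "m1", "year": 1990, "length": 120, "studioName": "Fox"},
        {"title": "m2", "year": 1991, "length": 95, "studioName": "Fox"},
        {"title": "m3", "year": 1992, "length": 130, "studioName": "Disney"},
    ]
}


def table(name: str) -> List[Row]:
    """A = Table('table_name'): load a copy of the named table."""
    return [dict(row) for row in catalog[name]]


def selection(rows: List[Row], condition: Callable[[Row], bool]) -> List[Row]:
    """sigma_condition(A): keep only the rows that satisfy the condition."""
    return [row for row in rows if condition(row)]


def projection(rows: List[Row], columns: List[str]) -> List[Row]:
    """pi_columns(A): restrict rows to the given columns, dropping duplicates."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row.get(c) for c in columns)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


def store(rows: List[Row], name: str) -> None:
    """Store B TO table(result_table): save the rows under a new table name."""
    catalog[name] = [dict(row) for row in rows]


# (ex. 1): title and year of Fox movies running longer than 100 minutes.
A = table("movieLog_table")
B = selection(A, lambda r: r["length"] > 100 and r["studioName"] == "Fox")
store(projection(B, ["title", "year"]), "tmp_table")
print(catalog["tmp_table"])  # prints [{'title': 'm1', 'year': 1990}]
```

Projection here removes duplicate tuples, as relational projection does; a distributed version would instead apply each operator per region and merge the results.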
@@ -47, +47 @@
  == Relational Operators ==
  ||'''Operator''' ||'''Explanation''' ||
- ||Projection ||<99%>'''Projection''' of a relation ~+R+~, It makes a new relation as the set that is obtained when all tuples(rows) in ~+R+~ are restricted to the set {columnfamily,,1,,,...,columnfamily,,n,,}.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Projection('year','length'); '''//π,,year.length,,(A)''' ''-~ ||
+ ||Projection ||<99%>'''Projection''' of a relation ~+R+~, It makes a new relation as the set that is obtained when all tuples(rows) in ~+R+~ are restricted to the set {columnfamily,,1,,,...,columnfamily,,n,,}.[[BR]][[BR]]~-''A = Table('table_name');[[BR]]B = A.Projection(column-list); '''//π,,column-list,,(A)''' ''-~ ||
- ||Selection ||<99%>'''Selection''' of a relation ~+R+~, It makes a new relation as the set of specified tuples(rows) of the relation ~+R+~.[[BR]]'''Set Operations''' : ~-''OR, AND, NOT''-~[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = A.Selection(length > 100 AND studioName = 'Fox'); '''//σ,,length > 100.studioName='Fox',,(A)''' ''-~ ||
+ ||Selection ||<99%>'''Selection''' of a relation ~+R+~, It makes a new relation as the set of specified tuples(rows) of the relation ~+R+~.[[BR]]'''Set Operations''' : ~-''OR, AND, NOT''-~[[BR]][[BR]]~-''A = Table('table_name');[[BR]]B = A.Selection(condition_expression); '''//σ,,condition,,(A)''' ''-~ ||
- ||JOINs ||<99%>Table '''JOIN''' operations, linking and extracting data from two different internal source.[[BR]]'''Operations''' : ~-''naturalJoin(), thetaJoin(), cartesianProduct() ''-~ [[BR]][[BR]]~-''R = Table('movieLog_table');[[BR]]S = Table('movieStar_table');[[BR]]C = R.naturalJoin(S); '''//C = R▷◁S''' ''-~ ||
+ ||JOINs ||<99%>Table '''JOIN''' operations, linking and extracting data from two different internal source.[[BR]]'''Operations''' : ~-''naturalJoin(), thetaJoin(), cartesianProduct() ''-~ [[BR]][[BR]]~-''R = Table('table_name1');[[BR]]S = Table('table_name2');[[BR]]C = R.naturalJoin(S); '''//C = R▷◁S''' ''-~ ||
- ||Group ||<99%>'''Group''' tuples by value of an attribute and apply aggregate function independently to each group of tuples.[[BR]]'''Aggregate Functions''' : ~-''AVG( attribute ), SUM( attribute ), COUNT( attribute ), MIN( attribute ), MAX( attribute )''-~[[BR]][[BR]]~-''A = Table('movieLog_table);[[BR]]B = A.Group('studioName', MIN('year')); '''//γ,,studioName.MIN( year ),,(A)''' ''-~ ||
+ ||Group ||<99%>'''Group''' tuples by value of an attribute and apply aggregate function independently to each group of tuples.[[BR]]'''Aggregate Functions''' : ~-''AVG(attribute), SUM(attribute), COUNT(attribute), MIN(attribute), MAX(attribute)''-~[[BR]][[BR]]~-''A = Table('table_name');[[BR]]B = A.Group(column-list); '''//γ,,column-list,,(A)''' ''-~ ||
- ||Sort ||<99%>'''Sort''' of tuples(rows) of R, ordered according to columnfamilies on columnfamily-list.[[BR]][[BR]]~-''A = Table('movieLog_table');[[BR]]B = Sort A by ('length'); '''//τ,,length,,(A)''' ''-~ ||
+ ||Sort ||<99%>'''Sort''' of tuples(rows) of R, ordered according to columnfamilies on columnfamily-list.[[BR]][[BR]]~-''A = Table('table_name');[[BR]]B = Sort A by (column-list); '''//τ,,column-list,,(A)''' ''-~ ||
  '''(ex. 1)''' Search the subject and the year of the movies which were produced by 'Fox' company and where running time is more than 100 minutes. [[BR]]~-'''''π ,,title.year,, (σ ,,length > 100,, (movieLog_table) ∩ σ ,,studioName = 'Fox',, (movieLog_table))'''''-~
@@ -89, +89 @@
  '''Note''' that matrix operations are the core of many linear systems.
  === Arithmetic Operators ===
  ||'''Operator''' ||'''Explanation''' ||
- ||Addition ||<99%>'''Adding''' entries with the same indices. [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A + B; '''// c,,ij,, = a,,ij,, + b,,ij,, (i : row key, j : column key)''' ''-~ ||
+ ||Addition ||<99%>'''Adding''' entries with the same indices. [[BR]][[BR]]~-''A = Matrix('table_name1','columnfamily_name1');[[BR]]B = Matrix('table_name2','columnfamily_name2');[[BR]]C = A + B; '''// c,,ij,, = a,,ij,, + b,,ij,, (i : row key, j : column key)''' ''-~ ||
- ||Subtraction ||<99%>'''Subtracting''' entries with the same indices.[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A - B; '''// c,,ij,, = a,,ij,, - b,,ij,, (i : row key, j : column key)''' ''-~ ||
+ ||Subtraction ||<99%>'''Subtracting''' entries with the same indices.[[BR]][[BR]]~-''A = Matrix('table_name1','columnfamily_name1');[[BR]]B = Matrix('table_name2','columnfamily_name2');[[BR]]C = A - B; '''// c,,ij,, = a,,ij,, - b,,ij,, (i : row key, j : column key)''' ''-~ ||
- ||Multiplication ||<99%>'''Multiplication''' of two matrices, Product C of two matrices A and B.[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A * B; '''//C = A · B''' ''-~ ||
+ ||Multiplication ||<99%>'''Multiplication''' of two matrices, Product C of two matrices A and B.[[BR]][[BR]]~-''A = Matrix('table_name1','columnfamily_name1');[[BR]]B = Matrix('table_name2','columnfamily_name2');[[BR]]C = A * B; '''//C = A · B''' ''-~ ||
- ||Division ||<99%>'''Division''' is solving the matrix equation AX = B for X.[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Matrix('m_table','cf_2');[[BR]]C = A /[or \] B; '''// C = A / B''' ''-~||
+ ||Division ||<99%>'''Division''' is solving the matrix equation AX = B for X.[[BR]][[BR]]~-''A = Matrix('table_name1','columnfamily_name1');[[BR]]B = Matrix('table_name2','columnfamily_name2');[[BR]]C = A /[or \] B; '''// C = A / B''' ''-~||
- ||Transpose ||<99%>'''Transpose''' of a Matrix, A matrix which is formed by turning all the rows of a given matrix into columns and vice-versa.[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = Transpose(A); '''// B = A'''' ''-~||
+ ||Transpose ||<99%>'''Transpose''' of a Matrix, A matrix which is formed by turning all the rows of a given matrix into columns and vice-versa.[[BR]][[BR]]~-''A = Matrix('table_name1','columnfamily_name1');[[BR]]B = Transpose(A); '''// B = A'''' ''-~||
  '''(ex. 1)''' Matrix Addition
@@ -122, +122 @@
  === Factorization and Decomposition Operators ===
  ||'''Function''' ||'''Explanation''' ||
- ||LU ||<99%>'''LU Decomposition'''[[BR]]A procedure for decomposing an N by N matrix A into a product of a lower triangular matrix L and an upper triangular matrix U, LU = A.[[BR]]'''Functions''' : ~-''getL(), getU(), isSingular(), getPivot()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = LUDecomposition(A);[[BR]]C = B.getU();[[BR]]D = B.getL();''-~||
+ ||LU ||<99%>'''LU Decomposition'''[[BR]]A procedure for decomposing an N by N matrix A into a product of a lower triangular matrix L and an upper triangular matrix U, LU = A.[[BR]]'''Functions''' : ~-''getL(), getU(), isSingular(), getPivot()''-~ [[BR]][[BR]]~-''A = Matrix('table_name','columnfamily_name');[[BR]]B = LUDecomposition(A);[[BR]]C = B.getU();[[BR]]D = B.getL();''-~||
- ||QR ||<99%>'''QR Decomposition'''[[BR]]For an m-by-n matrix A with m >= n, the QR decomposition is an m-by-n orthogonal matrix Q and an n-by-n upper triangular matrix R so that A = Q*R.[[BR]]'''Functions''' : ~-''getH(), getQ(), getR()''-~[[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = QRDecomposition(A);[[BR]]C = B.getH();''-~||
+ ||QR ||<99%>'''QR Decomposition'''[[BR]]For an m-by-n matrix A with m >= n, the QR decomposition is an m-by-n orthogonal matrix Q and an n-by-n upper triangular matrix R so that A = Q*R.[[BR]]'''Functions''' : ~-''getH(), getQ(), getR()''-~[[BR]][[BR]]~-''A = Matrix('table_name','columnfamily_name');[[BR]]B = QRDecomposition(A);[[BR]]C = B.getH();''-~||
- ||Cholesky ||<99%>'''Cholesky Decomposition'''[[BR]]It is a special case of LU decomposition applicable only if matrix to be decomposed is symmetric positive definite.[[BR]]'''Functions''' : ~-''getL(), getU(), isSPD()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = CholeskyDecomposition(A);[[BR]]C = B.getL();''-~||
+ ||Cholesky ||<99%>'''Cholesky Decomposition'''[[BR]]It is a special case of LU decomposition applicable only if matrix to be decomposed is symmetric positive definite.[[BR]]'''Functions''' : ~-''getL(), getU(), isSPD()''-~ [[BR]][[BR]]~-''A = Matrix('table_name','columnfamily_name');[[BR]]B = CholeskyDecomposition(A);[[BR]]C = B.getL();''-~||
- ||SVD ||<99%>'''SV(Singular Value) Decomposition'''[[BR]]For an m-by-n matrix A with m >= n, the singular value decomposition is an m-by-n orthogonal matrix U, an n-by-n diagonal matrix S, and an n-by-n orthogonal matrix V so that A = U*S*V'.[[BR]]'''Functions''' : ~-''getS(), getU(), getV(), norm2(), rank()''-~ [[BR]][[BR]]~-''A = Matrix('m_table','cf_1');[[BR]]B = SVDecomposition(A);[[BR]]C = B.getU();''-~||
+ ||SVD ||<99%>'''SVD(Singular Value Decomposition)'''[[BR]]For an m-by-n matrix A with m >= n, the singular value decomposition is an m-by-n orthogonal matrix U, an n-by-n diagonal matrix S, and an n-by-n orthogonal matrix V so that A = U*S*V'.[[BR]]'''Functions''' : ~-''getS(), getU(), getV(), norm2(), rank()''-~ [[BR]][[BR]]~-''A = Matrix('table_name','columnfamily_name');[[BR]]B = SVDdecomposition(A);[[BR]]C = B.getU();''-~||
  {{{
  //Set up the matrix M from mapped matrix in hbase.
@@ -139, +139 @@
  [[BR]]~-'''''M = UΣV*'''''-~
  {{{
- Hbase.altools > A = M.SVDecomposition();
+ Hbase.altools > A = M.SVDdecomposition();
  Hbase.altools > U = A.getU();
  Hbase.altools > S = A.getS();
  Hbase.altools > V = A.getV();
@@ -171, +171 @@
  = Example Of Altools Use =
- == Latent Semantic Analysis by SVD ==
+ == LSI by SVD ==
+ Latent semantic analysis (LSA) is a technique in natural language processing, in particular in vectorial semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms.
+
+ LSA was patented in 1988 by Scott Deerwester, Susan Dumais, George Furnas, Richard Harshman, Thomas Landauer, Karen Lochbaum and Lynn Streeter. In the context of its application to information retrieval, it is sometimes called latent semantic indexing (LSI).
+
+ This example used SVD decomposition with k=3.
+ [[BR]]''-- The SVD is typically computed using large matrix methods (for example, Lanczos methods) but may also be computed incrementally and with greatly reduced resources via a neural network-like approach which does not require the large, full-rank matrix to be held in memory''
+
  * ~-'''NOTATION'''-~
  * ~-''T,,0,,'' : orthogonal, unit-length columns-~
  * ~-''D,,0,,'' : orthogonal, unit-length columns-~
@@ -211, +218 @@
  {{{
  //Diagonal eigenvalue S
- Hbase.altools > M = W.SVDecomposition();
+ Hbase.altools > M = W.SVDdecomposition();
  Hbase.altools > S = M.getS();
  }}}
@@ -234, +241 @@
  * ~-''X′ = TSDT ≈ X''-~ which is the rank-k model with the best possible least square fit to X.
-
- This example used (k = 3).
  = Papers =
  * [http://www.uib.no/People/nmabh/art/hpj.pdf High performance numerical libraries in Java]
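The LSI example keeps only the k largest singular values (k=3 on the page) to form the rank-k model X′ ≈ X. Below is a minimal pure-Python sketch of that truncation for a small dense matrix held in memory. It swaps in power iteration with deflation rather than the Lanczos methods the page mentions, chosen only for brevity. The names `top_k_triplets` and `rank_k_approx` are hypothetical and are not part of the altools API. Each triplet `(sigma, u, v)` plays the role of one diagonal entry of S together with the matching columns of T and D.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def top_k_triplets(X: Matrix, k: int, iters: int = 500,
                   seed: int = 0) -> List[Tuple[float, List[float], List[float]]]:
    """Approximate the k largest singular triplets (sigma, u, v) of X
    by power iteration with deflation (not the Lanczos method)."""
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    R = [row[:] for row in X]  # residual matrix, deflated after each triplet
    triplets = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        u = [0.0] * m
        sigma = 0.0
        for _ in range(iters):
            # u <- R v, normalized to unit length
            u = [sum(r * x for r, x in zip(row, v)) for row in R]
            norm_u = math.sqrt(sum(x * x for x in u))
            if norm_u == 0.0:
                sigma = 0.0
                break
            u = [x / norm_u for x in u]
            # v <- R^T u; its length is the current singular value estimate
            v = [sum(R[i][j] * u[i] for i in range(m)) for j in range(n)]
            sigma = math.sqrt(sum(x * x for x in v))
            if sigma == 0.0:
                break
            v = [x / sigma for x in v]
        if sigma == 0.0:
            break  # the remaining singular values are zero
        triplets.append((sigma, u, v))
        # Deflate: subtract the rank-1 component sigma * u v^T
        R = [[R[i][j] - sigma * u[i] * v[j] for j in range(n)] for i in range(m)]
    return triplets


def rank_k_approx(X: Matrix, k: int) -> Matrix:
    """Rank-k model: the sum of sigma * u v^T over the top k triplets."""
    m, n = len(X), len(X[0])
    Xk = [[0.0] * n for _ in range(m)]
    for sigma, u, v in top_k_triplets(X, k):
        for i in range(m):
            for j in range(n):
                Xk[i][j] += sigma * u[i] * v[j]
    return Xk
```

For a term-document matrix `W` like the one in the example, `rank_k_approx(W, 3)` would give the k=3 model. Power iteration converges slowly when neighbouring singular values are close, which is one reason production code prefers Lanczos-type methods.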