Return-Path: Delivered-To: apmail-lucene-mahout-commits-archive@minotaur.apache.org Received: (qmail 91161 invoked from network); 23 Nov 2009 15:16:18 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 23 Nov 2009 15:16:18 -0000 Received: (qmail 52097 invoked by uid 500); 23 Nov 2009 15:16:18 -0000 Delivered-To: apmail-lucene-mahout-commits-archive@lucene.apache.org Received: (qmail 52034 invoked by uid 500); 23 Nov 2009 15:16:17 -0000 Mailing-List: contact mahout-commits-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: mahout-dev@lucene.apache.org Delivered-To: mailing list mahout-commits@lucene.apache.org Received: (qmail 52025 invoked by uid 99); 23 Nov 2009 15:16:17 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 23 Nov 2009 15:16:17 +0000 X-ASF-Spam-Status: No, hits=-2.6 required=5.0 tests=AWL,BAYES_00 X-Spam-Check-By: apache.org Received: from [140.211.11.4] (HELO eris.apache.org) (140.211.11.4) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 23 Nov 2009 15:16:10 +0000 Received: by eris.apache.org (Postfix, from userid 65534) id BA4792388AB4; Mon, 23 Nov 2009 15:14:45 +0000 (UTC) Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Subject: svn commit: r883365 [46/47] - in /lucene/mahout/trunk: ./ examples/ matrix/ matrix/src/ matrix/src/main/ matrix/src/main/java/ matrix/src/main/java/org/ matrix/src/main/java/org/apache/ matrix/src/main/java/org/apache/mahout/ matrix/src/main/java/org/a... Date: Mon, 23 Nov 2009 15:14:38 -0000 To: mahout-commits@lucene.apache.org From: gsingers@apache.org X-Mailer: svnmailer-1.0.8 Message-Id: <20091123151455.BA4792388AB4@eris.apache.org> Added: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/linalg/SmpBlas.java URL: http://svn.apache.org/viewvc/lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/linalg/SmpBlas.java?rev=883365&view=auto ============================================================================== --- lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/linalg/SmpBlas.java (added) +++ lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/linalg/SmpBlas.java Mon Nov 23 15:14:26 2009 @@ -0,0 +1,392 @@ +/* +Copyright � 1999 CERN - European Organization for Nuclear Research. +Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose +is hereby granted without fee, provided that the above copyright notice appear in all copies and +that both that copyright notice and this permission notice appear in supporting documentation. +CERN makes no representations about the suitability of this software for any purpose. +It is provided "as is" without expressed or implied warranty. +*/ +package org.apache.mahout.colt.matrix.linalg; + +import org.apache.mahout.colt.matrix.DoubleMatrix1D; +import org.apache.mahout.colt.matrix.DoubleMatrix2D; +import EDU.oswego.cs.dl.util.concurrent.FJTask; +/** +Parallel implementation of the Basic Linear Algebra System for symmetric multi processing boxes. +Currently only a few algorithms are parallelised; the others are fully functional, but run in sequential mode. +Parallelised are: +
    +
  • {@link #dgemm dgemm} (matrix-matrix multiplication)
  • +
  • {@link #dgemv dgemv} (matrix-vector multiplication)
  • +
  • {@link #assign(DoubleMatrix2D, org.apache.mahout.colt.function.DoubleFunction) assign(A,function)} (generalized matrix scaling/transform): Strong speedup only for expensive functions like logarithm, sin, etc.
  • +
  • {@link #assign(DoubleMatrix2D,DoubleMatrix2D, org.apache.mahout.colt.function.DoubleDoubleFunction) assign(A,B,function)} (generalized matrix scaling/transform): Strong speedup only for expensive functions like pow etc.
  • +
+In all cases, no or only marginal speedup is seen for small problem sizes; they are detected and the sequential algorithm is used. + +

Usage

+Call the static method {@link #allocateBlas} at the very beginning of your program, supplying the main parameter for SmpBlas, the number of available CPUs. +The method sets the public global variable SmpBlas.smpBlas to a blas using a maximum of CPUs threads, each concurrently processing matrix blocks with the given sequential blas algorithms. +Normally there is no need to call allocateBlas more than once. +Then use SmpBlas.smpBlas.someRoutine(...) to run someRoutine in parallel. +E.g. + + +
+
+int cpu_s = 4;
+SmpBlas.allocateBlas(cpu_s, SeqBlas.seqBlas);
+...
+SmpBlas.smpBlas.dgemm(...)
+SmpBlas.smpBlas.dgemv(...)
+
+
+Even if you don't call a blas routine yourself, it often makes sense to allocate a SmpBlas, because other matrix library routines sometimes call the blas. +So if you're lucky, you get parallel performance for free. +

Notes

+
    +
  • Unfortunately, there is no portable means of automatically detecting the +number of CPUs on a JVM, so there is no good way to automate defaults.
  • +
  • Only improves performance on boxes with > 1 CPUs and VMs with native threads.
  • +
  • Currently only improves performance when working on dense matrix types. On sparse types, performance is likely to degrade (because of the implementation of sub-range views)!
  • +
  • Implemented using Doug Lea's fast lightweight task framework ({@link EDU.oswego.cs.dl.util.concurrent}) built upon Java threads, and geared for parallel computation.
  • +
+ +@see EDU.oswego.cs.dl.util.concurrent.FJTaskRunnerGroup +@see EDU.oswego.cs.dl.util.concurrent.FJTask +@author wolfgang.hoschek@cern.ch +@version 0.9, 16/04/2000 +*/ +/** + * @deprecated until unit tests are in place. Until this time, this class/interface is unsupported. + */ +@Deprecated +public class SmpBlas implements Blas { + /** + * The public global parallel blas; initialized via {@link #allocateBlas}. + * Do not modify this variable via other means (it is public). + */ + public static Blas smpBlas = SeqBlas.seqBlas; + + protected Blas seqBlas; // blocks are operated on in parallel; for each block this seq algo is used. + + protected Smp smp; + + protected int maxThreads; + + protected static int NN_THRESHOLD = 30000; +/** +Constructs a blas using a maximum of maxThreads threads; each executing the given sequential algos. +*/ +protected SmpBlas(int maxThreads, Blas seqBlas) { + this.seqBlas = seqBlas; + this.maxThreads = maxThreads; + this.smp = new Smp(maxThreads); + //Smp.smp = new Smp(maxThreads); +} +/** +Sets the public global variable SmpBlas.smpBlas to a blas using a maximum of maxThreads threads, each executing the given sequential algorithm; maxThreads is normally the number of CPUs. +Call this method at the very beginning of your program. +Normally there is no need to call this method more than once. +@param maxThreads the maximum number of threads (= CPUs) to be used +@param seqBlas the sequential blas algorithms to be used on concurrently processed matrix blocks. +*/ +public static void allocateBlas(int maxThreads, Blas seqBlas) { + if (smpBlas instanceof SmpBlas) { // no need to change anything? + SmpBlas s = (SmpBlas) smpBlas; + if (s.maxThreads == maxThreads && s.seqBlas == seqBlas) return; + } + + if (maxThreads<=1) + smpBlas = seqBlas; + else { + smpBlas = new SmpBlas(maxThreads, seqBlas); + } +} +public void assign(DoubleMatrix2D A, final org.apache.mahout.colt.function.DoubleFunction function) { + run(A,false, + new Matrix2DMatrix2DFunction() { + public double apply(DoubleMatrix2D AA, DoubleMatrix2D BB) { + seqBlas.assign(AA,function); + return 0; + } + } + ); +} +public void assign(DoubleMatrix2D A, DoubleMatrix2D B, final org.apache.mahout.colt.function.DoubleDoubleFunction function) { + run(A,B,false, + new Matrix2DMatrix2DFunction() { + public double apply(DoubleMatrix2D AA, DoubleMatrix2D BB) { + seqBlas.assign(AA,BB,function); + return 0; + } + } + ); +} +public double dasum(DoubleMatrix1D x) { + return seqBlas.dasum(x); +} +public void daxpy(double alpha, DoubleMatrix1D x, DoubleMatrix1D y) { + seqBlas.daxpy(alpha,x,y); +} +public void daxpy(double alpha, DoubleMatrix2D A, DoubleMatrix2D B) { + seqBlas.daxpy(alpha,A,B); +} +public void dcopy(DoubleMatrix1D x, DoubleMatrix1D y) { + seqBlas.dcopy(x,y); +} +public void dcopy(DoubleMatrix2D A, DoubleMatrix2D B) { + seqBlas.dcopy(A, B); +} +public double ddot(DoubleMatrix1D x, DoubleMatrix1D y) { + return seqBlas.ddot(x,y); +} +public void dgemm(final boolean transposeA, final boolean transposeB, final double alpha, final DoubleMatrix2D A, final DoubleMatrix2D B, final double beta, final DoubleMatrix2D C) { + /* + determine how to split and parallelize best into blocks + if more B.columns than tasks --> split B.columns, as follows: + + xx|xx|xxx B + xx|xx|xxx + xx|xx|xxx + A + xxx xx|xx|xxx C + xxx xx|xx|xxx + xxx xx|xx|xxx + xxx xx|xx|xxx + xxx xx|xx|xxx + + if less B.columns than tasks --> split A.rows, as follows: + + xxxxxxx B + xxxxxxx + xxxxxxx + A + xxx xxxxxxx C + xxx xxxxxxx + --- ------- + xxx xxxxxxx + xxx xxxxxxx + --- ------- + xxx xxxxxxx + + */ + if (transposeA) { + dgemm(false, transposeB, alpha, A.viewDice(), B, beta, C); + return; + } + if (transposeB) { + dgemm(transposeA, false, alpha, A, B.viewDice(), beta, C); + return; + } + int m = A.rows(); + int n = A.columns(); + int p = B.columns(); + + if (B.rows() != n) + throw new IllegalArgumentException("Matrix2D inner dimensions must agree:"+A.toStringShort()+", "+B.toStringShort()); + if (C.rows() != m || C.columns() != p) + throw new IllegalArgumentException("Incompatibel result matrix: "+A.toStringShort()+", "+B.toStringShort()+", "+C.toStringShort()); + if (A == C || B == C) + throw new IllegalArgumentException("Matrices must not be identical"); + + long flops = 2L*m*n*p; + int noOfTasks = (int) Math.min(flops / 30000, this.maxThreads); // each thread should process at least 30000 flops + boolean splitB = (p >= noOfTasks); + int width = splitB ? p : m; + noOfTasks = Math.min(width,noOfTasks); + + if (noOfTasks < 2) { // parallelization doesn't pay off (too much start up overhead) + seqBlas.dgemm(transposeA, transposeB, alpha, A, B, beta, C); + return; + } + + // set up concurrent tasks + int span = width/noOfTasks; + final FJTask[] subTasks = new FJTask[noOfTasks]; + for (int i=0; i sequential + double result = fun.apply(A,B); + if (collectResults) results[0] = result; + return results; + } + else { // parallel + this.smp.run(blocks[0],blocks[1],results,fun); + } + return results; +} +protected double[] run(DoubleMatrix2D A, boolean collectResults, Matrix2DMatrix2DFunction fun) { + DoubleMatrix2D[] blocks; + blocks = this.smp.splitBlockedNN(A, NN_THRESHOLD, A.rows()*A.columns()); + //blocks = this.smp.splitStridedNN(A, NN_THRESHOLD, A.rows()*A.columns()); + int b = blocks!=null ? blocks.length : 1; + double[] results = collectResults ? new double[b] : null; + + if (blocks==null) { // too small -> sequential + double result = fun.apply(A,null); + if (collectResults) results[0] = result; + return results; + } + else { // parallel + this.smp.run(blocks,null,results,fun); + } + return results; +} +/** + * Prints various snapshot statistics to System.out; Simply delegates to {@link EDU.oswego.cs.dl.util.concurrent.FJTaskRunnerGroup#stats}. + */ +public void stats() { + if (this.smp!=null) this.smp.stats(); +} +private double xsum(DoubleMatrix2D A) { + double[] sums = run(A,true, + new Matrix2DMatrix2DFunction() { + public double apply(DoubleMatrix2D AA, DoubleMatrix2D BB) { + return AA.zSum(); + } + } + ); + + double sum = 0; + for (int i = sums.length; --i >= 0; ) sum += sums[i]; + return sum; +} +} Propchange: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/linalg/SmpBlas.java ------------------------------------------------------------------------------ svn:eol-style = native Added: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/linalg/package.html URL: http://svn.apache.org/viewvc/lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/linalg/package.html?rev=883365&view=auto ============================================================================== --- lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/linalg/package.html (added) +++ lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/linalg/package.html Mon Nov 23 15:14:26 2009 @@ -0,0 +1,118 @@ + + +

Linear Algebraic matrix computations operating on {@link cern.colt.matrix.DoubleMatrix2D} + and {@link cern.colt.matrix.DoubleMatrix1D}.

+ +

Overview

+ +

The linalg package provides easy and performant access to compute intensive + Linear Algebra. Much functionality is concentrated in class {@link cern.colt.matrix.linalg.Algebra}. + Five fundamental matrix decompositions, which consist of pairs or triples of + matrices, permutation vectors, and the like, produce results in five decomposition + classes.  These decompositions are accessed by the Algebra class + to compute solutions of simultaneous linear equations, determinants, inverses + and other matrix functions.  The five decompositions are

+
    +
  • Cholesky Decomposition of symmetric, positive definite matrices
  • +
  • LU Decomposition (Gaussian elimination) of rectangular matrices
  • +
  • QR Decomposition of rectangular matrices
  • +
  • Eigenvalue Decomposition of both symmetric and nonsymmetric square matrices
  • +
  • Singular Value Decomposition of rectangular matrices
  • +
+

Colt and Jama

+ +

This package could only be rolled out easily because it is to a large degree + adapted from interfaces and implementations of the Jama matrix package. See + the Jama homepage. Due + credit is given to Joe Hicklin, Cleve Moler, Peter Webb, Ronald F. Boisvert, + Bruce Miller, Roldan Pozo and Karin Remington, the Jama authors from MathWorks + and NIST.

+ +

Design Issues

+ +

Jama matrices are of type Jama.Matrix, Colt matrices of type cern.colt.matrix.DoubleMatrix1D, + cern.colt.matrix.DoubleMatrix2D and cern.colt.matrix.DoubleMatrix3D. + +

Jama.Matrix is not a general-purpose array class. It is designed for + a single special purpose: Linear algebra. Because of its limited scope, Jama + can combine data structure and algorithms in a class Jama.Matrix. In + contrast, Colt matrices are general-purpose array classes. Since multi-dimensional + matrices (arrays) have many applications, of which only one is linear algebra, + Colt matrix packages are designed to avoid fat interfaces, yet allow to form + the basis on top of which a broad set of functionality and applications can + be defined (a similar spirit is used in STL and IBM + Array). Thus, data structure and special-purpose algorithms are separated. + Class Algebra works on DoubleMatrix2D and contains the operations + of Jama.Matrix, but holds no data structure. Class DoubleMatrix2D + contains an efficient and flexible multi-dimensional array data structure, as + well as multi-purpose operations, but (almost) no linear algebraic operations. + +

As a consequence a Colt user initially faces some additional complexity, but + after getting used to such a design, will honour the fact that logically related + functionality is logically separated. For example, if a user is not interested + in Formatting, Sorting, Partitioning, Statistics, etc. he/she does not see this + functionality, because it is neither defined in the linalg package nor the matrix + package, but somewhere else. + +

Perhaps more importantly, such a design will scale over time, as more and more + functionality from many scientific and engineering domains is added. Also see + matrix algorithms. + +

Functionality

+ +

All methods of Jama.Matrix are provided in Algebra, except + for some less important convenience methods. Colt matrices (similar to IBM Arrays) + are powerful and flexible data structures. Subrange, slice, dice, flip, selection + and sort views are available for Colt matrices, but not for Jama matrices. (They + are difficult to implement efficiently with Jama matrices, because they + internally use double[][] arrays). + +

Performance

+ +

No extensive performance studies have been carried out so far.
+ Jama matrices weakly encapsulate a normal double[][] array. Dense Colt + matrices strongly encapsulate a double[] array and use some arithmetic + to address cells in 2-d. Addressing a cell is more expensive using double[][] + arrays, due to bounds-checking, null pointer checks, non-contigous memory, and + problems that compilers have to optimize such code. Using double[] + arrays less bounds-checking, less null pointer checks, better cache locality + and better compiler optimizations can be seen, often eliminating bounds-checking + and null-pointer checks, paving the way for effective pipelining. See the publications + of IBM Watson's Ninja project. + +

To improve performance, matrix computations should use highly optimized kernels + in innermost loops. These kernels are not part of class Algebra, but + part of DoubleMatrix2D and DoubleMatrix1D. Otherwise they + couldn't be fully optimized. For example, with some arithmetic (not exposed + to a user), a loop over a 1-d or 2-d matrix can internally reduce cell adressing + overhead. Some of the most critical types of (innermost) loop operations have + a corresponding optimized method in DoubleMatrix2D and DoubleMatrix1D. + For example, dot products, multiplications, assign(function) transforms + and aggregate methods are such internally specialized kernels. Feedback + may result in a few more optimized kernels. Thus, in the name of performance, + in a few cases, algorithms and data structure are not completely separeted. + +

Some internal optimizations have been introduced, in particular for multiplications, + the LU-Decomposition and the Cholesky-Decomposition. The other decomposition + classes are almost identical to the corresponding Jama classes - as such they + are functional but not (yet) particularly efficient. + +

For small matrices, you may be better off using Sun's Java 3D 1.2, see javax.vecmath + - spec and javax.vecmath + javadoc. + +


+ +

Volunteers

+ +

We are looking for volunteers!
+ Do you have a background in Matrix Computations?
+
Do you care about performance and usability?
+ Are you enthusiastic about Open Source?
+ With your help, this package could become better and richer!
+ Contact wolfgang.hoschek@cern.ch + for more info.

+ + \ No newline at end of file Propchange: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/linalg/package.html ------------------------------------------------------------------------------ svn:eol-style = native Added: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/Formatter.java URL: http://svn.apache.org/viewvc/lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/Formatter.java?rev=883365&view=auto ============================================================================== --- lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/Formatter.java (added) +++ lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/Formatter.java Mon Nov 23 15:14:26 2009 @@ -0,0 +1,262 @@ +package org.apache.mahout.colt.matrix.objectalgo; + +/* +Copyright � 1999 CERN - European Organization for Nuclear Research. +Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose +is hereby granted without fee, provided that the above copyright notice appear in all copies and +that both that copyright notice and this permission notice appear in supporting documentation. +CERN makes no representations about the suitability of this software for any purpose. +It is provided "as is" without expressed or implied warranty. +*/ +import org.apache.mahout.colt.matrix.ObjectMatrix1D; +import org.apache.mahout.colt.matrix.ObjectMatrix2D; +import org.apache.mahout.colt.matrix.ObjectMatrix3D; +import org.apache.mahout.colt.matrix.impl.AbstractFormatter; +import org.apache.mahout.colt.matrix.impl.AbstractMatrix1D; +import org.apache.mahout.colt.matrix.impl.AbstractMatrix2D; +import org.apache.mahout.colt.matrix.impl.Former; +/** +Flexible, well human readable matrix print formatting. +Each cell is converted using {@link Object#toString()}. +For examples see {@link org.apache.mahout.colt.matrix.doublealgo.Formatter doublealgo.Formatter} which is just the same except that it operates on doubles. + +@author wolfgang.hoschek@cern.ch +@version 1.1, 11/22/99 +*/ +/** + * @deprecated until unit tests are in place. Until this time, this class/interface is unsupported. + */ +@Deprecated +public class Formatter extends AbstractFormatter { +/** + * Constructs and returns a matrix formatter with alignment LEFT. + */ +public Formatter() { + this(LEFT); +} +/** + * Constructs and returns a matrix formatter. + * @param alignment the given alignment used to align a column. + */ +public Formatter(String alignment) { + setAlignment(alignment); +} +/** + * Converts a given cell to a String; no alignment considered. + */ +protected String form(AbstractMatrix1D matrix, int index, Former formatter) { + return this.form((ObjectMatrix1D) matrix, index, formatter); +} +/** + * Converts a given cell to a String; no alignment considered. + */ +protected String form(ObjectMatrix1D matrix, int index, Former formatter) { + Object value = matrix.get(index); + if (value==null) return ""; + return String.valueOf(value); +} +/** + * Returns a string representations of all cells; no alignment considered. + */ +protected String[][] format(AbstractMatrix2D matrix) { + return this.format((ObjectMatrix2D) matrix); +} +/** + * Returns a string representations of all cells; no alignment considered. + */ +protected String[][] format(ObjectMatrix2D matrix) { + String[][] strings = new String[matrix.rows()][matrix.columns()]; + for (int row=matrix.rows(); --row >= 0; ) strings[row] = formatRow(matrix.viewRow(row)); + return strings; +} +/** + * Returns a string s such that Object[] m = s is a legal Java statement. + * @param matrix the matrix to format. + */ +public String toSourceCode(ObjectMatrix1D matrix) { + Formatter copy = (Formatter) this.clone(); + copy.setPrintShape(false); + copy.setColumnSeparator(", "); + String lead = "{"; + String trail = "};"; + return lead + copy.toString(matrix) + trail; +} +/** + * Returns a string s such that Object[] m = s is a legal Java statement. + * @param matrix the matrix to format. + */ +public String toSourceCode(ObjectMatrix2D matrix) { + Formatter copy = (Formatter) this.clone(); + String b3 = blanks(3); + copy.setPrintShape(false); + copy.setColumnSeparator(", "); + copy.setRowSeparator("},\n"+b3+"{"); + String lead = "{\n"+b3+"{"; + String trail = "}\n};"; + return lead + copy.toString(matrix) + trail; +} +/** + * Returns a string s such that Object[] m = s is a legal Java statement. + * @param matrix the matrix to format. + */ +public String toSourceCode(ObjectMatrix3D matrix) { + Formatter copy = (Formatter) this.clone(); + String b3 = blanks(3); + String b6 = blanks(6); + copy.setPrintShape(false); + copy.setColumnSeparator(", "); + copy.setRowSeparator("},\n"+b6+"{"); + copy.setSliceSeparator("}\n"+b3+"},\n"+b3+"{\n"+b6+"{"); + String lead = "{\n"+b3+"{\n"+b6+"{"; + String trail = "}\n"+b3+"}\n}"; + return lead + copy.toString(matrix) + trail; +} +/** + * Returns a string representation of the given matrix. + * @param matrix the matrix to convert. + */ +protected String toString(AbstractMatrix2D matrix) { + return this.toString((ObjectMatrix2D) matrix); +} +/** + * Returns a string representation of the given matrix. + * @param matrix the matrix to convert. + */ +public String toString(ObjectMatrix1D matrix) { + ObjectMatrix2D easy = matrix.like2D(1,matrix.size()); + easy.viewRow(0).assign(matrix); + return toString(easy); +} +/** + * Returns a string representation of the given matrix. + * @param matrix the matrix to convert. + */ +public String toString(ObjectMatrix2D matrix) { + return super.toString(matrix); +} +/** + * Returns a string representation of the given matrix. + * @param matrix the matrix to convert. + */ +public String toString(ObjectMatrix3D matrix) { + StringBuffer buf = new StringBuffer(); + boolean oldPrintShape = this.printShape; + this.printShape = false; + for (int slice=0; slice < matrix.slices(); slice++) { + if (slice!=0) buf.append(sliceSeparator); + buf.append(toString(matrix.viewSlice(slice))); + } + this.printShape = oldPrintShape; + if (printShape) buf.insert(0,shape(matrix) + "\n"); + return buf.toString(); +} +/** +Returns a string representation of the given matrix with axis as well as rows and columns labeled. +Pass null to one or more parameters to indicate that the corresponding decoration element shall not appear in the string converted matrix. + +@param matrix The matrix to format. +@param rowNames The headers of all rows (to be put to the left of the matrix). +@param columnNames The headers of all columns (to be put to above the matrix). +@param rowAxisName The label of the y-axis. +@param columnAxisName The label of the x-axis. +@param title The overall title of the matrix to be formatted. +@return the matrix converted to a string. +*/ +public String toTitleString(ObjectMatrix2D matrix, String[] rowNames, String[] columnNames, String rowAxisName, String columnAxisName, String title) { + if (matrix.size()==0) return "Empty matrix"; + String oldFormat = this.format; + this.format = LEFT; + + int rows = matrix.rows(); + int columns = matrix.columns(); + + // determine how many rows and columns are needed + int r=0; + int c=0; + r += (columnNames==null ? 0 : 1); + c += (rowNames==null ? 0 : 1); + c += (rowAxisName==null ? 0 : 1); + c += (rowNames!=null || rowAxisName!=null ? 1 : 0); + + int height = r + Math.max(rows, rowAxisName==null ? 0: rowAxisName.length()); + int width = c + columns; + + // make larger matrix holding original matrix and naming strings + org.apache.mahout.colt.matrix.ObjectMatrix2D titleMatrix = matrix.like(height, width); + + // insert original matrix into larger matrix + titleMatrix.viewPart(r,c,rows,columns).assign(matrix); + + // insert column axis name in leading row + if (r>0) titleMatrix.viewRow(0).viewPart(c,columns).assign(columnNames); + + // insert row axis name in leading column + if (rowAxisName!=null) { + String[] rowAxisStrings = new String[rowAxisName.length()]; + for (int i=rowAxisName.length(); --i >= 0; ) rowAxisStrings[i] = rowAxisName.substring(i,i+1); + titleMatrix.viewColumn(0).viewPart(r,rowAxisName.length()).assign(rowAxisStrings); + } + // insert row names in next leading columns + if (rowNames!=null) titleMatrix.viewColumn(c-2).viewPart(r,rows).assign(rowNames); + + // insert vertical "---------" separator line in next leading column + if (c>0) titleMatrix.viewColumn(c-2+1).viewPart(0,rows+r).assign("|"); + + // convert the large matrix to a string + boolean oldPrintShape = this.printShape; + this.printShape = false; + String str = toString(titleMatrix); + this.printShape = oldPrintShape; + + // insert horizontal "--------------" separator line + StringBuffer total = new StringBuffer(str); + if (columnNames != null) { + int i = str.indexOf(rowSeparator); + total.insert(i+1,repeat('-',i)+rowSeparator); + } + else if (columnAxisName != null) { + int i = str.indexOf(rowSeparator); + total.insert(0,repeat('-',i)+rowSeparator); + } + + // insert line for column axis name + if (columnAxisName != null) { + int j=0; + if (c>0) j = str.indexOf('|'); + String s = blanks(j); + if (c>0) s = s + "| "; + s = s+columnAxisName+"\n"; + total.insert(0,s); + } + + // insert title + if (title!=null) total.insert(0,title+"\n"); + + this.format = oldFormat; + + return total.toString(); +} +/** +Returns a string representation of the given matrix with axis as well as rows and columns labeled. +Pass null to one or more parameters to indicate that the corresponding decoration element shall not appear in the string converted matrix. + +@param matrix The matrix to format. +@param sliceNames The headers of all slices (to be put above each slice). +@param rowNames The headers of all rows (to be put to the left of the matrix). +@param columnNames The headers of all columns (to be put to above the matrix). +@param sliceAxisName The label of the z-axis (to be put above each slice). +@param rowAxisName The label of the y-axis. +@param columnAxisName The label of the x-axis. +@param title The overall title of the matrix to be formatted. +@return the matrix converted to a string. +*/ +public String toTitleString(ObjectMatrix3D matrix, String[] sliceNames, String[] rowNames, String[] columnNames, String sliceAxisName, String rowAxisName, String columnAxisName, String title) { + if (matrix.size()==0) return "Empty matrix"; + StringBuffer buf = new StringBuffer(); + for (int i=0; itotal ordering on some + * collection of elements. Comparators can be passed to a sort method (such as + * org.apache.mahout.colt.matrix.objectalgo.Sorting.quickSort) to allow precise control over the sort order.

+ * + * Note: It is generally a good idea for comparators to implement + * java.io.Serializable, as they may be used as ordering methods in + * serializable data structures. In + * order for the data structure to serialize successfully, the comparator (if + * provided) must implement Serializable.

+ * + * @author wolfgang.hoschek@cern.ch + * @version 1.0, 09/24/99 + * @see java.util.Comparator + * @see org.apache.mahout.colt + * @see org.apache.mahout.colt.Sorting + */ +/** + * @deprecated until unit tests are in place. Until this time, this class/interface is unsupported. + */ +@Deprecated +public interface ObjectMatrix1DComparator { +/** + * Compares its two arguments for order. Returns a negative integer, + * zero, or a positive integer as the first argument is less than, equal + * to, or greater than the second.

+ * + * The implementor must ensure that sgn(compare(x, y)) == + * -sgn(compare(y, x)) for all x and y. (This + * implies that compare(x, y) must throw an exception if and only + * if compare(y, x) throws an exception.)

+ * + * The implementor must also ensure that the relation is transitive: + * ((compare(x, y)>0) && (compare(y, z)>0)) implies + * compare(x, z)>0.

+ * + * Finally, the implementer must ensure that compare(x, y)==0 + * implies that sgn(compare(x, z))==sgn(compare(y, z)) for all + * z.

+ * + * + * @return a negative integer, zero, or a positive integer as the + * first argument is less than, equal to, or greater than the + * second. + */ +int compare(ObjectMatrix1D o1, ObjectMatrix1D o2); +/** + * + * Indicates whether some other object is "equal to" this + * Comparator. This method must obey the general contract of + * Object.equals(Object). Additionally, this method can return + * true only if the specified Object is also a comparator + * and it imposes the same ordering as this comparator. Thus, + * comp1.equals(comp2) implies that sgn(comp1.compare(o1, + * o2))==sgn(comp2.compare(o1, o2)) for every element + * o1 and o2.

+ * + * Note that it is always safe not to override + * Object.equals(Object). However, overriding this method may, + * in some cases, improve performance by allowing programs to determine + * that two distinct Comparators impose the same order. + * + * @param obj the reference object with which to compare. + * @return true only if the specified object is also + * a comparator and it imposes the same ordering as this + * comparator. + * @see java.lang.Object#equals(java.lang.Object) + * @see java.lang.Object#hashCode() + */ +boolean equals(Object obj); +} Propchange: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/ObjectMatrix1DComparator.java ------------------------------------------------------------------------------ svn:eol-style = native Added: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/ObjectMatrix2DComparator.java URL: http://svn.apache.org/viewvc/lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/ObjectMatrix2DComparator.java?rev=883365&view=auto ============================================================================== --- lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/ObjectMatrix2DComparator.java (added) +++ lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/ObjectMatrix2DComparator.java Mon Nov 23 15:14:26 2009 @@ -0,0 +1,82 @@ +package org.apache.mahout.colt.matrix.objectalgo; + +/* +Copyright � 1999 CERN - European Organization for Nuclear Research. +Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose +is hereby granted without fee, provided that the above copyright notice appear in all copies and +that both that copyright notice and this permission notice appear in supporting documentation. +CERN makes no representations about the suitability of this software for any purpose. +It is provided "as is" without expressed or implied warranty. +*/ +import org.apache.mahout.colt.matrix.ObjectMatrix2D; +/** + * A comparison function which imposes a total ordering on some + * collection of elements. Comparators can be passed to a sort method (such as + * org.apache.mahout.colt.matrix.objectalgo.Sorting.quickSort) to allow precise control over the sort order.

+ * + * Note: It is generally a good idea for comparators to implement + * java.io.Serializable, as they may be used as ordering methods in + * serializable data structures. In + * order for the data structure to serialize successfully, the comparator (if + * provided) must implement Serializable.

+ * + * @author wolfgang.hoschek@cern.ch + * @version 1.0, 09/24/99 + * @see java.util.Comparator + * @see org.apache.mahout.colt + * @see org.apache.mahout.colt.Sorting + */ +/** + * @deprecated until unit tests are in place. Until this time, this class/interface is unsupported. + */ +@Deprecated +public interface ObjectMatrix2DComparator { +/** + * Compares its two arguments for order. Returns a negative integer, + * zero, or a positive integer as the first argument is less than, equal + * to, or greater than the second.

+ * + * The implementor must ensure that sgn(compare(x, y)) == + * -sgn(compare(y, x)) for all x and y. (This + * implies that compare(x, y) must throw an exception if and only + * if compare(y, x) throws an exception.)

+ * + * The implementor must also ensure that the relation is transitive: + * ((compare(x, y)>0) && (compare(y, z)>0)) implies + * compare(x, z)>0.

+ * + * Finally, the implementer must ensure that compare(x, y)==0 + * implies that sgn(compare(x, z))==sgn(compare(y, z)) for all + * z.

+ * + * + * @return a negative integer, zero, or a positive integer as the + * first argument is less than, equal to, or greater than the + * second. + */ +int compare(ObjectMatrix2D o1, ObjectMatrix2D o2); +/** + * + * Indicates whether some other object is "equal to" this + * Comparator. This method must obey the general contract of + * Object.equals(Object). Additionally, this method can return + * true only if the specified Object is also a comparator + * and it imposes the same ordering as this comparator. Thus, + * comp1.equals(comp2) implies that sgn(comp1.compare(o1, + * o2))==sgn(comp2.compare(o1, o2)) for every element + * o1 and o2.

+ * + * Note that it is always safe not to override + * Object.equals(Object). However, overriding this method may, + * in some cases, improve performance by allowing programs to determine + * that two distinct Comparators impose the same order. + * + * @param obj the reference object with which to compare. + * @return true only if the specified object is also + * a comparator and it imposes the same ordering as this + * comparator. + * @see java.lang.Object#equals(java.lang.Object) + * @see java.lang.Object#hashCode() + */ +boolean equals(Object obj); +} Propchange: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/ObjectMatrix2DComparator.java ------------------------------------------------------------------------------ svn:eol-style = native Added: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/Partitioning.java URL: http://svn.apache.org/viewvc/lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/Partitioning.java?rev=883365&view=auto ============================================================================== --- lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/Partitioning.java (added) +++ lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/Partitioning.java Mon Nov 23 15:14:26 2009 @@ -0,0 +1,394 @@ +package org.apache.mahout.colt.matrix.objectalgo; + +/* +Copyright � 1999 CERN - European Organization for Nuclear Research. +Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose +is hereby granted without fee, provided that the above copyright notice appear in all copies and +that both that copyright notice and this permission notice appear in supporting documentation. +CERN makes no representations about the suitability of this software for any purpose. +It is provided "as is" without expressed or implied warranty. +*/ +import org.apache.mahout.colt.Swapper; +import org.apache.mahout.colt.function.IntComparator; +import org.apache.mahout.colt.matrix.ObjectMatrix1D; +import org.apache.mahout.colt.matrix.ObjectMatrix2D; +/** + * Given some interval boundaries, partitions matrices such that cell values falling into an interval are placed next to each other. + *

+ * Performance + *

+ * Partitioning into two intervals is O( N ). + * Partitioning into k intervals is O( N * log(k)). + * Constants factors are minimized. + * + * @see org.apache.mahout.colt.Partitioning "Partitioning arrays (provides more documentation)" + * + * @author wolfgang.hoschek@cern.ch + * @version 1.0, 09/24/99 + */ +/** + * @deprecated until unit tests are in place. Until this time, this class/interface is unsupported. + */ +@Deprecated +public class Partitioning extends Object { +/** + * Makes this class non instantiable, but still let's others inherit from it. + */ +protected Partitioning() {} +/** +Same as {@link org.apache.mahout.colt.Partitioning#partition(int[],int,int,int[],int,int,int[])} +except that it synchronously partitions the rows of the given matrix by the values of the given matrix column; +This is essentially the same as partitioning a list of composite objects by some instance variable; +In other words, two entire rows of the matrix are swapped, whenever two column values indicate so. +

+Let's say, a "row" is an "object" (tuple, d-dimensional point). +A "column" is the list of "object" values of a given variable (field, dimension). +A "matrix" is a list of "objects" (tuples, points). +

+Now, rows (objects, tuples) are partially sorted according to their values in one given variable (dimension). +Two entire rows of the matrix are swapped, whenever two column values indicate so. +

+Note that arguments are not checked for validity. +

+Example: + + + + + + +
8 x 3 matrix:
+ 23, 22, 21
+ 20, 19, 18
+ 17, 16, 15
+ 14, 13, 12
+ 11, 10, 9
+ 8, 7, 6
+ 5, 4, 3
+ 2, 1, 0
+

column = 0;
+ rowIndexes = {0,1,2,..,matrix.rows()-1}; + rowFrom = 0;
+ rowTo = matrix.rows()-1;
+ splitters = {5,10,12}
+ c = 0;
+ d = splitters.length-1;
+ partition(matrix,rowIndexes,rowFrom,rowTo,column,splitters,c,d,splitIndexes);
+ ==>
+ splitIndexes == {0, 2, 3}
+ rowIndexes == {7, 6, 5, 4, 0, 1, 2, 3}

+
+ The matrix IS NOT REORDERED.
+ Here is how it would look
+ like, if it would be reordered
+ accoring to rowIndexes.
+ 8 x 3 matrix:
+ 2, 1, 0
+ 5, 4, 3
+ 8, 7, 6
+ 11, 10, 9
+ 23, 22, 21
+ 20, 19, 18
+ 17, 16, 15
+ 14, 13, 12
+@param matrix the matrix to be partitioned. +@param rowIndexes the index of the i-th row; is modified by this method to reflect partitioned indexes. +@param rowFrom the index of the first row (inclusive). +@param rowTo the index of the last row (inclusive). +@param column the index of the column to partition on. +@param splitters the values at which the rows shall be split into intervals. + Must be sorted ascending and must not contain multiple identical values. + These preconditions are not checked; be sure that they are met. + +@param splitFrom the index of the first splitter element to be considered. +@param splitTo the index of the last splitter element to be considered. + The method considers the splitter elements splitters[splitFrom] .. splitters[splitTo]. + +@param splitIndexes a list into which this method fills the indexes of rows delimiting intervals. +Upon return splitIndexes[splitFrom..splitTo] will be set accordingly. +Therefore, must satisfy splitIndexes.length >= splitters.length. +*/ +public static void partition(ObjectMatrix2D matrix, int[] rowIndexes, int rowFrom, int rowTo, int column, final Object[] splitters, int splitFrom, int splitTo, int[] splitIndexes) { + if (rowFrom < 0 || rowTo >= matrix.rows() || rowTo >= rowIndexes.length) throw new IllegalArgumentException(); + if (column < 0 || column >= matrix.columns()) throw new IllegalArgumentException(); + if (splitFrom < 0 || splitTo >= splitters.length) throw new IllegalArgumentException(); + if (splitIndexes.length < splitters.length) throw new IllegalArgumentException(); + + // this one knows how to swap two row indexes (a,b) + final int[] g = rowIndexes; + Swapper swapper = new Swapper() { + public void swap(int b, int c) { + int tmp = g[b]; g[b] = g[c]; g[c] = tmp; + } + }; + + // compare splitter[a] with columnView[rowIndexes[b]] + final ObjectMatrix1D columnView = matrix.viewColumn(column); + IntComparator comp = new IntComparator() { + public int compare(int a, int b) { + Comparable av = (Comparable) (splitters[a]); + Comparable bv = (Comparable) (columnView.getQuick(g[b])); + int r = av.compareTo(bv); + return r<0 ? -1 : (r==0 ? 0 : 1); + } + }; + + // compare columnView[rowIndexes[a]] with columnView[rowIndexes[b]] + IntComparator comp2 = new IntComparator() { + public int compare(int a, int b) { + Comparable av = (Comparable) (columnView.getQuick(g[a])); + Comparable bv = (Comparable) (columnView.getQuick(g[b])); + int r = av.compareTo(bv); + return r<0 ? -1 : (r==0 ? 0 : 1); + } + }; + + // compare splitter[a] with splitter[b] + IntComparator comp3 = new IntComparator() { + public int compare(int a, int b) { + Comparable av = (Comparable) (splitters[a]); + Comparable bv = (Comparable) (splitters[b]); + int r = av.compareTo(bv); + return r<0 ? -1 : (r==0 ? 0 : 1); + } + }; + + // generic partitioning does the main work of reordering row indexes + org.apache.mahout.colt.Partitioning.genericPartition(rowFrom,rowTo,splitFrom,splitTo,splitIndexes,comp,comp2,comp3,swapper); +} +/** +Same as {@link org.apache.mahout.colt.Partitioning#partition(int[],int,int,int[],int,int,int[])} +except that it synchronously partitions the rows of the given matrix by the values of the given matrix column; +This is essentially the same as partitioning a list of composite objects by some instance variable; +In other words, two entire rows of the matrix are swapped, whenever two column values indicate so. +

+Let's say, a "row" is an "object" (tuple, d-dimensional point). +A "column" is the list of "object" values of a given variable (field, dimension). +A "matrix" is a list of "objects" (tuples, points). +

+Now, rows (objects, tuples) are partially sorted according to their values in one given variable (dimension). +Two entire rows of the matrix are swapped, whenever two column values indicate so. +

+Note that arguments are not checked for validity. +

+Example: + + + + + + +
8 x 3 matrix:
+ 23, 22, 21
+ 20, 19, 18
+ 17, 16, 15
+ 14, 13, 12
+ 11, 10, 9
+ 8, 7, 6
+ 5, 4, 3
+ 2, 1, 0
+ column = 0;
+ splitters = {5,10,12}
+ partition(matrix,column,splitters,splitIndexes);
+ ==>
+ splitIndexes == {0, 2, 3}

+
+ The matrix IS NOT REORDERED.
+ The new VIEW IS REORDERED:
+ 8 x 3 matrix:
+ 2, 1, 0
+ 5, 4, 3
+ 8, 7, 6
+ 11, 10, 9
+ 23, 22, 21
+ 20, 19, 18
+ 17, 16, 15
+ 14, 13, 12
+@param matrix the matrix to be partitioned. +@param column the index of the column to partition on. +@param splitters the values at which the rows shall be split into intervals. + Must be sorted ascending and must not contain multiple identical values. + These preconditions are not checked; be sure that they are met. + +@param splitIndexes a list into which this method fills the indexes of rows delimiting intervals. +Therefore, must satisfy splitIndexes.length >= splitters.length. + +@return a new matrix view having rows partitioned by the given column and splitters. +*/ +public static ObjectMatrix2D partition(ObjectMatrix2D matrix, int column, final Object[] splitters, int[] splitIndexes) { + int rowFrom = 0; + int rowTo = matrix.rows()-1; + int splitFrom = 0; + int splitTo = splitters.length-1; + int[] rowIndexes = new int[matrix.rows()]; // row indexes to reorder instead of matrix itself + for (int i=rowIndexes.length; --i >= 0; ) rowIndexes[i] = i; + + partition(matrix,rowIndexes,rowFrom,rowTo,column,splitters,splitFrom,splitTo,splitIndexes); + + // take all columns in the original order + int[] columnIndexes = new int[matrix.columns()]; + for (int i=columnIndexes.length; --i >= 0; ) columnIndexes[i] = i; + + // view the matrix according to the reordered row indexes + return matrix.viewSelection(rowIndexes,columnIndexes); +} +/** +Same as {@link #partition(int[],int,int,int[],int,int,int[])} +except that it synchronously partitions the rows of the given matrix by the values of the given matrix column; +This is essentially the same as partitioning a list of composite objects by some instance variable; +In other words, two entire rows of the matrix are swapped, whenever two column values indicate so. +

+Let's say, a "row" is an "object" (tuple, d-dimensional point). +A "column" is the list of "object" values of a given variable (field, dimension). +A "matrix" is a list of "objects" (tuples, points). +

+Now, rows (objects, tuples) are partially sorted according to their values in one given variable (dimension). +Two entire rows of the matrix are swapped, whenever two column values indicate so. +

+Of course, the column must not be a column of a different matrix. +More formally, there must hold:
+There exists an i such that matrix.viewColumn(i)==column. +

+Note that arguments are not checked for validity. +

+Example: + + + + + + +
8 x 3 matrix:
+ 23, 22, 21
+ 20, 19, 18
+ 17, 16, 15
+ 14, 13, 12
+ 11, 10, 9
+ 8, 7, 6
+ 5, 4, 3
+ 2, 1, 0
+

column = matrix.viewColumn(0);
+ a = 0;
+ b = column.size()-1;

+ splitters={5,10,12}
+ c=0;
+ d=splitters.length-1;

+ partition(matrix,column,a,b,splitters,c,d,splitIndexes);
+ ==>
+ splitIndexes == {0, 2, 3}

+
8 x 3 matrix:
+ 2, 1, 0
+ 5, 4, 3
+ 8, 7, 6
+ 11, 10, 9
+ 23, 22, 21
+ 20, 19, 18
+ 17, 16, 15
+ 14, 13, 12
+*/ +private static void xPartitionOld(ObjectMatrix2D matrix, ObjectMatrix1D column, int from, int to, Object[] splitters, int splitFrom, int splitTo, int[] splitIndexes) { + /* + Object splitter; // int, Object --> template type dependent + + if (splitFrom>splitTo) return; // nothing to do + if (from>to) { // all bins are empty + from--; + for (int i = splitFrom; i<=splitTo; ) splitIndexes[i++] = from; + return; + } + + // Choose a partition (pivot) index, m + // Ideally, the pivot should be the median, because a median splits a list into two equal sized sublists. + // However, computing the median is expensive, so we use an approximation. + int medianIndex; + if (splitFrom==splitTo) { // we don't really have a choice + medianIndex = splitFrom; + } + else { // we do have a choice + int m = (from+to) / 2; // Small arrays, middle element + int len = to-from+1; + if (len > SMALL) { + int l = from; + int n = to; + if (len > MEDIUM) { // Big arrays, pseudomedian of 9 + int s = len/8; + l = med3(column, l, l+s, l+2*s); + m = med3(column, m-s, m, m+s); + n = med3(column, n-2*s, n-s, n); + } + m = med3(column, l, m, n); // Mid-size, pseudomedian of 3 + } + + // Find the splitter closest to the pivot, i.e. the splitter that best splits the list into two equal sized sublists. + medianIndex = org.apache.mahout.colt.Sorting.binarySearchFromTo(splitters,column.getQuick(m),splitFrom,splitTo); + if (medianIndex < 0) medianIndex = -medianIndex - 1; // not found + if (medianIndex > splitTo) medianIndex = splitTo; // not found, one past the end + + } + splitter = splitters[medianIndex]; + + // Partition the list according to the splitter, i.e. + // Establish invariant: list[i] < splitter <= list[j] for i=from..medianIndex and j=medianIndex+1 .. to + int splitIndex = xPartitionOld(matrix,column,from,to,splitter); + splitIndexes[medianIndex] = splitIndex; + + // Optimization: Handle special cases to cut down recursions. + if (splitIndex < from) { // no element falls into this bin + // all bins with splitters[i] <= splitter are empty + int i = medianIndex-1; + while (i>=splitFrom && (!(splitter < splitters[i]))) splitIndexes[i--] = splitIndex; + splitFrom = medianIndex+1; + } + else if (splitIndex >= to) { // all elements fall into this bin + // all bins with splitters[i] >= splitter are empty + int i = medianIndex+1; + while (i<=splitTo && (!(splitter > splitters[i]))) splitIndexes[i++] = splitIndex; + splitTo = medianIndex-1; + } + + // recursively partition left half + if (splitFrom <= medianIndex-1) { + xPartitionOld(matrix, column, from, splitIndex, splitters, splitFrom, medianIndex-1, splitIndexes); + } + + // recursively partition right half + if (medianIndex+1 <= splitTo) { + xPartitionOld(matrix, column, splitIndex+1, to, splitters, medianIndex+1, splitTo, splitIndexes); + } + */ +} +/** + * Same as {@link #partition(int[],int,int,int)} + * except that it synchronously partitions the rows of the given matrix by the values of the given matrix column; + * This is essentially the same as partitioning a list of composite objects by some instance variable; + * In other words, two entire rows of the matrix are swapped, whenever two column values indicate so. + *

+ * Let's say, a "row" is an "object" (tuple, d-dimensional point). + * A "column" is the list of "object" values of a given variable (field, dimension). + * A "matrix" is a list of "objects" (tuples, points). + *

+ * Now, rows (objects, tuples) are partially sorted according to their values in one given variable (dimension). + * Two entire rows of the matrix are swapped, whenever two column values indicate so. + *

+ * Of course, the column must not be a column of a different matrix. + * More formally, there must hold:
+ * There exists an i such that matrix.viewColumn(i)==column. + * + * Note that arguments are not checked for validity. + */ +private static int xPartitionOld(ObjectMatrix2D matrix, ObjectMatrix1D column, int from, int to, Object splitter) { + /* + Object element; // int, Object --> template type dependent + for (int i=from-1; ++i<=to; ) { + element = column.getQuick(i); + if (element < splitter) { + // swap x[i] with x[from] + matrix.swapRows(i,from); + from++; + } + } + return from-1; + */ + return 0; +} +} Propchange: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/Partitioning.java ------------------------------------------------------------------------------ svn:eol-style = native Added: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/Sorting.java URL: http://svn.apache.org/viewvc/lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/Sorting.java?rev=883365&view=auto ============================================================================== --- lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/Sorting.java (added) +++ lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/Sorting.java Mon Nov 23 15:14:26 2009 @@ -0,0 +1,336 @@ +/* +Copyright � 1999 CERN - European Organization for Nuclear Research. +Permission to use, copy, modify, distribute and sell this software and its documentation for any purpose +is hereby granted without fee, provided that the above copyright notice appear in all copies and +that both that copyright notice and this permission notice appear in supporting documentation. +CERN makes no representations about the suitability of this software for any purpose. +It is provided "as is" without expressed or implied warranty. +*/ +package org.apache.mahout.colt.matrix.objectalgo; + +import org.apache.mahout.colt.function.IntComparator; +import org.apache.mahout.colt.matrix.ObjectMatrix1D; +import org.apache.mahout.colt.matrix.ObjectMatrix2D; +import org.apache.mahout.colt.matrix.ObjectMatrix3D; +/** +Matrix quicksorts and mergesorts. +Use idioms like Sorting.quickSort.sort(...) and Sorting.mergeSort.sort(...). +

+This is another case demonstrating one primary goal of this library: Delivering easy to use, yet very efficient APIs. +The sorts return convenient sort views. +This enables the usage of algorithms which scale well with the problem size: +For example, sorting a 1000000 x 10000 or a 1000000 x 100 x 100 matrix performs just as fast as sorting a 1000000 x 1 matrix. +This is so, because internally the algorithms only move around integer indexes, they do not physically move around entire rows or slices. +The original matrix is left unaffected. +

+The quicksort is a derivative of the JDK 1.2 V1.26 algorithms (which are, in turn, based on Bentley's and McIlroy's fine work). +The mergesort is a derivative of the JAL algorithms, with optimisations taken from the JDK algorithms. +Mergesort is stable (by definition), while quicksort is not. +A stable sort is, for example, helpful, if matrices are sorted successively +by multiple columns. It preserves the relative position of equal elements. + +@see org.apache.mahout.colt.GenericSorting +@see org.apache.mahout.colt.Sorting +@see java.util.Arrays + +@author wolfgang.hoschek@cern.ch +@version 1.1, 25/May/2000 +*/ +/** + * @deprecated until unit tests are in place. Until this time, this class/interface is unsupported. + */ +@Deprecated +public class Sorting extends org.apache.mahout.colt.PersistentObject { + /** + * A prefabricated quicksort. + */ + public static final Sorting quickSort = new Sorting(); // already has quicksort implemented + + /** + * A prefabricated mergesort. + */ + public static final Sorting mergeSort = new Sorting() { // override quicksort with mergesort + protected void runSort(int[] a, int fromIndex, int toIndex, IntComparator c) { + org.apache.mahout.colt.Sorting.mergeSort(a,fromIndex,toIndex,c); + } + protected void runSort(int fromIndex, int toIndex, IntComparator c, org.apache.mahout.colt.Swapper swapper) { + org.apache.mahout.colt.GenericSorting.mergeSort(fromIndex, toIndex, c, swapper); + } + }; +/** + * Makes this class non instantiable, but still let's others inherit from it. + */ +protected Sorting() {} +protected void runSort(int[] a, int fromIndex, int toIndex, IntComparator c) { + org.apache.mahout.colt.Sorting.quickSort(a,fromIndex,toIndex,c); +} +protected void runSort(int fromIndex, int toIndex, IntComparator c, org.apache.mahout.colt.Swapper swapper) { + org.apache.mahout.colt.GenericSorting.quickSort(fromIndex, toIndex, c, swapper); +} +/** +Sorts the vector into ascending order, according to the natural ordering. +The returned view is backed by this matrix, so changes in the returned view are reflected in this matrix, and vice-versa. +To sort ranges use sub-ranging views. To sort descending, use flip views ... +

+Example: + + + + + +
7, 1, 3, 1
+
+

==> 1, 1, 3, 7
+ The vector IS NOT SORTED.
+ The new VIEW IS SORTED.

+
+ +@param vector the vector to be sorted. +@return a new sorted vector (matrix) view. + Note that the original matrix is left unaffected. +*/ +public ObjectMatrix1D sort(final ObjectMatrix1D vector) { + int[] indexes = new int[vector.size()]; // row indexes to reorder instead of matrix itself + for (int i=indexes.length; --i >= 0; ) indexes[i] = i; + + IntComparator comp = new IntComparator() { + public int compare(int a, int b) { + Comparable av = (Comparable) (vector.getQuick(a)); + Comparable bv = (Comparable) (vector.getQuick(b)); + int r = av.compareTo(bv); + return r<0 ? -1 : (r>0 ? 1 : 0); + } + }; + + runSort(indexes,0,indexes.length,comp); + + return vector.viewSelection(indexes); +} +/** +Sorts the vector into ascending order, according to the order induced by the specified comparator. +The returned view is backed by this matrix, so changes in the returned view are reflected in this matrix, and vice-versa. +The algorithm compares two cells at a time, determinining whether one is smaller, equal or larger than the other. +To sort ranges use sub-ranging views. To sort descending, use flip views ... +

+Example: +

+// sort by sinus of cells
+ObjectComparator comp = new ObjectComparator() {
+   public int compare(Object a, Object b) {
+      Object as = Math.sin(a); Object bs = Math.sin(b);
+      return as < bs ? -1 : as == bs ? 0 : 1;
+   }
+};
+sorted = quickSort(vector,comp);
+
+ +@param vector the vector to be sorted. +@param c the comparator to determine the order. +@return a new matrix view sorted as specified. + Note that the original vector (matrix) is left unaffected. +*/ +public ObjectMatrix1D sort(final ObjectMatrix1D vector, final java.util.Comparator c) { + int[] indexes = new int[vector.size()]; // row indexes to reorder instead of matrix itself + for (int i=indexes.length; --i >= 0; ) indexes[i] = i; + + IntComparator comp = new IntComparator() { + public int compare(int a, int b) { + return c.compare(vector.getQuick(a), vector.getQuick(b)); + } + }; + + runSort(indexes,0,indexes.length,comp); + + return vector.viewSelection(indexes); +} +/** +Sorts the matrix rows into ascending order, according to the natural ordering of the matrix values in the given column. +The returned view is backed by this matrix, so changes in the returned view are reflected in this matrix, and vice-versa. +To sort ranges use sub-ranging views. To sort columns by rows, use dice views. To sort descending, use flip views ... +

+Example: + + + + + + +
4 x 2 matrix:
+ 7, 6
+ 5, 4
+ 3, 2
+ 1, 0
+
+

column = 0;
+ view = quickSort(matrix,column);
+ System.out.println(view);

+ ==>

+
+

4 x 2 matrix:
+ 1, 0
+ 3, 2
+ 5, 4
+ 7, 6

+ The matrix IS NOT SORTED.
+ The new VIEW IS SORTED.

+
+ +@param matrix the matrix to be sorted. +@param column the index of the column inducing the order. +@return a new matrix view having rows sorted by the given column. + Note that the original matrix is left unaffected. +@throws IndexOutOfBoundsException if column < 0 || column >= matrix.columns(). +*/ +public ObjectMatrix2D sort(ObjectMatrix2D matrix, int column) { + if (column < 0 || column >= matrix.columns()) throw new IndexOutOfBoundsException("column="+column+", matrix="+Formatter.shape(matrix)); + + int[] rowIndexes = new int[matrix.rows()]; // row indexes to reorder instead of matrix itself + for (int i=rowIndexes.length; --i >= 0; ) rowIndexes[i] = i; + + final ObjectMatrix1D col = matrix.viewColumn(column); + IntComparator comp = new IntComparator() { + public int compare(int a, int b) { + Comparable av = (Comparable) (col.getQuick(a)); + Comparable bv = (Comparable) (col.getQuick(b)); + int r = av.compareTo(bv); + return r<0 ? -1 : (r>0 ? 1 : 0); + } + }; + + runSort(rowIndexes,0,rowIndexes.length,comp); + + // view the matrix according to the reordered row indexes + // take all columns in the original order + return matrix.viewSelection(rowIndexes,null); +} +/** +Sorts the matrix rows according to the order induced by the specified comparator. +The returned view is backed by this matrix, so changes in the returned view are reflected in this matrix, and vice-versa. +The algorithm compares two rows (1-d matrices) at a time, determinining whether one is smaller, equal or larger than the other. +To sort ranges use sub-ranging views. To sort columns by rows, use dice views. To sort descending, use flip views ... +

+Example: +

+// sort by sum of values in a row
+ObjectMatrix1DComparator comp = new ObjectMatrix1DComparator() {
+   public int compare(ObjectMatrix1D a, ObjectMatrix1D b) {
+      Object as = a.zSum(); Object bs = b.zSum();
+      return as < bs ? -1 : as == bs ? 0 : 1;
+   }
+};
+sorted = quickSort(matrix,comp);
+
+ +@param matrix the matrix to be sorted. +@param c the comparator to determine the order. +@return a new matrix view having rows sorted as specified. + Note that the original matrix is left unaffected. +*/ +public ObjectMatrix2D sort(final ObjectMatrix2D matrix, final ObjectMatrix1DComparator c) { + int[] rowIndexes = new int[matrix.rows()]; // row indexes to reorder instead of matrix itself + for (int i=rowIndexes.length; --i >= 0; ) rowIndexes[i] = i; + + final ObjectMatrix1D[] views = new ObjectMatrix1D[matrix.rows()]; // precompute views for speed + for (int i=views.length; --i >= 0; ) views[i] = matrix.viewRow(i); + + IntComparator comp = new IntComparator() { + public int compare(int a, int b) { + //return c.compare(matrix.viewRow(a), matrix.viewRow(b)); + return c.compare(views[a], views[b]); + } + }; + + runSort(rowIndexes,0,rowIndexes.length,comp); + + // view the matrix according to the reordered row indexes + // take all columns in the original order + return matrix.viewSelection(rowIndexes,null); +} +/** +Sorts the matrix slices into ascending order, according to the natural ordering of the matrix values in the given [row,column] position. +The returned view is backed by this matrix, so changes in the returned view are reflected in this matrix, and vice-versa. +To sort ranges use sub-ranging views. To sort by other dimensions, use dice views. To sort descending, use flip views ... +

+The algorithm compares two 2-d slices at a time, determinining whether one is smaller, equal or larger than the other. +Comparison is based on the cell [row,column] within a slice. +Let A and B be two 2-d slices. Then we have the following rules +

    +
  • A < B iff A.get(row,column) < B.get(row,column) +
  • A == B iff A.get(row,column) == B.get(row,column) +
  • A > B iff A.get(row,column) > B.get(row,column) +
+ +@param matrix the matrix to be sorted. +@param row the index of the row inducing the order. +@param column the index of the column inducing the order. +@return a new matrix view having slices sorted by the values of the slice view matrix.viewRow(row).viewColumn(column). + Note that the original matrix is left unaffected. +@throws IndexOutOfBoundsException if row < 0 || row >= matrix.rows() || column < 0 || column >= matrix.columns(). +*/ +public ObjectMatrix3D sort(ObjectMatrix3D matrix, int row, int column) { + if (row < 0 || row >= matrix.rows()) throw new IndexOutOfBoundsException("row="+row+", matrix="+Formatter.shape(matrix)); + if (column < 0 || column >= matrix.columns()) throw new IndexOutOfBoundsException("column="+column+", matrix="+Formatter.shape(matrix)); + + int[] sliceIndexes = new int[matrix.slices()]; // indexes to reorder instead of matrix itself + for (int i=sliceIndexes.length; --i >= 0; ) sliceIndexes[i] = i; + + final ObjectMatrix1D sliceView = matrix.viewRow(row).viewColumn(column); + IntComparator comp = new IntComparator() { + public int compare(int a, int b) { + Comparable av = (Comparable) (sliceView.getQuick(a)); + Comparable bv = (Comparable) (sliceView.getQuick(b)); + int r = av.compareTo(bv); + return r<0 ? -1 : (r>0 ? 1 : 0); + } + }; + + runSort(sliceIndexes,0,sliceIndexes.length,comp); + + // view the matrix according to the reordered slice indexes + // take all rows and columns in the original order + return matrix.viewSelection(sliceIndexes,null,null); +} +/** +Sorts the matrix slices according to the order induced by the specified comparator. +The returned view is backed by this matrix, so changes in the returned view are reflected in this matrix, and vice-versa. +The algorithm compares two slices (2-d matrices) at a time, determinining whether one is smaller, equal or larger than the other. +To sort ranges use sub-ranging views. To sort by other dimensions, use dice views. To sort descending, use flip views ... +

+Example: +

+// sort by sum of values in a slice
+ObjectMatrix2DComparator comp = new ObjectMatrix2DComparator() {
+   public int compare(ObjectMatrix2D a, ObjectMatrix2D b) {
+      Object as = a.zSum(); Object bs = b.zSum();
+      return as < bs ? -1 : as == bs ? 0 : 1;
+   }
+};
+sorted = quickSort(matrix,comp);
+
+ +@param matrix the matrix to be sorted. +@param c the comparator to determine the order. +@return a new matrix view having slices sorted as specified. + Note that the original matrix is left unaffected. +*/ +public ObjectMatrix3D sort(final ObjectMatrix3D matrix, final ObjectMatrix2DComparator c) { + int[] sliceIndexes = new int[matrix.slices()]; // indexes to reorder instead of matrix itself + for (int i=sliceIndexes.length; --i >= 0; ) sliceIndexes[i] = i; + + final ObjectMatrix2D[] views = new ObjectMatrix2D[matrix.slices()]; // precompute views for speed + for (int i=views.length; --i >= 0; ) views[i] = matrix.viewSlice(i); + + IntComparator comp = new IntComparator() { + public int compare(int a, int b) { + //return c.compare(matrix.viewSlice(a), matrix.viewSlice(b)); + return c.compare(views[a], views[b]); + } + }; + + runSort(sliceIndexes,0,sliceIndexes.length,comp); + + // view the matrix according to the reordered slice indexes + // take all rows and columns in the original order + return matrix.viewSelection(sliceIndexes,null,null); +} +} Propchange: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/Sorting.java ------------------------------------------------------------------------------ svn:eol-style = native Added: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/package.html URL: http://svn.apache.org/viewvc/lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/package.html?rev=883365&view=auto ============================================================================== --- lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/package.html (added) +++ lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/package.html Mon Nov 23 15:14:26 2009 @@ -0,0 +1,5 @@ + + +Object matrix algorithms such as print formatting, sorting, partitioning and statistics. + + \ No newline at end of file Propchange: lucene/mahout/trunk/matrix/src/main/java/org/apache/mahout/matrix/matrix/objectalgo/package.html ------------------------------------------------------------------------------ svn:eol-style = native