From: Ted Dunning
Date: Tue, 13 Oct 2009 23:44:58 -0700
Subject: Re: [math] Questions about the linear package
To: Commons Developers List
Mailing-List: contact dev-help@commons.apache.org; run by ezmlm
In-Reply-To: <4b124c310910132250s3138f09xd18d80102e981caa@mail.gmail.com>
References: <4b124c310910132250s3138f09xd18d80102e981caa@mail.gmail.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=0016367f92928758f80475df8163

--0016367f92928758f80475df8163
Content-Type: text/plain; charset=UTF-8

I would like to add my voice as a Mahout committer. We would LOVE to use
commons math in Mahout, but these and a few other issues prevent it.

There was word some time ago about integrating a high performance linear
package such as MTJ into math. Is that stalled?

On Tue, Oct 13, 2009 at 10:50 PM, Jake Mannix wrote:

> Greetings, commons-math!
>
> I've been looking at a variety of apache/bsd-licensed linear libraries for
> use in massively parallel machine-learning applications I've been working on
> (I am housing my own open-source library at http://decomposer.googlecode.com,
> and am looking at integrating with/using/contributing to Apache Mahout), and
> I'm wondering a little about the linear API there is here in commons-math:
>
> * also for RealVector - No iterator methods? So if the implementation is
> sparse, there's no way to just iterate over the non-zero entries? What's
> worse, you can't even subclass OpenMapVector and expose the iterator on the
> OpenIntToDoubleHashMap inner object, because it's private. :\
>
> * for RealVector - what's with the million-different methods mapXXX(),
> mapXXXtoSelf()?
> Why not just map(UnaryFunction()), and mapToSelf(UnaryFunction()), where
> UnaryFunction defines the single method double apply(double d); ? Any user
> who wishes to implement RealVector (to, say, make a more efficient
> specialized SparseVector) has to go through the pain of writing up a million
> methods dealing with these (and even if copy/paste gets most of this, it
> still leads to some horribly huge .java files filled with junk that does not
> appear to be used). There does not even appear to be an AbstractRealVector
> which implements all of these for you (by using the above-mentioned
> iterator() ).
>
> * while we're at it, if there is map(), why not also double
> RealVector.collect(Collector()), where Collector defines void collect(int
> index, double value); and double result(); - this can be used for generic
> inner products and kernels (and can allow for consolidating all of the
> L1Norm(), norm(), and LInfNorm() methods into this same method, passing in
> different L1NormCollector() etc... instances).
>
> * why all the methods which are overloaded to take either RealVector or
> double[] (getDistance, dotProduct, add, etc...) - is there really that much
> overhead in just implementing dotProduct(double[] d) as just dotProduct(new
> ArrayRealVector(d, false)); - no copy is done, nothing is done but one
> object creation...
>
> * SparseVector is just a marker interface? Does it serve any purpose?
>
> I guess I could ask similar questions on the Matrix interfaces, but those
> will probably be cleared up by understanding the philosophy behind the
> Vector interfaces.
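
The map()/collect() idea in the bullets above can be sketched roughly as
follows. This is only an illustration under stated assumptions: UnaryFunction,
Collector and L1NormCollector are the names used in the proposal, not existing
commons-math types, and SketchSparseVector with its TreeMap storage is a
hypothetical stand-in for a real RealVector implementation.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical interface named after the proposal; not a commons-math type.
interface UnaryFunction {
    double apply(double d);
}

// Hypothetical interface named after the proposal; not a commons-math type.
interface Collector {
    void collect(int index, double value);
    double result();
}

// Example collector: accumulates the sum of absolute values (the L1 norm).
final class L1NormCollector implements Collector {
    private double sum;

    @Override
    public void collect(int index, double value) {
        sum += Math.abs(value);
    }

    @Override
    public double result() {
        return sum;
    }
}

// Illustrative sparse vector (assumed name) that stores only non-zero entries.
final class SketchSparseVector {
    private final int dimension;
    private final TreeMap<Integer, Double> entries = new TreeMap<>();

    SketchSparseVector(int dimension) {
        this.dimension = dimension;
    }

    void set(int index, double value) {
        if (index < 0 || index >= dimension) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        if (value == 0.0) {
            entries.remove(index);
        } else {
            entries.put(index, value);
        }
    }

    double get(int index) {
        Double v = entries.get(index);
        return v == null ? 0.0 : v;
    }

    // Read-only view of the non-zero entries, in index order.
    Iterable<Map.Entry<Integer, Double>> nonZeroEntries() {
        return Collections.unmodifiableMap(entries).entrySet();
    }

    // Applies f to every index, zeros included, so f(0) != 0 is handled correctly.
    SketchSparseVector map(UnaryFunction f) {
        SketchSparseVector out = new SketchSparseVector(dimension);
        for (int i = 0; i < dimension; i++) {
            out.set(i, f.apply(get(i)));
        }
        return out;
    }

    // Feeds only the non-zero entries to the collector and returns its result.
    // This suits collectors (norms, dot products) to which zeros contribute nothing.
    double collect(Collector c) {
        for (Map.Entry<Integer, Double> e : nonZeroEntries()) {
            c.collect(e.getKey(), e.getValue());
        }
        return c.result();
    }
}
```

With these pieces, an L1 norm is v.collect(new L1NormCollector()) and a scaling
is v.map(d -> d * 2.0); other norms or kernels would be further Collector
implementations rather than new methods on the vector type.
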
>
> I'd love to use commons-math for parts of my projects in which the entire
> data sets can live in memory (often part of the computation falls into this
> category; even if it's not the most meaty part, it's big enough that I'll
> kill my performance if I am stuck writing my own subroutines for eigen
> computation, etc. for many moderately small matrices), but converting to and
> from the commons-math linear interfaces seems a bit unwieldy. Maybe it would
> be easier if I could understand why these are the way they are.
>
> I'm happy to contribute patches consolidating interfaces and/or extending
> functionality (you seem to be missing a compact int/double pair
> implementation of sparse vectors, for example, which are a fantastically
> performant format if they're immutable and only being used for dot products
> and adding them to dense vectors), if it would be of help (I'm tracking my
> attempts at this over on my GitHub clone of trunk:
> http://github.com/jakemannix/commons-math ).
>
> -jake mannix
> Principal Software Engineer
> Search and Recommender Systems
> LinkedIn.com
>

--
Ted Dunning, CTO
DeepDyve

--0016367f92928758f80475df8163--
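
For readers unfamiliar with the "compact int/double pair" format mentioned in
the quoted message, here is a minimal sketch of what such an immutable sparse
vector might look like. The class name CompactSparseVector and its methods are
assumptions made for illustration; nothing here is taken from commons-math or
from the GitHub clone referenced above.

```java
// Hypothetical immutable sparse vector: parallel arrays of indices and values.
final class CompactSparseVector {
    private final int[] indices;    // strictly increasing, non-negative
    private final double[] values;  // values[k] is the entry at indices[k]

    CompactSparseVector(int[] indices, double[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        for (int k = 0; k < indices.length; k++) {
            if (indices[k] < 0 || (k > 0 && indices[k - 1] >= indices[k])) {
                throw new IllegalArgumentException("indices must be increasing and non-negative");
            }
        }
        // Defensive copies keep the vector immutable.
        this.indices = indices.clone();
        this.values = values.clone();
    }

    // Dot product with a dense array: touches only the stored entries.
    double dot(double[] dense) {
        double sum = 0.0;
        for (int k = 0; k < indices.length; k++) {
            sum += values[k] * dense[indices[k]];
        }
        return sum;
    }

    // Adds this vector into a dense array in place.
    void addTo(double[] dense) {
        for (int k = 0; k < indices.length; k++) {
            dense[indices[k]] += values[k];
        }
    }

    // Sparse-sparse dot product via a merge over the two sorted index arrays.
    double dot(CompactSparseVector other) {
        double sum = 0.0;
        int i = 0;
        int j = 0;
        while (i < indices.length && j < other.indices.length) {
            if (indices[i] == other.indices[j]) {
                sum += values[i] * other.values[j];
                i++;
                j++;
            } else if (indices[i] < other.indices[j]) {
                i++;
            } else {
                j++;
            }
        }
        return sum;
    }
}
```

Because the index array is sorted and never changes, both dot products are
simple linear scans with no hashing, which is presumably the performance
property the quoted message has in mind.
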