commons-issues mailing list archives

From "Jake Mannix (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MATH-313) Functions could be more object-oriented without losing any power.
Date Sat, 31 Oct 2009 22:21:59 GMT

    [ https://issues.apache.org/jira/browse/MATH-313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772289#action_12772289 ]

Jake Mannix commented on MATH-313:
----------------------------------

Regarding the practicality of these abstractions, we could narrow the scope from "generalized
real-valued functions" to a set of static building blocks:

{code}
public class Functions {

  public static abstract class ComposableFunction implements UnivariateRealFunction {
    // has all the methods we described implemented, or maybe shortens "preCompose" to "of"
    // so you can read it "f of g..."
    // leaves this abstract:
    public abstract double value(double d);
  }

  public static final ComposableFunction Exp = new ComposableFunction() {
    public double value(double d) { return Math.exp(d); }
  };
  public static ComposableFunction Sinh = ...

  // lots of java.lang.Math functions here, with object-oriented ways to combine them

  public static abstract class BinaryFunction {
    public abstract double value(double d1, double d2);
    public ComposableFunction fix2ndArg(double secondArg) { /* impl */ }
    public ComposableFunction fix1stArg(double firstArg) { /* impl */ }
  }

  public static final BinaryFunction Pow = new BinaryFunction() {
    public double value(double d1, double d2) { return Math.pow(d1, d2); }
  };
  // java.lang.Math has no two-argument log; this assumes d2 is meant as the base
  public static final BinaryFunction Log = new BinaryFunction() {
    public double value(double d1, double d2) { return Math.log(d1) / Math.log(d2); }
  };
  public static BinaryFunction Max = new BinaryFunction() ...
  public static BinaryFunction Min = ...
}
{code}

This keeps the abstraction within one holder class, which offers a set of easy-to-use functional
building blocks and lets you do things like
{code}
  RealVector w = v.map(Exp.of(Negate.of(Pow.fix2ndArg(2))));
{code}
when you want to apply a gaussian to every entry of your vector.
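
A minimal, self-contained sketch of how that composition could be wired up. The
UnivariateRealFunction interface below is a simplified stand-in for the Commons Math one (no
checked exception), and FunctionsSketch, EXP, NEGATE, POW and GAUSSIAN are hypothetical names
modelled on the proposal, not existing API:

```java
// Simplified stand-in for the Commons Math interface (assumption: no checked exception).
interface UnivariateRealFunction {
  double value(double x);
}

// Hypothetical base class: "f.of(g)" reads as "f of g", i.e. x -> f(g(x)).
abstract class ComposableFunction implements UnivariateRealFunction {
  public ComposableFunction of(final UnivariateRealFunction g) {
    final ComposableFunction f = this;
    return new ComposableFunction() {
      public double value(double x) { return f.value(g.value(x)); }
    };
  }
}

// Hypothetical two-argument function that can be turned into a one-argument one.
abstract class BinaryFunction {
  public abstract double value(double d1, double d2);

  public ComposableFunction fix2ndArg(final double secondArg) {
    final BinaryFunction f = this;
    return new ComposableFunction() {
      public double value(double x) { return f.value(x, secondArg); }
    };
  }
}

final class FunctionsSketch {
  static final ComposableFunction EXP = new ComposableFunction() {
    public double value(double x) { return Math.exp(x); }
  };
  static final ComposableFunction NEGATE = new ComposableFunction() {
    public double value(double x) { return -x; }
  };
  static final BinaryFunction POW = new BinaryFunction() {
    public double value(double d1, double d2) { return Math.pow(d1, d2); }
  };

  // x -> exp(-(x^2)), built purely by composition
  static final ComposableFunction GAUSSIAN = EXP.of(NEGATE.of(POW.fix2ndArg(2)));

  public static void main(String[] args) {
    System.out.println(GAUSSIAN.value(0.0)); // prints 1.0
    System.out.println(GAUSSIAN.value(1.0)); // exp(-1)
  }
}
```

Each call to of or fix2ndArg allocates a small wrapper object, so this style trades a little
indirection for readability.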

The uses for this kind of thing are pretty varied, but in general it allows for some really
readable and concise code when combined with the Collector paradigm. Imagine you have the
interface below. It has several collect methods instead of just one because, when collecting on
Vectors, you might want the Collector to do something different at different index values
(for example, a weighted euclidean dot product), and similarly for matrices:

{code}
public interface UnivariateCollector {
  void collect(double d);
  void collect(int i, double d);
  void collect(int i, int j, double d);
  double result();
}
{code}
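
A minimal sketch of what implementations of this interface might look like, driven over a plain
double[] rather than a RealVector. AbstractUnivariateCollector, SumOfAbs, MaxOfAbs,
WeightedSumOfSquares and CollectorSketch are hypothetical names for illustration only:

```java
// The interface from the proposal, repeated so the sketch compiles on its own.
interface UnivariateCollector {
  void collect(double d);
  void collect(int i, double d);
  void collect(int i, int j, double d);
  double result();
}

// Hypothetical convenience base: the index-aware overloads ignore the indices by default.
abstract class AbstractUnivariateCollector implements UnivariateCollector {
  protected double result;
  public void collect(int i, double d) { collect(d); }
  public void collect(int i, int j, double d) { collect(d); }
  public double result() { return result; }
}

// Accumulates sum |x_i| (the L1 norm).
final class SumOfAbs extends AbstractUnivariateCollector {
  public void collect(double d) { result += Math.abs(d); }
}

// Accumulates max |x_i| (the L-infinity norm).
final class MaxOfAbs extends AbstractUnivariateCollector {
  public void collect(double d) { result = Math.max(result, Math.abs(d)); }
}

// Index-aware collector: sum_i w_i * x_i^2, which is where collect(int, double) pays off.
final class WeightedSumOfSquares extends AbstractUnivariateCollector {
  private final double[] weights;
  WeightedSumOfSquares(double[] weights) { this.weights = weights.clone(); }
  public void collect(double d) { result += d * d; } // no index available: unweighted
  @Override public void collect(int i, double d) { result += weights[i] * d * d; }
}

final class CollectorSketch {
  // Stand-in for a vector's collect(): feeds every (index, value) pair to the collector.
  static double collect(double[] values, UnivariateCollector collector) {
    for (int i = 0; i < values.length; i++) {
      collector.collect(i, values[i]);
    }
    return collector.result();
  }

  public static void main(String[] args) {
    double[] v = {3.0, -4.0, 1.0};
    System.out.println(collect(v, new SumOfAbs())); // prints 8.0
    System.out.println(collect(v, new MaxOfAbs())); // prints 4.0
    double[] w = {1.0, 0.5, 2.0};
    System.out.println(collect(v, new WeightedSumOfSquares(w))); // prints 19.0
  }
}
```

Collectors in this sketch are stateful and single-use: a fresh instance is needed for every pass.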

This interface gets handed to collections of doubles (say, RealVector, and possibly RealMatrix,
which already has a visitor, but a mutating one). Such a collection would expose the following
methods:

{code}
public interface DoubleCollection {
  Iterator<DoubleEntry> iterator();
  Iterator<DoubleEntry> sparseIterator();
  double collect(UnivariateCollector collector);
}
{code}

Note that I'm not saying this particular interface should exist at this level of generality,
but imagine that these methods are available on AbstractRealVector, at least:

{code}
public abstract class AbstractRealVector implements RealVector, DoubleCollection {

  // leave iterator() and sparseIterator() abstract

  public double collect(UnivariateCollector collector) {
    // use some logic to decide whether to take the sparse or the dense iterator
    Iterator<DoubleEntry> it = ...
    DoubleEntry e;
    while (it.hasNext() && (e = it.next()) != null) {
      collector.collect(e.index(), e.value());
    }
    return collector.result();
  }

  // useful for generalized dot products, kernels, distances and angles:
  public double collect(BivariateCollector collector, RealVector v) {
    // use some logic based on whether this or v is instanceof SparseVector, to decide how
    // to iterate both of them, then
    some loop {
      collector.collect(index, thisVectorAtIndex, vAtIndex);
    }
    return collector.result();
  }

  public double normL1() { return collect(Abs.asCollector()); }
  public double normLInf() { return collect(Abs.asCollector(Max)); }
  
 // and in general:
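  // as written this assumes non-negative entries; compose with Abs first for the general case
  public double normLp(final double p) {
    return Math.pow(collect(Pow.fix2ndArg(p).asCollector()), 1 / p);
  }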

  public double dot(RealVector v) { return collect(Times.asCollector(), v); }

  public RealVector subtract(RealVector v) { return map(Subtract, v); }

  public RealVector ebeMultiply(RealVector v) { return map(Multiply, v); }
  // ditto for all the other ebeXXX methods

  public double distance(RealVector v) {
    // the collector accumulates the sum of squared differences; take the root at the end
    return Math.sqrt(collect(new AbstractBivariateCollector() {
      public void collect(int index, double d1, double d2) { result += Math.pow(d1 - d2, 2); }
    }, v));
  }

  // similarly for L1Distance, LInfDistance and in general any Lp distance, and in fact, since
  // Collector knows what index you're on when collecting, it easily deals with weighted
  // distances, and projected onto missing dimension subspaces in particular
}
{code}
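
To make the two-vector case concrete, here is a small runnable sketch over dense double[] arrays.
The shape of BivariateCollector is an assumption inferred from how it is called above, and
BivariateSketch with its helper methods is hypothetical:

```java
// Assumed shape of the BivariateCollector used above.
interface BivariateCollector {
  void collect(int index, double d1, double d2);
  double result();
}

// Assumed base class holding the running result.
abstract class AbstractBivariateCollector implements BivariateCollector {
  protected double result;
  public double result() { return result; }
}

final class BivariateSketch {
  // Dense stand-in for collect(BivariateCollector, RealVector).
  static double collect(double[] a, double[] b, BivariateCollector collector) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    for (int i = 0; i < a.length; i++) {
      collector.collect(i, a[i], b[i]);
    }
    return collector.result();
  }

  static double dot(double[] a, double[] b) {
    return collect(a, b, new AbstractBivariateCollector() {
      public void collect(int index, double d1, double d2) { result += d1 * d2; }
    });
  }

  // Uses the index to look up a per-dimension weight.
  static double weightedDot(double[] a, double[] b, final double[] w) {
    return collect(a, b, new AbstractBivariateCollector() {
      public void collect(int index, double d1, double d2) { result += w[index] * d1 * d2; }
    });
  }

  static double distance(double[] a, double[] b) {
    double sumOfSquares = collect(a, b, new AbstractBivariateCollector() {
      public void collect(int index, double d1, double d2) {
        double diff = d1 - d2;
        result += diff * diff;
      }
    });
    return Math.sqrt(sumOfSquares);
  }

  public static void main(String[] args) {
    double[] a = {1.0, 2.0, 3.0};
    double[] b = {4.0, 6.0, 3.0};
    System.out.println(dot(a, b));                                    // prints 25.0
    System.out.println(weightedDot(a, b, new double[] {1.0, 0.0, 2.0})); // prints 22.0
    System.out.println(distance(a, b));                               // prints 5.0
  }
}
```

A weight of zero effectively projects out a dimension, which is one way to read the remark about
missing-dimension subspaces.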

The reason I bring up these kinds of things is that in Machine Learning, in general, you often
want to do fairly arbitrary manipulations on vectors, and you may also want to do arbitrary
combinations of them.  I'm primarily interested in vectors and in functions from vectors to reals
(note: MultivariateRealFunction currently only takes double[] arguments, not RealVector -
how to deal with the sparse case, ack!), from vectors to vectors, and from reals to reals, in the
fairly generic sense, and I don't want to have to write a ton of boilerplate every time I want to
compose a function or write a generalized dot product.  If I can't pass a function in to my
vector, I need a method in another class, which doesn't have access to the internals of the
vector.  That is usually fine, but in general vectors should know how to compute their
generalized distances, lengths, angles, differences, inner products, etc., given a little
guidance on which specific kind of generalized method they need to use.

Of course, yes, this can be done fully outside of the linear package: once we have at the
very least access to dense + sparse iterators on RealVector, we can write a whole framework
outside of linear which has DotProduct (defining double dot(RealVector v1, RealVector v2)
), Distance, KernelizedNorm, etc. This can be done, but doing it this way is not my preference,
and it dulls my desire to try and help get Commons-Math adopted as the linear library to be used
with Mahout and Decomposer.
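
For comparison, a rough sketch of that external-framework style, where the operation lives
outside the vector and only needs a sparse iterator plus random access. SparseReadable and
MapVector are toy stand-ins invented for this example (not RealVector), and the DotProduct
signature is adapted to them:

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical read-only view: the minimum an outside framework needs from a vector.
interface SparseReadable {
  double get(int index);
  Iterator<Map.Entry<Integer, Double>> sparseIterator(); // non-zero entries only
}

// Toy sparse vector backed by a sorted map (stand-in for a real sparse vector class).
final class MapVector implements SparseReadable {
  private final Map<Integer, Double> entries = new TreeMap<Integer, Double>();

  MapVector set(int index, double value) {
    if (value == 0.0) {
      entries.remove(index);
    } else {
      entries.put(index, value);
    }
    return this;
  }

  public double get(int index) {
    Double v = entries.get(index);
    return v == null ? 0.0 : v;
  }

  public Iterator<Map.Entry<Integer, Double>> sparseIterator() {
    return entries.entrySet().iterator();
  }
}

// The operation is defined outside the vector class.
interface DotProduct {
  double dot(SparseReadable v1, SparseReadable v2);
}

final class EuclideanDotProduct implements DotProduct {
  public double dot(SparseReadable v1, SparseReadable v2) {
    double sum = 0.0;
    // Only indices where v1 is non-zero can contribute to the sum.
    Iterator<Map.Entry<Integer, Double>> it = v1.sparseIterator();
    while (it.hasNext()) {
      Map.Entry<Integer, Double> e = it.next();
      sum += e.getValue() * v2.get(e.getKey());
    }
    return sum;
  }
}

final class ExternalDotSketch {
  public static void main(String[] args) {
    SparseReadable a = new MapVector().set(0, 2.0).set(1000, 3.0);
    SparseReadable b = new MapVector().set(1000, 4.0).set(5, 1.0);
    System.out.println(new EuclideanDotProduct().dot(a, b)); // prints 12.0
  }
}
```

This works, but every such operation has to go through the public accessors, which is the
trade-off the paragraph above describes.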

> Functions could be more object-oriented without losing any power.
> -----------------------------------------------------------------
>
>                 Key: MATH-313
>                 URL: https://issues.apache.org/jira/browse/MATH-313
>             Project: Commons Math
>          Issue Type: New Feature
>    Affects Versions: 2.0
>         Environment: all
>            Reporter: Jake Mannix
>             Fix For: 2.1
>
>
> UnivariateRealFunction, for example, is a map from R to R.  The set of such functions
> has tons and tons of structure: in addition to being an algebra, equipped with +,-,*, and
> scaling by constants, it maps the same space into itself, so it is composable, both pre and
> post.
> I'd propose we add:
> {code}
>   UnivariateRealFunction plus(UnivariateRealFunction other);
>   UnivariateRealFunction minus(UnivariateRealFunction other);
>   UnivariateRealFunction times(UnivariateRealFunction other);
>   UnivariateRealFunction times(double scale);
>   UnivariateRealFunction preCompose(UnivariateRealFunction other);
>   UnivariateRealFunction postCompose(UnivariateRealFunction other);
> {code}
> to the interface, and then implement them in an AbstractUnivariateRealFunction base class.
> No implementer would need to notice, other than switching to extend this class rather than
> implement UnivariateRealFunction.
> Many people don't need or use this, but... it makes for some powerfully easy code:
> {code}UnivariateRealFunction gaussian = Exp.preCompose(Negate.preCompose(Pow2));{code}
> which is even nicer when done anonymously passing into a map/collect method (a la MATH-312).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

