mahout-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robin Anil <robin.a...@gmail.com>
Subject Re: AbstractVector.minus(Vector)
Date Mon, 19 Apr 2010 16:34:07 GMT
On Mon, Apr 19, 2010 at 9:43 PM, Sean Owen <srowen@gmail.com> wrote:

> More on Vector, as I'm browsing through it:
>
> AbstractVector.minus(Vector) says:
>
>  public Vector minus(Vector x) {
>    if (size() != x.size()) {
>      throw new CardinalityException();
>    }
>    if (x instanceof RandomAccessSparseVector || x instanceof DenseVector) {
>      // TODO: if both are RandomAccess check the numNonDefault to
> determine which to iterate
>      Vector result = x.clone();
>      Iterator<Element> iter = iterateNonZero();
>      while (iter.hasNext()) {
>        Element e = iter.next();
>        result.setQuick(e.index(), e.get() - result.getQuick(e.index()));
>      }
>      return result;
>    } else { // TODO: check the numNonDefault elements to further optimize
>      Vector result = clone();
>      Iterator<Element> iter = x.iterateNonZero();
>      while (iter.hasNext()) {
>        Element e = iter.next();
>        result.setQuick(e.index(), getQuick(e.index()) - e.get());
>      }
>      return result;
>    }
>  }
>
>
> The stanza after the instanceof checks can just become the body of an
> overriding method in these two subclasses right?
>
> Since we're computing "this - that", makes sense to only look at
> "that" where it is nonzero. But the first version iterates over
> indices where "this" is nonzero, so it's wrong. (Yeah just checked
> with a test.)
>
> Was the intent to compute "that - this" in this case, so to be able to
> iterate over nonzero elements of "this", and then invert it at the
> end? This works:
>
>  @Override
>  public Vector minus(Vector that) {
>    if (this.size() != that.size()) {
>      throw new CardinalityException();
>    }
>    Vector result = that.clone();
>    Iterator<Element> iter = this.iterateNonZero();
>    while (iter.hasNext()) {
>      Element thisElement = iter.next();
>      int index = thisElement.index();
>      result.setQuick(index, that.getQuick(index) -
> thisElement.get()); // this is "-(that - this)"
>    }
>    return result.times(-1.0);
>  }
>
> It's nice but involves another Vector allocation at the end.
>
> But then I'm also confused since this is the version intended for
> DenseVector and RandomAccessSparseVector. I'd imagine it's best used
> with SequentialAccessSparseVector, where "iterateNonZero()" has the
> most benefit?
>
>
> But that's only an issue to optimize if you iterate over "this", like
> in my update above. Could we just have one implementation that
> iterates over "that"? It's more straightforward. The issue isn't
> really implementation but number of non-zero elements in "this" versus
> "that", as the TODO comments point out.

You can test the minus performance using the Benchmark tool in utils. I
believe this optimization was done because, inserting or editing a
sequential access vector is very expensive compared to others.

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message