mahout-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Suneel Marthi (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (MAHOUT-1227) Vector.iterateNonZero() is super-clumsy to use: add Iterable<Element> allNonZero()
Date Thu, 23 May 2013 21:43:20 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13665708#comment-13665708
] 

Suneel Marthi edited comment on MAHOUT-1227 at 5/23/13 9:41 PM:
----------------------------------------------------------------

The code in MinhashMapper should  be calling featureVector.iterateNonZero() and that's a bug.
There is a jira and a patch for this - see Mahout-1052.


                
      was (Author: smarthi):
    The code in MinhashMapper should  be calling featureVector.iterateNonZero() and that's
a bug. There was a jira and a patch for this - see Mahout-1052.


                  
> Vector.iterateNonZero() is super-clumsy to use: add Iterable<Element> allNonZero()
> ----------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1227
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1227
>             Project: Mahout
>          Issue Type: Improvement
>         Environment: all
>            Reporter: Andy Schlaikjer
>            Assignee: Jake Mannix
>             Fix For: 0.8
>
>         Attachments: MAHOUT-1227.diff
>
>
> Currently, our codebase is littered with the following:
> {code}
> Iterator<Element> it = vector.iterateNonZero();
> while (it.hasNext()) {
>   Element e = it.next();
>   ...
> {code}
> wouldn't it be nice to be able to do:
> {code}
> for (Element e : vector.allNonZero()) {
>   ...
> {code}
> instead?
> I propose adding an Iterable<Element> allNonZero() which allow this syntactic sugar.
 To make it symmetric with iterateAll, let's also add Iterable<Element> all(), and implement
the simply in AbstractVector.
> The first diff adding this is very non-invasive - new methods added to interface, implemented
in the three classes from which all Vector implementations derive (AbstractVector, NamedVector,
and DelegatingVector).  User code should just work, unless they've implemented their own vector
without subclassing one of these three (yikes).
> Next diff, which is more invasive, would remove "extends Iterable<Element>" from
Vector, because using the foreach of a Vector itself is very rarely what the caller really
means to do (it's the all-iterator, very bad for the more common sparse use case).  To achieve
the same effect, the caller chooses between vector.all() and vector.allNonZero(), and then
they're being crystal clear what they mean.
> Lastly, I'd propose we make iterateAll() and iterateAllNonZero() protected methods on
AbstractVector, so that we are forced to remove all the clumsy places where we do Iterator<Element>
it = ... all throughout the codebase.  I suspect there will be very few places left that really
want the raw iterator, but if there are any, it can be gotten by calling vector.(all/allNonZero).iterator()
> (feature-request/api fix suggestion idea courtesy of Andy Schlaikjer, formalized as a
proposal and posted up here by me, Jake Mannix)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message