mahout-dev mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: [jira] Commented: (MAHOUT-379) SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent
Date Sat, 17 Apr 2010 23:54:36 GMT
How about this alternative:

NamedVector: {Vector: wrapped, String: name}
Vector: AbstractVector
AbstractVector: DenseVector | SequentialSparseVector | HashSparseVector

This avoids the multiplicative explosion of vector types.
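
A minimal Java sketch of this layout, to make the composition concrete. It is not the actual Mahout API: the two-method `Vector` interface, the `sum()` helper, and the constructor signatures are assumptions for illustration, and the class names follow the shorthand used above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal interface; the real Mahout Vector interface is much larger.
interface Vector {
    int size();
    double get(int index);
}

// Shared behaviour written once against the interface.
abstract class AbstractVector implements Vector {
    public double sum() {
        double total = 0.0;
        for (int i = 0; i < size(); i++) {
            total += get(i);
        }
        return total;
    }
}

// Dense storage: every element is held in an array.
final class DenseVector extends AbstractVector {
    private final double[] values;

    DenseVector(double[] values) {
        this.values = values.clone();
    }

    @Override public int size() { return values.length; }
    @Override public double get(int index) { return values[index]; }
}

// Hash-based sparse storage: only non-default entries are held.
final class HashSparseVector extends AbstractVector {
    private final int size;
    private final Map<Integer, Double> entries = new HashMap<>();

    HashSparseVector(int size) {
        this.size = size;
    }

    void set(int index, double value) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        entries.put(index, value);
    }

    @Override public int size() { return size; }

    @Override public double get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return entries.getOrDefault(index, 0.0);
    }
}

// The name lives in a wrapper, so it composes with any storage layout.
final class NamedVector implements Vector {
    private final Vector wrapped;
    private final String name;

    NamedVector(Vector wrapped, String name) {
        this.wrapped = wrapped;
        this.name = name;
    }

    String getName() { return name; }
    Vector getWrapped() { return wrapped; }

    @Override public int size() { return wrapped.size(); }
    @Override public double get(int index) { return wrapped.get(index); }
}
```

With this shape, `new NamedVector(new DenseVector(...), "doc-1")` and `new NamedVector(sparse, "doc-2")` use the same wrapper class, so adding a new kind of label would mean one new wrapper rather than one new class per storage type.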



On Sat, Apr 17, 2010 at 4:17 PM, Robin Anil <robin.anil@gmail.com> wrote:

> Agreed. That's the correct way to go. But like I said, it warrants a complete
> overhaul and a separate JIRA issue. The quick fix I indicated (i.e. putting
> the ID back in but removing it from the compare/equals function) was just for
> this bug.
>
> How does this structuring sound?
>
> Vector (interface) -> AbstractVector -> Dense|SparseVector
> -> NamedDense|SparseVector OR LabelledDense|SparseVector OR
> MultiLabelledDense|SparseVector
>
>
>
> Robin
>
> On Sun, Apr 18, 2010 at 4:21 AM, Ted Dunning <ted.dunning@gmail.com> wrote:
>
> > That would be a very, very good thing (uniform data usage).
> >
> > On Sat, Apr 17, 2010 at 2:52 PM, Jake Mannix <jake.mannix@gmail.com> wrote:
> >
> > > Currently, FuzzyKMeansClusterMapper has WritableComparable<?>
> > > keys which are ignored.  Could we instead have the identifier for the
> > > vector live there, where it makes sense?  Then that same key could
> > > be the mapper output key, instead of the name of the Vector.
> > >
> > > This kind of change could get the clustering code to effectively be
> > > able to run sensibly on the same
> > > SequenceFile<IntWritable,VectorWritable>
> > > that DistributedRowMatrix is running on, and that would be very nice,
> > > I think.
> > >
> >
>
