# mahout-user mailing list archives

From marco turchi <marco.tur...@gmail.com>
Subject Re: Similarity between sparse vectors
Date Fri, 15 Jul 2011 14:20:14 GMT
```Dear Sean,
thanks a lot for the advice, everything is working perfectly!

Cheers
Marco

On Fri, Jul 15, 2011 at 2:15 PM, Sean Owen <srowen@gmail.com> wrote:

> Cardinality should be set to whatever the logical dimension of the
> vector is -- it shouldn't be arbitrary. It's not like an "initial
> size" of a list. If you're dealing with vectors that have a
> potentially unbounded maximum dimension, use Integer.MAX_VALUE.
>
> As the name suggests, the implementation you use is for sparse
> vectors, meaning dimensions without value have no representation. It
> would be a pretty poor sparse implementation if this were not true.
> So, no, the cardinality has no direct effect on memory.
>
> On Fri, Jul 15, 2011 at 1:00 PM, marco turchi <marco.turchi@gmail.com>
> wrote:
> > Hi
> > thanks a lot
> >
> > I have also another problem ( :-) ). As I wrote in the previous email, I'm
> > using the RandomAccessSparseVector representation to store sparse vectors. I
> > need to sum some of them together, so I use the method plus but it seems
> > that it requires the same vector cardinality. I set the initial cardinality
> > of each vector to a big value, but I was wondering if it is a huge waste of
> > memory or everything is optimized inside the RandomAccessSparseVector
> > class. In case, is there an optimal way to set the cardinality?
> >
> > Thanks again
> > Marco
> >
> > On Fri, Jul 15, 2011 at 1:50 PM, Sean Owen <srowen@gmail.com> wrote:
> >
> >> This is simply Euclidean distance squared. Take the square root if you
> >> need the simple Euclidean distance.
> >>
> >> On Fri, Jul 15, 2011 at 12:36 PM, marco turchi <marco.turchi@gmail.com>
> >> wrote:
> >> > Dear All,
> >> > I'm a newcomer to Mahout and I'm trying to compute the cosine similarity
> >> > between two sparse vectors.
> >> > I have loaded them using the class RandomAccessSparseVector. I notice that
> >> > there is a method called getDistanceSquared. Which kind of vector distance
> >> > is implemented? Is there a method to compute this distance directly?
> >> >