mahout-user mailing list archives

From Ted Dunning <>
Subject Re: Using mahout to cluster terms in Lucene
Date Wed, 30 Sep 2009 21:01:57 GMT
The cooccurrence counts themselves form a symmetric matrix:
(A'A)' = A'(A')' = A'A, because transposing a product reverses the order
of its factors.
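
As a quick check of the symmetry claim, here is a minimal sketch in plain Python. The matrix A is a made-up 4-document by 3-term binary occurrence matrix and the function name `cooccurrence` is only for illustration; nothing here is Mahout code.

```python
# Hypothetical binary occurrence matrix A: rows are documents, columns are terms.
# The values are invented purely for illustration.
A = [
    [1, 1, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
]

def cooccurrence(a):
    """Return A'A: entry [i][j] counts the documents containing both term i and term j."""
    n_terms = len(a[0])
    return [
        [sum(row[i] * row[j] for row in a) for j in range(n_terms)]
        for i in range(n_terms)
    ]

C = cooccurrence(A)

# Symmetry check: C[i][j] must equal C[j][i] for every pair of terms.
is_symmetric = all(
    C[i][j] == C[j][i] for i in range(len(C)) for j in range(len(C))
)
print(C, is_symmetric)
```

The diagonal holds each term's document frequency, and every off-diagonal count appears twice, once on each side of the diagonal.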

The filtering for anomalous cooccurrence that sparsifies the cooccurrence
matrix can introduce asymmetry, as you point out.
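
To illustrate how sparsification can break symmetry, the sketch below keeps only the k strongest links per row. This top-k rule is an assumption standing in for whatever filter is actually used, and the item names and counts are invented.

```python
# Hypothetical symmetric cooccurrence counts; names and numbers are invented.
counts = {
    "beatles": {"stones": 50, "kinks": 40, "obscure_band": 5},
    "stones": {"beatles": 50, "kinks": 30, "obscure_band": 1},
    "kinks": {"beatles": 40, "stones": 30, "obscure_band": 2},
    "obscure_band": {"beatles": 5, "stones": 1, "kinks": 2},
}

def top_k_links(counts, k):
    """Keep the k strongest links per row (an assumed stand-in for the real filter)."""
    return {
        item: set(sorted(row, key=row.get, reverse=True)[:k])
        for item, row in counts.items()
    }

links = top_k_links(counts, k=2)

# Links a -> b that survived filtering while b -> a did not.
one_way = sorted(
    (a, b) for a, kept in links.items() for b in kept if a not in links[b]
)
print(one_way)
```

The input counts are symmetric, but the per-row cut is made relative to each row's own strongest neighbours, so a weakly connected item can keep a link to a popular one without the popular item keeping the link back.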

The most prominent case that I saw in practice was in music
recommendations, where a fair number of artists linked to high-profile bands
such as the Beatles, but the reverse link did not survive the filtering.
You can enforce bi-directionality, but I have usually found that the
asymmetry isn't a problem and often accords with intuitions about the field.
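
If bi-directionality is wanted, there are two obvious ways to enforce it on a filtered link set: add the missing reverse links (union) or drop every link that is not mutual (intersection). The sketch below shows both on invented data; the function names are hypothetical and neither is claimed to be what any particular system does.

```python
# Hypothetical filtered links: item -> set of items it links to.
links = {
    "beatles": {"stones"},
    "stones": {"beatles"},
    "obscure_band": {"beatles"},
}

def symmetrize_union(links):
    """Add the reverse of every surviving link."""
    out = {a: set(kept) for a, kept in links.items()}
    for a, kept in links.items():
        for b in kept:
            out.setdefault(b, set()).add(a)
    return out

def symmetrize_intersection(links):
    """Keep a link only if its reverse also survived."""
    return {
        a: {b for b in kept if a in links.get(b, set())}
        for a, kept in links.items()
    }

union_links = symmetrize_union(links)
mutual_links = symmetrize_intersection(links)
print(union_links)
print(mutual_links)
```

The union makes the popular item's neighbour list grow with every minor item that points at it, while the intersection leaves the minor item with no links at all, which is one reason leaving the asymmetry in place can be the more sensible default.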

On Wed, Sep 30, 2009 at 12:42 AM, Shashikant Kore <> wrote:

> Some time back I had thought about this idea. But, I sensed one
> potential problem with this approach. The resulting co-occurrence will
> be bi-directional. For documents this property is fine, but for terms,
> it may not be desirable in some cases.

Ted Dunning, CTO
