flink-dev mailing list archives

From Christoph Alt <christoph....@posteo.de>
Subject SparseVector.fromCOO keeps zero entries
Date Fri, 08 May 2015 13:20:50 GMT
Hi,

Felix and I are currently working on the implementation of the FeatureHasher (Issue #1735),
which in the end returns a SparseVector.

When using “SparseVector.fromCOO” I’m seeing some odd behaviour that I did not expect.

Assume I create a SparseVector.fromCOO(numFeatures, Map((0, 1.0), (1, 1.0), (1, -1.0))), this
returns a SparseVector((0, 1.0), (1, 0.0)).
I would have expected that, after the values of identical indices are summed, an index whose
resulting value is 0.0 would be dropped when the SparseVector is created.
Is this the expected behaviour or does this need to be fixed?
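
To make the expectation concrete, here is a minimal standalone sketch of the behaviour I had in mind. It does not use the Flink classes; CooSketch and sumAndDropZeros are hypothetical names chosen only for illustration. The entries are passed as a Seq of (index, value) pairs so that duplicate indices survive until they are summed.

```scala
// Hypothetical helper, not part of Flink: sums the values of duplicate
// indices in COO-style entries and drops every index whose sum is exactly 0.0.
object CooSketch {
  def sumAndDropZeros(entries: Seq[(Int, Double)]): Seq[(Int, Double)] =
    entries
      .groupBy(_._1)                                               // collect entries per index
      .map { case (index, group) => (index, group.map(_._2).sum) } // sum values per index
      .filter { case (_, value) => value != 0.0 }                  // drop explicit zeros
      .toSeq
      .sortBy(_._1)                                                // keep indices in ascending order
}
```

With the entries from above, CooSketch.sumAndDropZeros(Seq((0, 1.0), (1, 1.0), (1, -1.0))) yields Seq((0, 1.0)), i.e. index 1 is no longer stored.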

Furthermore, are there any plans to extend the SparseVector implementation with a
SparseVector.fromArray() that takes an array such as Array(0.0, 1.0, 2.0, 0.0, 3.2) and creates
a SparseVector((1, 1.0), (2, 2.0), (4, 3.2)) of size array.length, keeping only the non-zero entries?
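
As a rough illustration of what such a method could compute, here is a minimal standalone sketch. FromArraySketch and nonZeroEntries are hypothetical names and not part of Flink; the sketch only derives the size, the indices and the values that a sparse representation would store.

```scala
// Hypothetical helper, not part of Flink: extracts the size plus the
// indices and values of all non-zero entries of a dense array.
object FromArraySketch {
  def nonZeroEntries(data: Array[Double]): (Int, Array[Int], Array[Double]) = {
    // Pair each value with its position and keep only the non-zero ones.
    val kept = data.zipWithIndex.filter { case (value, _) => value != 0.0 }
    (data.length, kept.map(_._2), kept.map(_._1))
  }
}
```

For Array(0.0, 1.0, 2.0, 0.0, 3.2) this gives a size of 5, the indices 1, 2, 4 and the values 1.0, 2.0, 3.2.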

Best,
Christoph
