mahout-user mailing list archives

From "Baeriswyl Kuno SBB CFF FFS (Extern)" <kuno.baeris...@sbb.ch>
Subject AW: How to do logical subsetting in Mathout
Date Tue, 21 Jul 2020 06:53:34 GMT
Hello Andrew,
thanks for your hint.

Yes, that's the way I found too.

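// Assumed imports (not part of the original message):
//   import org.apache.mahout.math.drm.CheckpointedDrm
//   import org.apache.mahout.sparkbindings._
//   import org.apache.spark.rdd.RDD
def createIndexMap(x: CheckpointedDrm[Int]): RDD[(Int, Int)] = {
  // Keep only the keys of rows whose first column is > 0
  val xIndexFiltered = x.rdd
    .filter(r => r._2.get(0) > 0)
    .map(r => r._1)

  // Pair each surviving old key with its new consecutive index
  xIndexFiltered.zipWithIndex
    .map(r => (r._1, r._2.toInt))
}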

First, I filter the DRM and create a map with old and new indexes, as you mentioned.
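
To make the old-to-new index map concrete, here is a minimal sketch on plain Scala collections instead of a DRM. The object name IndexMapSketch and the sample rows are made up for illustration; only the filter / zipWithIndex pattern comes from the function above. On a Spark RDD, zipWithIndex yields Long indices, which is why the function above calls .toInt; on a local Seq the indices are already Int.

```scala
object IndexMapSketch {
  // Hypothetical stand-in for a DRM: (row key, row values) pairs in a local Seq.
  val rows: Seq[(Int, Array[Double])] = Seq(
    0 -> Array(0.0, 7.0),
    1 -> Array(3.0, 8.0),
    2 -> Array(0.0, 9.0),
    3 -> Array(5.0, 1.0)
  )

  // Keep the keys of rows whose first column is > 0,
  // then number the survivors 0, 1, 2, ...
  val indexMap: Seq[(Int, Int)] =
    rows
      .filter { case (_, values) => values(0) > 0 }
      .map { case (oldIdx, _) => oldIdx }
      .zipWithIndex
}
```

With these sample rows, only keys 1 and 3 pass the filter, so the map pairs old key 1 with new index 0 and old key 3 with new index 1.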

By joining against this index map, I can reduce the rows in my DRM according to a certain condition,
do some more calculations, and map the newly calculated values back to the original DRM.

Like:
def mergeDrm(drmOrig: CheckpointedDrm[Int],
             drmFiltriert: CheckpointedDrm[Int],
             indexMapping: RDD[(Int, Int)]): CheckpointedDrm[Int] = {
  drmWrap(
    drmOrig.rdd
      // Attach the new index, if any: (oldIdx, (row, Option[newIdx]))
      .leftOuterJoin(indexMapping)
      // Re-key by the optional new index: (Option[newIdx], (oldIdx, row))
      .map(r => (r._2._2, (r._1, r._2._1)))
      // Bring in the recalculated rows, keyed the same way
      .leftOuterJoin(drmFiltriert.rdd.map(r => (Option(r._1), r._2)))
      // Use the recalculated row where one exists, otherwise the original row
      .map(r => (r._2._1._1, r._2._2.getOrElse(r._2._1._2)))
  )
}
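
The two left outer joins can be hard to follow through the nested tuple accessors, so here is a small sketch of the same merge logic on local Maps. Everything in it (the object name MergeSketch, the sample values, one Double per row instead of a vector) is an assumption for illustration and not Mahout or Spark API.

```scala
object MergeSketch {
  // Hypothetical local stand-ins for drmOrig, indexMapping and drmFiltriert.
  val orig: Map[Int, Double] = Map(0 -> 1.0, 1 -> 4.0, 2 -> 2.0, 3 -> 5.0)
  // old key -> new key, only for rows that passed the filter
  val indexMapping: Map[Int, Int] = Map(1 -> 0, 2 -> 1, 3 -> 2)
  // new key -> recalculated value
  val filtered: Map[Int, Double] = Map(0 -> 40.0, 1 -> 20.0, 2 -> 50.0)

  val merged: Map[Int, Double] = orig.map { case (oldIdx, value) =>
    // First "left outer join": the new index, if this row passed the filter.
    val newIdx: Option[Int] = indexMapping.get(oldIdx)
    // Second "left outer join": the recalculated value, if there is one.
    val recalculated: Option[Double] = newIdx.flatMap(filtered.get)
    // Fall back to the original value for rows that were filtered out.
    oldIdx -> recalculated.getOrElse(value)
  }
}
```

Row 0 has no entry in the index map, so it keeps its original value 1.0, while rows 1, 2 and 3 pick up 40.0, 20.0 and 50.0.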

Regards

Kuno



-----Original Message-----
From: Andrew Musselman <andrew.musselman@gmail.com>
Sent: Tuesday, 7 July 2020 23:16
To: user@mahout.apache.org
Subject: Re: How to do logical subsetting in Mathout

Kuno, thanks for your note. I don't know of an equivalent function out of the box, but if
you want to get the indices where a condition is true you could try something in Scala like:

myList.zipWithIndex.collect { case (item, index) if item > 1 => index }
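
As a rough illustration, the one-liner above can be applied to the R example from the original question using plain Scala collections. The object name LogicalSubsetSketch is made up, and Vector here is Scala's immutable collection, not Mahout's vector type. Note that Scala indices are zero-based, whereas R's are one-based.

```scala
object LogicalSubsetSketch {
  val x1: Vector[Double] = Vector(1.0, 4.0, 2.0, 5.0)

  // Zero-based positions where the condition x1 > 1 holds.
  val hits: Vector[Int] =
    x1.zipWithIndex.collect { case (item, index) if item > 1 => index }

  // Rough equivalent of R's x2[x1 > 1] = 2, starting from a vector of zeros.
  val x2: Vector[Double] =
    hits.foldLeft(Vector.fill(x1.length)(0.0)) { (acc, i) => acc.updated(i, 2.0) }
}
```

Here hits contains the positions 1, 2 and 3, which correspond to R's positions 2, 3 and 4, and x2 ends up as (0, 2, 2, 2).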

Hope this is helpful.

On Wed, Jun 10, 2020 at 2:53 AM Baeriswyl Kuno SBB CFF FFS (Extern) < kuno.baeriswyl@sbb.ch>
wrote:

> Hi all,
>
> I've bumped into Mahout because I need to migrate an R script
> that includes matrix algebra to a Spark cluster.
>
> Mahout's Scala/Spark bindings provide all of the operations except
> logical subsetting.
>
> Example:
>
> x1 = c(1.0,4.0,2.0,5.0)
> x2 = c(0,0,0,0)
> x2[x1 > 1] = 2
>
> This would set the value 2 at positions 2, 3 and 4 of x2.
>
> Is there an equivalent function in Mahout?
>
>
> Thanks.
>
> Kuno
>
>