mahout-dev mailing list archives

From "Dmitriy Lyubimov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark
Date Mon, 14 Apr 2014 17:59:16 GMT

    [ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968587#comment-13968587
] 

Dmitriy Lyubimov commented on MAHOUT-1464:
------------------------------------------

Running via Spark Client (i.e., with the driver inside the cluster) is a new thing in 0.9. Even
assuming it is stable, it is not supported on our side at this point, and going this way will hit multiple hurdles.

For one, the Mahout Spark context requires MAHOUT_HOME in order to locate all the Mahout binaries properly. The
assumption is that one needs Mahout's binaries only on the driver's side, but if the driver runs inside
a remote cluster, this will fail. So our batches should really be started in one of the ways
I described in the earlier email.
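As a minimal sketch of the driver-side setup being described (the paths and master URL below are
assumptions for illustration, not the supported procedure): the machine that launches the driver needs
a full Mahout install so MAHOUT_HOME resolves locally.

```shell
# Sketch: launch from a client box that has a full Mahout binary tree,
# so the Mahout Spark context can resolve its jars via MAHOUT_HOME.
# /opt/mahout, /opt/spark, and the master URL are hypothetical values.
export MAHOUT_HOME=/opt/mahout
export SPARK_HOME=/opt/spark
export MASTER=spark://spark-master:7077
echo "driver will use MAHOUT_HOME=$MAHOUT_HOME"
```

If the driver instead runs inside the remote cluster, MAHOUT_HOME points nowhere useful on the
executor hosts, which is the failure mode described above.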

Second, I don't think the driver can load classes reliably, because it depends on Mahout artifacts
such as mahout-math. That's another reason why using Client seems problematic to me -- it
assumes one has one's _entire_ application within that single jar, which is not true here.

That said, your attempt doesn't exhibit any direct ClassNotFoundExceptions; it looks more like akka
communication issues, i.e., Spark setup issues. One thing about Spark is that it requires direct
port connectivity not only between cluster nodes but also back to the client. In particular, this
means your client must not firewall incoming connections and must not be behind NAT (even port
forwarding doesn't really solve the networking issues here). So my first bet would be on akka
connectivity problems from the cluster back to the client.
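For reference, a sketch of the kind of driver-side settings involved (the hostname and port are
assumed example values; whether pinning them helps depends on the network setup, and it does not
work around NAT):

```
# spark-defaults style settings (illustrative values):
# executors must be able to open connections back to this host/port.
spark.driver.host   client.example.com
spark.driver.port   7078
```

With these pinned, the firewall on the client can at least be opened for a known port instead of an
ephemeral one.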




> Cooccurrence Analysis on Spark
> ------------------------------
>
>                 Key: MAHOUT-1464
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>         Environment: hadoop, spark
>            Reporter: Pat Ferrel
>            Assignee: Sebastian Schelter
>             Fix For: 1.0
>
>         Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch,
> MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that runs on Spark.
> This should be compatible with the Mahout Spark DRM DSL so a DRM can be used as input.
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has several applications,
> including cross-action recommendations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
