Date: Mon, 14 Apr 2014 17:59:16 +0000 (UTC)
From: "Dmitriy Lyubimov (JIRA)"
To: dev@mahout.apache.org
Subject: [jira] [Commented] (MAHOUT-1464) Cooccurrence Analysis on Spark

    [ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968587#comment-13968587 ]

Dmitriy Lyubimov commented on MAHOUT-1464:
------------------------------------------

Running using the Spark Client (inside the cluster) is a new thing in 0.9. Even assuming it is stable, it is not supported at this point, and going this way will run into multiple hurdles.

For one, the Mahout Spark context requires MAHOUT_HOME in order to set up all Mahout binaries properly. The assumption is that one needs Mahout's binaries only on the driver's side, but if the driver runs inside a remote cluster, this will fail.
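
To illustrate the MAHOUT_HOME point, here is a minimal Python sketch (not Mahout code; the function name check_mahout_home is hypothetical) that could be run on whichever host ends up running the driver. It only checks that the variable is set and points at an existing directory; it assumes nothing about what Mahout expects to find inside that directory.

```python
import os
from pathlib import Path


def check_mahout_home(env=None):
    """Return (ok, message) saying whether MAHOUT_HOME looks usable on this host.

    `env` defaults to the process environment; passing a dict makes the
    check easy to try out without touching real environment variables.
    """
    env = os.environ if env is None else env
    home = env.get("MAHOUT_HOME")
    if not home:
        return False, "MAHOUT_HOME is not set on this host"
    if not Path(home).is_dir():
        return False, f"MAHOUT_HOME={home} is not a directory on this host"
    return True, f"MAHOUT_HOME={home} exists on this host"


# Report the result for the current host.
ok, message = check_mahout_home()
print(message)
```

If this check fails on a cluster node but passes on the local machine, that matches the situation described above: the binaries are only available where the driver was expected to run.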
So our batches should really be started in one of the ways I described in my earlier email.

Second, I don't think the driver can load classes reliably, because it includes Mahout dependencies such as mahout-math. That's another reason why using Client seems problematic to me: it assumes the _entire_ application is within that jar. That is not true here.

That said, your attempt doesn't exhibit any direct ClassNotFounds and looks more like Akka communication issues, i.e. Spark setup issues. One thing about Spark is that it requires direct port connectivity not only between cluster nodes but also back to the client. In particular, this means your client must not firewall incoming calls and must not be behind NAT (even port forwarding doesn't really solve the networking issues here). So my first bet would be on Akka connectivity issues between the cluster and back to the client.

> Cooccurrence Analysis on Spark
> ------------------------------
>
>                 Key: MAHOUT-1464
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>         Environment: hadoop, spark
>            Reporter: Pat Ferrel
>            Assignee: Sebastian Schelter
>             Fix For: 1.0
>
>         Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM can be used as input.
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has several applications including cross-action recommendations.

--
This message was sent by Atlassian JIRA
(v6.2#6252)