Date: Fri, 27 Mar 2020 16:26:00 +0000 (UTC)
From: "Tariq Jawed (Jira)"
To: issues@mahout.apache.org
Subject: [jira] [Commented] (MAHOUT-2099) Using Mahout as a Library in Spark Cluster

    [ https://issues.apache.org/jira/browse/MAHOUT-2099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17068851#comment-17068851 ]

Tariq Jawed commented on MAHOUT-2099:
-------------------------------------

[~Andrew_Palumbo] I am only able to run the application if I set allowMultipleContexts to true and create a new SparkContext as shown below. This sets the serializer in the same way that Mahout itself does in its mahoutSparkContext() method, but it forces me to allow multiple contexts. Is there a better way to set the serializers on the existing SparkContext I have been given, or do I have to create a new SparkContext?

{code:scala}
val sparkConf = sc.getConf
sparkConf.setAppName("MagentoRecommendationEngine")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator")
  .set("spark.driver.allowMultipleContexts", "true")

implicit val msc: SparkDistributedContext = sc2sdc(new SparkContext(sparkConf))
{code}


> Using Mahout as a Library in Spark Cluster
> ------------------------------------------
>
>                 Key: MAHOUT-2099
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-2099
>             Project: Mahout
>          Issue Type: Question
>          Components: cooccurrence, Math
>        Environment: Spark version 2.3.0.2.6.5.10-2
>
>                     [EDIT] AP
>            Reporter: Tariq Jawed
>            Priority: Major
>
> I have a Spark cluster already set up. The environment is not under my direct control, but fat JARs with bundled dependencies are allowed. I packaged my Spark application with some Mahout code for SimilarityAnalysis, added the Mahout libraries to the POM file, and the build succeeds.
> The problem, however, is that I get this error when using the existing Spark Context to build a Distributed Spark Context for Mahout.
> [EDIT]AP:
> {code:xml}
> <!-- pom.xml (other content elided) -->
> <dependency>
>   <groupId>org.apache.mahout</groupId>
>   <artifactId>mahout-math</artifactId>
>   <version>0.13.0</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.mahout</groupId>
>   <artifactId>mahout-math-scala_2.10</artifactId>
>   <version>0.13.0</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.mahout</groupId>
>   <artifactId>mahout-spark_2.10</artifactId>
>   <version>0.13.0</version>
> </dependency>
> <dependency>
>   <groupId>com.esotericsoftware</groupId>
>   <artifactId>kryo</artifactId>
>   <version>5.0.0-RC5</version>
> </dependency>
> {code}
>
> Code:
> {code:scala}
> implicit val sc: SparkContext = sparkSession.sparkContext
> implicit val msc: SparkDistributedContext = sc2sdc(sc)
> {code}
> Error:
> {noformat}
> ERROR TaskSetManager: Task 7.0 in stage 10.0 (TID 58) had a not serializable result: org.apache.mahout.math.DenseVector
> {noformat}
>
> And if I try to build the context using mahoutSparkContext(), it gives me an error that MAHOUT_HOME is not found.
> Code:
> {code:scala}
> implicit val msc = mahoutSparkContext(masterUrl = "local", appName = "CooccurrenceDriver")
> {code}
> Error:
> {noformat}
> MAHOUT_HOME is required to spawn mahout-based spark jobs
> {noformat}
>
> My question is: how do I proceed in this situation? Should I ask the administrators of the Spark environment to install the Mahout library, or is there any way I can proceed by packaging my application as a fat JAR?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
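[Editorial note, not part of the thread] One hedged alternative to creating a second SparkContext: the two serializer properties only need to be present on the SparkConf before the first SparkContext is created. If the cluster administrators permit per-application configuration, the same settings from the Scala snippet above can be supplied in spark-defaults.conf (or as `--conf` flags to spark-submit), so the SparkContext the application is handed is already Kryo-configured and neither a new context nor allowMultipleContexts is needed. A minimal config sketch, assuming the environment honors application-level Spark properties:

{noformat}
# spark-defaults.conf fragment (or equivalent --conf flags to spark-submit);
# the property keys and values are the same ones set programmatically in the
# snippet above -- only the delivery mechanism differs
spark.serializer        org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator  org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator
{noformat}

Whether this is viable depends on how the cluster launches applications; if properties are locked down by the administrators, the workaround in the comment above may remain the only option.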