mahout-issues mailing list archives

From "Tariq Jawed (Jira)" <>
Subject [jira] [Commented] (MAHOUT-2099) Using Mahout as a Library in Spark Cluster
Date Fri, 27 Mar 2020 16:26:00 GMT


Tariq Jawed commented on MAHOUT-2099:

[~Andrew_Palumbo]  I am only able to run the application if I set allowMultipleContexts to true
and create a new SparkContext as shown below. This sets the serializer much the same way Mahout
itself does inside its mahoutSparkContext() method, but it forces me to allow multiple contexts.
Is there a better way to set the serializers on the existing SparkContext I have been given, or
do I have to create a new SparkContext?


// Reuse the configuration of the existing context, start a second SparkContext from it
// (requires spark.driver.allowMultipleContexts=true), and wrap it for Mahout.
val sparkConf = sc.getConf
implicit val msc: SparkDistributedContext = sc2sdc(new SparkContext(sparkConf))
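
What I would prefer is to set the Kryo properties on the SparkConf before the SparkSession is
created at all, so that sc2sdc(sc) can wrap the existing context directly. Below is only a rough
sketch of that idea; it assumes the driver code controls session creation and that
org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator is the registrator Mahout expects.

{code}
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
import org.apache.mahout.sparkbindings._

// Sketch: configure Kryo before any SparkContext exists, so no second
// context (and no allowMultipleContexts) is needed.
val spark = SparkSession.builder()
  .appName("CooccurrenceDriver")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .config("spark.kryo.registrator",
          "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator")
  .getOrCreate()

implicit val sc: SparkContext = spark.sparkContext
implicit val msc: SparkDistributedContext = sc2sdc(sc)
{code}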

> Using Mahout as a Library in Spark Cluster
> ------------------------------------------
>                 Key: MAHOUT-2099
>                 URL:
>             Project: Mahout
>          Issue Type: Question
>          Components: cooccurrence, Math
>         Environment: Spark version
>            Reporter: Tariq Jawed
>            Priority: Major
> I have a Spark cluster already set up. The environment is not under my direct control, but
> fat JARs with bundled dependencies are allowed. I packaged my Spark application with some
> Mahout code for SimilarityAnalysis, added the Mahout libraries to the POM file, and the
> build packages successfully.
> The problem, however, is that I get the error below when using the existing SparkContext to
> build a distributed Spark context for Mahout.
> {code:xml}
> pom.xml
> {...}
> <dependency>
>   <groupId>org.apache.mahout</groupId>
>   <artifactId>mahout-math</artifactId>
>   <version>0.13.0</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.mahout</groupId>
>   <artifactId>mahout-math-scala_2.10</artifactId>
>   <version>0.13.0</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.mahout</groupId>
>   <artifactId>mahout-spark_2.10</artifactId>
>   <version>0.13.0</version>
> </dependency>
> <dependency>
>   <groupId>com.esotericsoftware</groupId>
>   <artifactId>kryo</artifactId>
>   <version>5.0.0-RC5</version>
> </dependency>
> {code}
> Code:
> {code}
> implicit val sc: SparkContext = sparkSession.sparkContext
> implicit val msc: SparkDistributedContext = sc2sdc(sc)
> {code}
> Error:
> {code}
> ERROR TaskSetManager: Task 7.0 in stage 10.0 (TID 58) had a not serializable result:
> {code}
> And if I try to build the context using mahoutSparkContext() instead, it fails because
> MAHOUT_HOME is not found.
> Code:
> {code}
> implicit val msc = mahoutSparkContext(masterUrl = "local", appName = "CooccurrenceDriver")
> {code}
> Error:
> {code}
> MAHOUT_HOME is required to spawn mahout-based spark jobs
> {code}
> My question is: how do I proceed in this situation? Should I ask the administrators of the
> Spark environment to install the Mahout library, or is there any way I can proceed by
> packaging my application as a fat JAR?
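
For the MAHOUT_HOME error quoted above, one possible direction: mahoutSparkContext() appears to
accept an addMahoutJars flag, and passing addMahoutJars = false together with an explicitly
prepared SparkConf should skip the MAHOUT_HOME jar lookup, since the fat JAR already bundles the
Mahout classes. This is only a sketch under that assumption; the parameter name should be checked
against the 0.13.0 sparkbindings sources.

{code}
import org.apache.spark.SparkConf
import org.apache.mahout.sparkbindings._

// Sketch: build the Mahout context without asking it to locate jars via
// MAHOUT_HOME; the fat JAR already contains the Mahout classes.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator",
       "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator")

implicit val msc = mahoutSparkContext(
  masterUrl = "local",
  appName = "CooccurrenceDriver",
  sparkConf = conf,
  addMahoutJars = false)
{code}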

This message was sent by Atlassian Jira
