mahout-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Pasmanik, Paul" <Paul.Pasma...@danteinc.com>
Subject RE: running spark-itemsimilarity against HDP sandbox with Spark
Date Tue, 06 Jan 2015 20:30:40 GMT
So, when I follow the examples from Hortonworks and run the Spark Pi example using spark-submit, everything works.
I can run mahout spark-itemsimilarity without specifying the master parameter, which means it is running in local mode (right?), and it works.  But if I try to run mahout using the -ma (master) parameter to point to the YARN cluster, it always gets stuck with the following warning:

WARN YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

According to several sources the error means that Hadoop does not have sufficient memory, but I have plenty, and I tried lowering executor-memory and driver-memory all the way down to 250 MB.  I still get that error and nothing is processed.
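
In case it helps anyone else diagnose this, here is what I am checking on the cluster side; this is just the standard YARN CLI, nothing Mahout-specific:

  # list the NodeManagers registered with the ResourceManager
  yarn node -list
  # list submitted applications; a stuck job sits here while holding no executors
  yarn application -list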

Did you guys run into this issue?

Thanks.

More stack trace below:

15/01/06 12:14:57 INFO storage.MemoryStore: ensureFreeSpace(4024) called with curMem=87562, maxMem=2061647216
15/01/06 12:14:57 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.9 KB, free 1966.1 MB)
15/01/06 12:14:57 INFO storage.MemoryStore: ensureFreeSpace(2336) called with curMem=91586, maxMem=2061647216
15/01/06 12:14:57 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.3 KB, free 1966.1 MB)
15/01/06 12:14:57 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on sandbox.hortonworks.com:53919 (size: 2.3 KB, free: 1966.1 MB)
15/01/06 12:14:57 INFO storage.BlockManagerMaster: Updated info of block broadcast_1_piece0
15/01/06 12:14:57 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[6] at distinct at TextDelimitedReaderWriter.scala:76)
15/01/06 12:14:57 INFO cluster.YarnClusterScheduler: Adding task set 1.0 with 2 tasks
15/01/06 12:14:57 INFO util.RackResolver: Resolved sandbox.hortonworks.com to /default-rack
15/01/06 12:15:13 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
15/01/06 12:15:27 WARN cluster.YarnClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

-----Original Message-----
From: Pasmanik, Paul [mailto:Paul.Pasmanik@danteinc.com] 
Sent: Tuesday, January 06, 2015 2:49 PM
To: user@mahout.apache.org
Subject: RE: running spark-itemsimilarity against HDP sandbox with Spark

Thanks, Pat.
I am using HDP with Spark 1.1.0: http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/

Spark examples run without issues.  For mahout I had to create a couple of env vars (HADOOP_HOME, SPARK_HOME, MAHOUT_HOME).  Also, to run against the YARN cluster with HDP, -ma yarn-cluster needs to be passed in; see the sketch below.
Also, the default memory allocated to YARN was not enough out of the box (2g); I increased it to 3g and am now restarting and trying again.
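
Roughly what my setup looks like, for reference (the paths below are from my sandbox and may differ on other installs; the input/output locations are placeholders):

  # env vars the mahout launcher expects; adjust paths to your install
  export HADOOP_HOME=/usr/hdp/current/hadoop-client
  export SPARK_HOME=/usr/hdp/current/spark-client
  export MAHOUT_HOME=/opt/mahout
  # container memory for YARN is governed by yarn.nodemanager.resource.memory-mb
  # run against the YARN cluster; without -ma it runs in local mode
  mahout spark-itemsimilarity -i /user/root/input.csv -o /user/root/output -ma yarn-cluster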

-----Original Message-----
From: Pat Ferrel [mailto:pat@occamsmachete.com] 
Sent: Tuesday, January 06, 2015 12:58 PM
To: user@mahout.apache.org
Subject: Re: running spark-itemsimilarity against HDP sandbox with Spark

There are some issues with using Mahout on Windows so you’ll have to run on a ‘nix machine or VM. There shouldn’t be any problem with using VMs as long as your Spark install is set up correctly.

Currently you have to build Spark first and then Mahout from source. Mahout uses Spark 1.1. You’ll need to build Spark from source using “mvn install” rather than their recommended “mvn package”; there were some problems in the Spark artifacts when running from the binary release. Check Mahout’s Spark FAQ for some pointers: http://mahout.apache.org/users/sparkbindings/faq.html
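
A sketch of that build sequence (the exact profiles depend on your Hadoop version, so treat these flags as a starting point and check the FAQ):

  # build Spark 1.1 from source and install its artifacts into the local maven repo
  cd spark
  mvn -Pyarn -Phadoop-2.4 -DskipTests clean install
  # then build Mahout against those artifacts
  cd ../mahout
  mvn -DskipTests clean install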

Verify Spark is running correctly by trying their sample SparkPi job. 
http://spark.apache.org/docs/1.1.1/submitting-applications.html
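
The submission looks something like this (the examples jar path varies by install, so treat it as a placeholder):

  # run the bundled SparkPi example on the YARN cluster as a sanity check
  spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster \
    lib/spark-examples-*.jar 10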

Spark in general and spark-itemsimilarity especially like lots of memory so you may have to
play with the -sem option to spark-itemsimilarity.
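
For example, something like this (the 4g figure and the paths are placeholders, not a recommendation):

  # -sem sets the Spark executor memory for the job
  mahout spark-itemsimilarity -i /user/root/input.csv -o /user/root/output \
    -ma yarn-cluster -sem 4g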

On Jan 6, 2015, at 8:07 AM, Pasmanik, Paul <Paul.Pasmanik@danteinc.com> wrote:

Hi, I've been trying to run spark-itemsimilarity against Hortonworks Sandbox with Spark running
in a VM, but have not succeeded yet.

Do I need to install mahout and run within a VM or is there a way to run remotely against
a VM where spark and hadoop are running?

I tried running the Scala ItemSimilaritySuite test with some modifications pointing HDFS and Spark to the sandbox, but I am getting various errors; the latest is a ShuffleMapTask hitting an HDFS missing-block exception while trying to read an input file that I uploaded to the HDFS cluster.
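
One thing I plan to check is the health of the uploaded file's blocks directly on the sandbox, with something like this (the path is a placeholder):

  # report files, blocks, and block locations for the input file
  hdfs fsck /user/root/input.csv -files -blocks -locations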



