Subject: Re: spark-itemsimilarity can't launch on a Spark cluster?
From: Pat Ferrel
Date: Fri, 10 Oct 2014 18:22:16 -0700
To: pol
Cc: user@mahout.apache.org

Did you stop the 1.6g job or did it fail? I see task failures but no stage failures.
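If it isn't clear from the master UI or the console, one way to tell on the next run is to capture the driver's exit status and keep the whole driver log. A sketch only — it reuses the flags and paths from the Oct 10 command quoted below, the log file name is just an example, and it assumes the mahout driver exits non-zero when the job fails rather than being stopped:

  mahout spark-itemsimilarity -i /view_input,/purchase_input -o /output -os \
    -ma spark://recommend1:7077 -sem 15g -f1 purchase -f2 view -ic 2 -fc 1 -m 36 \
    > itemsimilarity-driver.log 2>&1
  echo "driver exit status: $?"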
On Oct 10, 2014, at 8:49 AM, pol wrote:

Hi Pat,

Yes, spark-itemsimilarity works now; it finished the calculation on the 150m dataset. The problem above is that the 1.6g dataset cannot finish. I have three machines (16 cores and 16g memory each) for this test — is this environment not enough to finish the calculation? The dataset was archived into one file with the hadoop archive tool, so only one machine is in the processing state. I did that because without archiving I get some errors; details are in the attachment. If it helps, I can provide the test dataset to you.

Thank you again.

On Oct 10, 2014, at 22:07, Pat Ferrel wrote:

> So it is completing some of the spark-itemsimilarity jobs now? That is better at least.
>
> Yes. More data means you may need more memory or more nodes in your cluster. This is how to scale Spark and Hadoop. Spark in particular needs core memory since it tries to avoid disk read/write.
>
> Try increasing -sem as far as you can first, then you may need to add machines to your cluster to speed it up. Do you need results faster than 15 hours?
>
> Remember that the way the Solr recommender works allows you to make recommendations to new users and train less often. The new user data does not have to be in the training/indicator data. You retrain partly based on how many new users there are, but also based on how many new items are added to the catalog.
>
> On Oct 10, 2014, at 1:47 AM, pol wrote:
>
> Hi Pat,
> Because of a holiday, I am only replying now.
>
> I changed 1.0.2 back to 1.0.1 in mahout-1.0-SNAPSHOT, and used Spark 1.0.1 and Hadoop 2.4.0; spark-itemsimilarity now works. But I have a new question:
>
> mahout spark-itemsimilarity -i /view_input,/purchase_input -o /output -os -ma spark://recommend1:7077 -sem 15g -f1 purchase -f2 view -ic 2 -fc 1 -m 36
>
> With 1.6g of "view" data and 60m of "purchase" data, this command has not finished after 15 hours (the indicator-matrix has been computed; the cross-indicator-matrix is still computing), yet with 100m of "view" data it finishes in 2 minutes. Is this just because of the amount of data?
>
> On Oct 1, 2014, at 01:10, Pat Ferrel wrote:
>
>> This will not be fixed in Mahout 1.0 unless we can find a problem in Mahout now. I am the one who would fix it. At present it looks to me like a Spark version or setup problem.
>>
>> These errors seem to indicate that the build or setup has a problem. It seems that you cannot use Spark 1.1.0. Set up your cluster to use mahout-1.0-SNAPSHOT with the pom set back to spark-1.0.1, Spark 1.0.1 built for Hadoop 2.4, and Hadoop 2.4. This is the only combination that is supposed to work together.
>>
>> If this still fails it may be a setup problem, since I can run on a cluster just fine with my setup. When you get an error from this config, send it to me and to the Spark user list to see if they can give us a clue.
>>
>> Question: Do you have mahout-1.0-SNAPSHOT and Spark installed on all your cluster machines, with the correct environment variables and path?
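>>
>> (A rough sketch of that combination, not a tested recipe: the Mahout build command below is the one from the Sep 27 environment listing in this thread, and the Spark flags are the ones used there to build 1.1.0 — they may need adjusting for a 1.0.1 source build. The Spark version in mahout/pom.xml is the same value that the "- 1.0.1 / + 1.1.0" change further down edits; here it stays at 1.0.1.)
>>
>> # Mahout 1.0-SNAPSHOT, with the pom's Spark dependency left at 1.0.1
>> mvn -Dhadoop2.version=2.4.1 -DskipTests clean package
>>
>> # Spark 1.0.1 built for Hadoop 2.4 (flags copied from the 1.1.0 build above; verify against the Spark 1.0.1 build docs)
>> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.1 -Phive -DskipTests clean package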
>>
>> On Sep 30, 2014, at 12:47 AM, pol wrote:
>>
>> Hi Pat,
>> It does seem to be a Spark version problem, but spark-itemsimilarity still cannot complete normally.
>>
>> 1. With 1.0.1 changed to 1.1.0 in mahout-1.0-SNAPSHOT/pom.xml, the Spark version compatibility is no longer a problem, but the program fails:
>> --------------------------------------------------------------
>> 14/09/30 11:26:04 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 10.1 (TID 31, Hadoop.Slave1): java.lang.NoClassDefFoundError: org/apache/commons/math3/random/RandomGenerator
>>         org.apache.mahout.common.RandomUtils.getRandom(RandomUtils.java:65)
>>         org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:228)
>>         org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:223)
>>         org.apache.mahout.sparkbindings.blas.MapBlock$$anonfun$1.apply(MapBlock.scala:33)
>>         org.apache.mahout.sparkbindings.blas.MapBlock$$anonfun$1.apply(MapBlock.scala:32)
>>         scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>         scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>         org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:235)
>>         org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
>>         org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
>>         org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>         org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>>         org.apache.spark.scheduler.Task.run(Task.scala:54)
>>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>>         java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         java.lang.Thread.run(Thread.java:662)
>> --------------------------------------------------------------
>> I tried adding commons-math3-3.2.jar to mahout-1.0-SNAPSHOT/lib, but the result is the same. (RandomUtils.java:65 does not use RandomGenerator directly.)
>>
>> 2. With 1.0.1 changed to 1.0.2 in mahout-1.0-SNAPSHOT/pom.xml, there are still other errors:
>> --------------------------------------------------------------
>> 14/09/30 14:36:57 WARN scheduler.TaskSetManager: Lost TID 427 (task 7.0:51)
>> 14/09/30 14:36:57 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassCastException
>> java.lang.ClassCastException: scala.Tuple1 cannot be cast to scala.Tuple2
>>         at org.apache.mahout.drivers.TDIndexedDatasetReader$$anonfun$4.apply(TextDelimitedReaderWriter.scala:75)
>>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>         at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:59)
>>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
>>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
>>         at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:594)
>>         at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:594)
>>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>> --------------------------------------------------------------
>> Please refer to the attachment for the full log.
>>
>> In addition, the input is 66 files on HDFS of 20 to 30M each; if necessary I will provide the data.
>> The command is: mahout spark-itemsimilarity -i /rec/input/ss/others,/rec/input/ss/weblog -o /rec/output/ss -os -ma spark://recommend1:7077 -sem 4g -f1 purchase -f2 view -ic 2 -fc 1
>> Spark cluster: 8 workers, 32 cores total, 32G memory total, on two machines.
>>
>> If this can't be solved in a few days, it may be better to wait for the Mahout 1.0 release or just use the existing mahout itemsimilarity job.
>>
>> Thank you again, Pat.
>>
>> On Sep 29, 2014, at 00:02, Pat Ferrel wrote:
>>
>>> It looks like the cluster version of spark-itemsimilarity is never accepted by the Spark master. It fails in TextDelimitedReaderWriter.scala because all work uses "lazy" evaluation, and until the write no actual work is done on the Spark cluster.
>>>
>>> However your cluster seems to be working with the Pi example. Therefore there must be something wrong with the Mahout build or config. Some ideas:
>>>
>>> 1) Mahout 1.0-SNAPSHOT is targeted at Spark 1.0.1. However I use 1.0.2 and it seems to work. You might try changing the version in the pom.xml and doing a clean build of Mahout. Change the version number in mahout/pom.xml:
>>>
>>> mahout/pom.xml
>>> - 1.0.1
>>> + 1.1.0
>>>
>>> This may not be needed but it is easier than installing Spark 1.0.1.
>>>
>>> 2) Try installing and building Mahout on all cluster machines. I do this so I can run the Mahout spark-shell on any machine, but it may also be needed here.
>>> The Mahout jars, path setup, and directory structure should be the same on all cluster machines.
>>>
>>> 3) Try making -sem larger. I usually make it as large as I can on the cluster and then try smaller values until it affects performance. The epinions dataset that I use for testing on my cluster requires -sem 6g.
>>>
>>> My cluster has 3 machines with Hadoop 1.2.1 and Spark 1.0.2. I can try running your data through spark-itemsimilarity on my cluster if you can share it. I will sign an NDA and destroy it after the test.
>>>
>>> On Sep 27, 2014, at 5:28 AM, pol wrote:
>>>
>>> Hi Pat,
>>> Thanks for your reply. It still doesn't work normally. I tested it on a Spark standalone cluster; I did not test on a YARN cluster.
>>>
>>> First, I checked that the cluster configuration is correct. http://Hadoop.Master:8080 shows:
>>> ----------------------------------
>>> URL: spark://Hadoop.Master:7077
>>> Workers: 2
>>> Cores: 4 Total, 0 Used
>>> Memory: 2.0 GB Total, 0.0 B Used
>>> Applications: 0 Running, 1 Completed
>>> Drivers: 0 Running, 0 Completed
>>> Status: ALIVE
>>> ----------------------------------
>>>
>>> Environment:
>>> ----------------------------------
>>> OS: CentOS release 6.5 (Final)
>>> JDK: 1.6.0_45
>>> Mahout: mahout-1.0-SNAPSHOT (mvn -Dhadoop2.version=2.4.1 -DskipTests clean package)
>>> Hadoop: 2.4.1
>>> Spark: spark-1.1.0-bin-2.4.1 (mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.1 -Phive -DskipTests clean package)
>>> ----------------------------------
>>>
>>> Shell:
>>> spark-submit --class org.apache.spark.examples.SparkPi --master spark://Hadoop.Master:7077 --executor-memory 1g --total-executor-cores 2 /root/spark-examples_2.10-1.1.0.jar 1000
>>>
>>> It works; here is part of the log:
>>> ----------------------------------
>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Finished task 995.0 in stage 0.0 (TID 995) in 17 ms on Hadoop.Slave1 (996/1000)
>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Starting task 998.0 in stage 0.0 (TID 998, Hadoop.Slave2, PROCESS_LOCAL, 1225 bytes)
>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Finished task 996.0 in stage 0.0 (TID 996) in 20 ms on Hadoop.Slave2 (997/1000)
>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Starting task 999.0 in stage 0.0 (TID 999, Hadoop.Slave1, PROCESS_LOCAL, 1225 bytes)
>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Finished task 997.0 in stage 0.0 (TID 997) in 27 ms on Hadoop.Slave1 (998/1000)
>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Finished task 998.0 in stage 0.0 (TID 998) in 31 ms on Hadoop.Slave2 (999/1000)
>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Finished task 999.0 in stage 0.0 (TID 999) in 20 ms on Hadoop.Slave1 (1000/1000)
>>> 14/09/19 19:48:00 INFO scheduler.DAGScheduler: Stage 0 (reduce at SparkPi.scala:35) finished in 25.109 s
>>> 14/09/19 19:48:00 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
>>> 14/09/19 19:48:00 INFO spark.SparkContext: Job finished: reduce at SparkPi.scala:35, took 26.156022565 s
>>> Pi is roughly 3.14156112
>>> ----------------------------------
>>>
>>> Second, I tested spark-itemsimilarity on "local"; it works. Shell:
>>> mahout spark-itemsimilarity -i /test/ss/input/data.txt -o /test/ss/output -os -ma local[2] -sem 512m -f1 purchase -f2 view -ic 2 -fc 1
>>>
>>> Third, I tested spark-itemsimilarity on the cluster. Shell:
>>> mahout spark-itemsimilarity -i /test/ss/input/data.txt -o /test/ss/output -os -ma spark://Hadoop.Master:7077 -sem 512m -f1 purchase -f2 view -ic 2 -fc 1
>>>
>>> It does not work. Full logs:
>>> ----------------------------------
>>> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
>>> SLF4J: Class path contains multiple SLF4J bindings.
>>> SLF4J: Found binding in [jar:file:/usr/mahout-1.0-SNAPSHOT/mrlegacy/target/mahout-mrlegacy-1.0-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: Found binding in [jar:file:/usr/mahout-1.0-SNAPSHOT/spark/target/mahout-spark_2.10-1.0-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: Found binding in [jar:file:/usr/spark-1.1.0-bin-2.4.1/lib/spark-assembly-1.1.0-hadoop2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>> 14/09/19 20:31:07 INFO spark.SecurityManager: Changing view acls to: root
>>> 14/09/19 20:31:07 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root)
>>> 14/09/19 20:31:08 INFO slf4j.Slf4jLogger: Slf4jLogger started
>>> 14/09/19 20:31:08 INFO Remoting: Starting remoting
>>> 14/09/19 20:31:08 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@Hadoop.Master:47597]
>>> 14/09/19 20:31:08 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@Hadoop.Master:47597]
>>> 14/09/19 20:31:08 INFO spark.SparkEnv: Registering MapOutputTracker
>>> 14/09/19 20:31:08 INFO spark.SparkEnv: Registering BlockManagerMaster
>>> 14/09/19 20:31:08 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20140919203108-e4e3
>>> 14/09/19 20:31:08 INFO storage.MemoryStore: MemoryStore started with capacity 2.3 GB.
>>> 14/09/19 20:31:08 INFO network.ConnectionManager: Bound socket to port 47186 with id = ConnectionManagerId(Hadoop.Master,47186)
>>> 14/09/19 20:31:08 INFO storage.BlockManagerMaster: Trying to register BlockManager
>>> 14/09/19 20:31:08 INFO storage.BlockManagerInfo: Registering block manager Hadoop.Master:47186 with 2.3 GB RAM
>>> 14/09/19 20:31:08 INFO storage.BlockManagerMaster: Registered BlockManager
>>> 14/09/19 20:31:08 INFO spark.HttpServer: Starting HTTP Server
>>> 14/09/19 20:31:08 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> 14/09/19 20:31:08 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:41116
>>> 14/09/19 20:31:08 INFO broadcast.HttpBroadcast: Broadcast server started at http://192.168.204.128:41116
>>> 14/09/19 20:31:08 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-10744709-bbeb-4d79-8bfe-d64d77799fb3
>>> 14/09/19 20:31:08 INFO spark.HttpServer: Starting HTTP Server
>>> 14/09/19 20:31:08 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> 14/09/19 20:31:08 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:59137
>>> 14/09/19 20:31:09 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> 14/09/19 20:31:09 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
>>> 14/09/19 20:31:09 INFO ui.SparkUI: Started SparkUI at http://Hadoop.Master:4040
>>> 14/09/19 20:31:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR /usr/mahout-1.0-SNAPSHOT/math-scala/target/mahout-math-scala_2.10-1.0-SNAPSHOT.jar at http://192.168.204.128:59137/jars/mahout-math-scala_2.10-1.0-SNAPSHOT.jar with timestamp 1411129870562
>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR /usr/mahout-1.0-SNAPSHOT/mrlegacy/target/mahout-mrlegacy-1.0-SNAPSHOT.jar at http://192.168.204.128:59137/jars/mahout-mrlegacy-1.0-SNAPSHOT.jar with timestamp 1411129870588
>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR /usr/mahout-1.0-SNAPSHOT/math/target/mahout-math-1.0-SNAPSHOT.jar at http://192.168.204.128:59137/jars/mahout-math-1.0-SNAPSHOT.jar with timestamp 1411129870612
>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR /usr/mahout-1.0-SNAPSHOT/spark/target/mahout-spark_2.10-1.0-SNAPSHOT.jar at http://192.168.204.128:59137/jars/mahout-spark_2.10-1.0-SNAPSHOT.jar with timestamp 1411129870618
>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR /usr/mahout-1.0-SNAPSHOT/math-scala/target/mahout-math-scala_2.10-1.0-SNAPSHOT.jar at http://192.168.204.128:59137/jars/mahout-math-scala_2.10-1.0-SNAPSHOT.jar with timestamp 1411129870620
>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR /usr/mahout-1.0-SNAPSHOT/mrlegacy/target/mahout-mrlegacy-1.0-SNAPSHOT.jar at http://192.168.204.128:59137/jars/mahout-mrlegacy-1.0-SNAPSHOT.jar with timestamp 1411129870631
>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR /usr/mahout-1.0-SNAPSHOT/math/target/mahout-math-1.0-SNAPSHOT.jar at http://192.168.204.128:59137/jars/mahout-math-1.0-SNAPSHOT.jar with timestamp 1411129870644
>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR /usr/mahout-1.0-SNAPSHOT/spark/target/mahout-spark_2.10-1.0-SNAPSHOT.jar at http://192.168.204.128:59137/jars/mahout-spark_2.10-1.0-SNAPSHOT.jar with timestamp 1411129870647
>>> 14/09/19 20:31:10 INFO client.AppClient$ClientActor: Connecting to master spark://Hadoop.Master:7077...
>>> 14/09/19 20:31:13 INFO storage.MemoryStore: ensureFreeSpace(86126) called with curMem=0, maxMem=2491102003
>>> 14/09/19 20:31:13 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 84.1 KB, free 2.3 GB)
>>> 14/09/19 20:31:13 INFO mapred.FileInputFormat: Total input paths to process : 1
>>> 14/09/19 20:31:13 INFO spark.SparkContext: Starting job: collect at TextDelimitedReaderWriter.scala:74
>>> 14/09/19 20:31:13 INFO scheduler.DAGScheduler: Registering RDD 7 (distinct at TextDelimitedReaderWriter.scala:74)
>>> 14/09/19 20:31:13 INFO scheduler.DAGScheduler: Got job 0 (collect at TextDelimitedReaderWriter.scala:74) with 2 output partitions (allowLocal=false)
>>> 14/09/19 20:31:13 INFO scheduler.DAGScheduler: Final stage: Stage 0(collect at TextDelimitedReaderWriter.scala:74)
>>> 14/09/19 20:31:13 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 1)
>>> 14/09/19 20:31:13 INFO scheduler.DAGScheduler: Missing parents: List(Stage 1)
>>> 14/09/19 20:31:14 INFO scheduler.DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[7] at distinct at TextDelimitedReaderWriter.scala:74), which has no missing parents
>>> 14/09/19 20:31:14 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 1 (MapPartitionsRDD[7] at distinct at TextDelimitedReaderWriter.scala:74)
>>> 14/09/19 20:31:14 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
>>> 14/09/19 20:31:29 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>>> 14/09/19 20:31:30 INFO client.AppClient$ClientActor: Connecting to master spark://Hadoop.Master:7077...
>>> 14/09/19 20:31:44 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>>> 14/09/19 20:31:50 INFO client.AppClient$ClientActor: Connecting to master spark://Hadoop.Master:7077...
>>> 14/09/19 20:31:59 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>>> 14/09/19 20:32:10 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
>>> 14/09/19 20:32:10 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
>>> 14/09/19 20:32:10 INFO scheduler.TaskSchedulerImpl: Cancelling stage 1
>>> 14/09/19 20:32:10 INFO scheduler.DAGScheduler: Failed to run collect at TextDelimitedReaderWriter.scala:74
>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
>>>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>>>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>         at scala.Option.foreach(Option.scala:236)
>>>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>>>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
>>> ----------------------------------
>>>
>>> Thanks.
>>>
>>> On Sep 27, 2014, at 01:05, Pat Ferrel wrote:
>>>
>>>> Any luck with this?
>>>>
>>>> If not, could you send a full stack trace and check on the cluster machines for other logs that might help?
>>>>
>>>> On Sep 25, 2014, at 6:34 AM, Pat Ferrel wrote:
>>>>
>>>> Looks like a Spark error as far as I can tell. This error is very generic and indicates that the job was not accepted for execution, so Spark may be configured wrong. This looks like a question for the Spark people.
>>>>
>>>> My Spark sanity check:
>>>>
>>>> 1) In the Spark UI at http://Hadoop.Master:8080 does everything look correct?
>>>> 2) Have you tested your Spark *cluster* with one of their examples? Have you run *any non-Mahout* code on the cluster to check that it is configured properly?
>>>> 3) Are you using exactly the same Spark and Hadoop locally as on the cluster?
>>>> 4) Did you launch both local and cluster jobs from the same cluster machine, with the only difference being the master URL (local[2] vs. spark://Hadoop.Master:7077)?
>>>>
>>>> 14/09/22 04:12:47 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>>>> 14/09/22 04:12:49 INFO client.AppClient$ClientActor: Connecting to master spark://Hadoop.Master:7077...
>>>>
>>>> On Sep 24, 2014, at 8:18 PM, pol wrote:
>>>>
>>>> Hi Pat,
>>>> The dataset is the same, and it is very small, just for testing. Is this a bug?
>>>>
>>>> On Sep 25, 2014, at 02:57, Pat Ferrel wrote:
>>>>
>>>>> Are you using different data sets on the local and cluster runs?
>>>>>
>>>>> Try increasing Spark memory with -sem; I use -sem 6g for the epinions data set.
>>>>>
>>>>> The ID dictionaries are kept in memory on each cluster machine, so a large number of user or item IDs will need more memory.
>>>>>
>>>>> On Sep 24, 2014, at 9:31 AM, pol wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm sure the Spark standalone cluster itself launches fine, but spark-itemsimilarity can't run on it.
>>>>>
>>>>> Launching on 'local' works:
>>>>> mahout spark-itemsimilarity -i /user/root/test/input/data.txt -o /user/root/test/output -os -ma local[2] -f1 purchase -f2 view -ic 2 -fc 1 -sem 1g
>>>>>
>>>>> but launching on a standalone cluster gives an error:
>>>>> mahout spark-itemsimilarity -i /user/root/test/input/data.txt -o /user/root/test/output -os -ma spark://Hadoop.Master:7077 -f1 purchase -f2 view -ic 2 -fc 1 -sem 1g
>>>>> ------------
>>>>> 14/09/22 04:12:47 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>>>>> 14/09/22 04:12:49 INFO client.AppClient$ClientActor: Connecting to master spark://Hadoop.Master:7077...
>>>>> 14/09/22 04:13:02 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>>>>> 14/09/22 04:13:09 INFO client.AppClient$ClientActor: Connecting to master spark://Hadoop.Master:7077...
>>>>> 14/09/22 04:13:17 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>>>>> 14/09/22 04:13:29 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
>>>>> 14/09/22 04:13:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
>>>>> 14/09/22 04:13:29 INFO scheduler.TaskSchedulerImpl: Cancelling stage 1
>>>>> 14/09/22 04:13:29 INFO scheduler.DAGScheduler: Failed to run collect at TextDelimitedReaderWriter.scala:74
>>>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
>>>>>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>>>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>>>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>>>>>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>>>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>>>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>>>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>>>         at scala.Option.foreach(Option.scala:236)
>>>>>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>>>>>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>>>>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>>>>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>>>>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>>>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>>>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>> ------------
>>>>>
>>>>> Thanks.