Subject: Re: spark-itemsimilarity can't launch on a Spark cluster?
From: Pat Ferrel
Date: Fri, 10 Oct 2014 18:22:16 -0700
To: pol
Cc: user@mahout.apache.org

Did you stop the 1.6g job or did it fail? I see task failures but no stage failures.
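If it isn't clear from the master UI or the console, one way to tell on the next run is to capture the driver's exit status and keep the whole driver log. A sketch only — it reuses the flags and paths from the Oct 10 command quoted below, the log file name is just an example, and it assumes the mahout driver exits non-zero when the job fails rather than being stopped:

  mahout spark-itemsimilarity -i /view_input,/purchase_input -o /output -os \
    -ma spark://recommend1:7077 -sem 15g -f1 purchase -f2 view -ic 2 -fc 1 -m 36 \
    > itemsimilarity-driver.log 2>&1
  echo "driver exit status: $?"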
On Oct 10, 2014, at 8:49 AM, pol wrote:

Hi Pat,

Yes, spark-itemsimilarity works now; it finished the calculation on the 150m dataset. The problem above is that the 1.6g dataset cannot finish. I have three machines (16 cores and 16g memory each) for this test — is this environment not enough to finish the calculation? The dataset was archived into one file with the hadoop archive tool, so only one machine is in the processing state. I did that because without archiving I get some errors; details are in the attachment. If it helps, I can provide the test dataset to you.

Thank you again.

On Oct 10, 2014, at 22:07, Pat Ferrel wrote:

> So it is completing some of the spark-itemsimilarity jobs now? That is better at least.
>
> Yes. More data means you may need more memory or more nodes in your cluster. This is how to scale Spark and Hadoop. Spark in particular needs core memory since it tries to avoid disk read/write.
>
> Try increasing -sem as far as you can first, then you may need to add machines to your cluster to speed it up. Do you need results faster than 15 hours?
>
> Remember that the way the Solr recommender works allows you to make recommendations to new users and train less often. The new user data does not have to be in the training/indicator data. You retrain partly based on how many new users there are, but also based on how many new items are added to the catalog.
>
> On Oct 10, 2014, at 1:47 AM, pol wrote:
>
> Hi Pat,
> Because of a holiday, I am only replying now.
>
> I changed 1.0.2 back to 1.0.1 in mahout-1.0-SNAPSHOT, and used Spark 1.0.1 and Hadoop 2.4.0; spark-itemsimilarity now works. But I have a new question:
>
> mahout spark-itemsimilarity -i /view_input,/purchase_input -o /output -os -ma spark://recommend1:7077 -sem 15g -f1 purchase -f2 view -ic 2 -fc 1 -m 36
>
> With 1.6g of "view" data and 60m of "purchase" data, this command has not finished after 15 hours (the indicator-matrix has been computed; the cross-indicator-matrix is still computing), yet with 100m of "view" data it finishes in 2 minutes. Is this just because of the amount of data?
>
> On Oct 1, 2014, at 01:10, Pat Ferrel wrote:
>
>> This will not be fixed in Mahout 1.0 unless we can find a problem in Mahout now. I am the one who would fix it. At present it looks to me like a Spark version or setup problem.
>>
>> These errors seem to indicate that the build or setup has a problem. It seems that you cannot use Spark 1.1.0. Set up your cluster to use mahout-1.0-SNAPSHOT with the pom set back to spark-1.0.1, Spark 1.0.1 built for Hadoop 2.4, and Hadoop 2.4. This is the only combination that is supposed to work together.
>>
>> If this still fails it may be a setup problem, since I can run on a cluster just fine with my setup. When you get an error from this config, send it to me and to the Spark user list to see if they can give us a clue.
>>
>> Question: Do you have mahout-1.0-SNAPSHOT and Spark installed on all your cluster machines, with the correct environment variables and path?
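>>
>> (A rough sketch of that combination, not a tested recipe: the Mahout build command below is the one from the Sep 27 environment listing in this thread, and the Spark flags are the ones used there to build 1.1.0 — they may need adjusting for a 1.0.1 source build. The Spark version in mahout/pom.xml is the same value that the "- 1.0.1 / + 1.1.0" change further down edits; here it stays at 1.0.1.)
>>
>> # Mahout 1.0-SNAPSHOT, with the pom's Spark dependency left at 1.0.1
>> mvn -Dhadoop2.version=2.4.1 -DskipTests clean package
>>
>> # Spark 1.0.1 built for Hadoop 2.4 (flags copied from the 1.1.0 build above; verify against the Spark 1.0.1 build docs)
>> mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.1 -Phive -DskipTests clean package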
>>
>> On Sep 30, 2014, at 12:47 AM, pol wrote:
>>
>> Hi Pat,
>> It does seem to be a Spark version problem, but spark-itemsimilarity still cannot complete normally.
>>
>> 1. With 1.0.1 changed to 1.1.0 in mahout-1.0-SNAPSHOT/pom.xml, the Spark version compatibility is no longer a problem, but the program fails:
>> --------------------------------------------------------------
>> 14/09/30 11:26:04 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 10.1 (TID 31, Hadoop.Slave1): java.lang.NoClassDefFoundError: org/apache/commons/math3/random/RandomGenerator
>>         org.apache.mahout.common.RandomUtils.getRandom(RandomUtils.java:65)
>>         org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:228)
>>         org.apache.mahout.math.cf.SimilarityAnalysis$$anonfun$3.apply(SimilarityAnalysis.scala:223)
>>         org.apache.mahout.sparkbindings.blas.MapBlock$$anonfun$1.apply(MapBlock.scala:33)
>>         org.apache.mahout.sparkbindings.blas.MapBlock$$anonfun$1.apply(MapBlock.scala:32)
>>         scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>         scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>>         org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:235)
>>         org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
>>         org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
>>         org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>         org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>         org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>         org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>         org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>>         org.apache.spark.scheduler.Task.run(Task.scala:54)
>>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>>         java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         java.lang.Thread.run(Thread.java:662)
>> --------------------------------------------------------------
>> I tried adding commons-math3-3.2.jar to mahout-1.0-SNAPSHOT/lib, but the result is the same. (RandomUtils.java:65 does not use RandomGenerator directly.)
>>
>> 2. With 1.0.1 changed to 1.0.2 in mahout-1.0-SNAPSHOT/pom.xml, there are still other errors:
>> --------------------------------------------------------------
>> 14/09/30 14:36:57 WARN scheduler.TaskSetManager: Lost TID 427 (task 7.0:51)
>> 14/09/30 14:36:57 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassCastException
>> java.lang.ClassCastException: scala.Tuple1 cannot be cast to scala.Tuple2
>>         at org.apache.mahout.drivers.TDIndexedDatasetReader$$anonfun$4.apply(TextDelimitedReaderWriter.scala:75)
>>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>         at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>         at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:59)
>>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:96)
>>         at org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
>>         at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:594)
>>         at org.apache.spark.rdd.RDD$$anonfun$15.apply(RDD.scala:594)
>>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
>>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:158)
>>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>>         at org.apache.spark.scheduler.Task.run(Task.scala:51)
>>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:183)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>> --------------------------------------------------------------
>> Please refer to the attachment for the full log.
>>
>> In addition, the input is 66 files on HDFS of 20 to 30M each; if necessary I will provide the data.
>> The command is: mahout spark-itemsimilarity -i /rec/input/ss/others,/rec/input/ss/weblog -o /rec/output/ss -os -ma spark://recommend1:7077 -sem 4g -f1 purchase -f2 view -ic 2 -fc 1
>> Spark cluster: 8 workers, 32 cores total, 32G memory total, on two machines.
>>
>> If this can't be solved in a few days, it may be better to wait for the Mahout 1.0 release or just use the existing mahout itemsimilarity job.
>>
>> Thank you again, Pat.
>>
>> On Sep 29, 2014, at 00:02, Pat Ferrel wrote:
>>
>>> It looks like the cluster version of spark-itemsimilarity is never accepted by the Spark master. It fails in TextDelimitedReaderWriter.scala because all work uses "lazy" evaluation, and until the write no actual work is done on the Spark cluster.
>>>
>>> However your cluster seems to be working with the Pi example. Therefore there must be something wrong with the Mahout build or config. Some ideas:
>>>
>>> 1) Mahout 1.0-SNAPSHOT is targeted at Spark 1.0.1. However I use 1.0.2 and it seems to work. You might try changing the version in the pom.xml and doing a clean build of Mahout. Change the version number in mahout/pom.xml:
>>>
>>> mahout/pom.xml
>>> - 1.0.1
>>> + 1.1.0
>>>
>>> This may not be needed but it is easier than installing Spark 1.0.1.
>>>
>>> 2) Try installing and building Mahout on all cluster machines. I do this so I can run the Mahout spark-shell on any machine, but it may also be needed here.
>>> The Mahout jars, path setup, and directory structure should be the same on all cluster machines.
>>>
>>> 3) Try making -sem larger. I usually make it as large as I can on the cluster and then try smaller values until it affects performance. The epinions dataset that I use for testing on my cluster requires -sem 6g.
>>>
>>> My cluster has 3 machines with Hadoop 1.2.1 and Spark 1.0.2. I can try running your data through spark-itemsimilarity on my cluster if you can share it. I will sign an NDA and destroy it after the test.
>>>
>>> On Sep 27, 2014, at 5:28 AM, pol wrote:
>>>
>>> Hi Pat,
>>> Thanks for your reply. It still doesn't work normally. I tested it on a Spark standalone cluster; I did not test on a YARN cluster.
>>>
>>> First, I checked that the cluster configuration is correct. http://Hadoop.Master:8080 shows:
>>> ----------------------------------
>>> URL: spark://Hadoop.Master:7077
>>> Workers: 2
>>> Cores: 4 Total, 0 Used
>>> Memory: 2.0 GB Total, 0.0 B Used
>>> Applications: 0 Running, 1 Completed
>>> Drivers: 0 Running, 0 Completed
>>> Status: ALIVE
>>> ----------------------------------
>>>
>>> Environment:
>>> ----------------------------------
>>> OS: CentOS release 6.5 (Final)
>>> JDK: 1.6.0_45
>>> Mahout: mahout-1.0-SNAPSHOT (mvn -Dhadoop2.version=2.4.1 -DskipTests clean package)
>>> Hadoop: 2.4.1
>>> Spark: spark-1.1.0-bin-2.4.1 (mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.1 -Phive -DskipTests clean package)
>>> ----------------------------------
>>>
>>> Shell:
>>> spark-submit --class org.apache.spark.examples.SparkPi --master spark://Hadoop.Master:7077 --executor-memory 1g --total-executor-cores 2 /root/spark-examples_2.10-1.1.0.jar 1000
>>>
>>> It works; here is part of the log:
>>> ----------------------------------
>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Finished task 995.0 in stage 0.0 (TID 995) in 17 ms on Hadoop.Slave1 (996/1000)
>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Starting task 998.0 in stage 0.0 (TID 998, Hadoop.Slave2, PROCESS_LOCAL, 1225 bytes)
>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Finished task 996.0 in stage 0.0 (TID 996) in 20 ms on Hadoop.Slave2 (997/1000)
>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Starting task 999.0 in stage 0.0 (TID 999, Hadoop.Slave1, PROCESS_LOCAL, 1225 bytes)
>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Finished task 997.0 in stage 0.0 (TID 997) in 27 ms on Hadoop.Slave1 (998/1000)
>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Finished task 998.0 in stage 0.0 (TID 998) in 31 ms on Hadoop.Slave2 (999/1000)
>>> 14/09/19 19:48:00 INFO scheduler.TaskSetManager: Finished task 999.0 in stage 0.0 (TID 999) in 20 ms on Hadoop.Slave1 (1000/1000)
>>> 14/09/19 19:48:00 INFO scheduler.DAGScheduler: Stage 0 (reduce at SparkPi.scala:35) finished in 25.109 s
>>> 14/09/19 19:48:00 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
>>> 14/09/19 19:48:00 INFO spark.SparkContext: Job finished: reduce at SparkPi.scala:35, took 26.156022565 s
>>> Pi is roughly 3.14156112
>>> ----------------------------------
>>>
>>> Second, I tested spark-itemsimilarity on "local"; it works. Shell:
>>> mahout spark-itemsimilarity -i /test/ss/input/data.txt -o /test/ss/output -os -ma local[2] -sem 512m -f1 purchase -f2 view -ic 2 -fc 1
>>>
>>> Third, I tested spark-itemsimilarity on the cluster. Shell:
>>> mahout spark-itemsimilarity -i /test/ss/input/data.txt -o /test/ss/output -os -ma spark://Hadoop.Master:7077 -sem 512m -f1 purchase -f2 view -ic 2 -fc 1
>>>
>>> It does not work. Full logs:
>>> ----------------------------------
>>> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
>>> SLF4J: Class path contains multiple SLF4J bindings.
>>> SLF4J: Found binding in [jar:file:/usr/mahout-1.0-SNAPSHOT/mrlegacy/target/mahout-mrlegacy-1.0-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: Found binding in [jar:file:/usr/mahout-1.0-SNAPSHOT/spark/target/mahout-spark_2.10-1.0-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: Found binding in [jar:file:/usr/spark-1.1.0-bin-2.4.1/lib/spark-assembly-1.1.0-hadoop2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>> 14/09/19 20:31:07 INFO spark.SecurityManager: Changing view acls to: root
>>> 14/09/19 20:31:07 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root)
>>> 14/09/19 20:31:08 INFO slf4j.Slf4jLogger: Slf4jLogger started
>>> 14/09/19 20:31:08 INFO Remoting: Starting remoting
>>> 14/09/19 20:31:08 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@Hadoop.Master:47597]
>>> 14/09/19 20:31:08 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@Hadoop.Master:47597]
>>> 14/09/19 20:31:08 INFO spark.SparkEnv: Registering MapOutputTracker
>>> 14/09/19 20:31:08 INFO spark.SparkEnv: Registering BlockManagerMaster
>>> 14/09/19 20:31:08 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20140919203108-e4e3
>>> 14/09/19 20:31:08 INFO storage.MemoryStore: MemoryStore started with capacity 2.3 GB.
>>> 14/09/19 20:31:08 INFO network.ConnectionManager: Bound socket to port 47186 with id = ConnectionManagerId(Hadoop.Master,47186)
>>> 14/09/19 20:31:08 INFO storage.BlockManagerMaster: Trying to register BlockManager
>>> 14/09/19 20:31:08 INFO storage.BlockManagerInfo: Registering block manager Hadoop.Master:47186 with 2.3 GB RAM
>>> 14/09/19 20:31:08 INFO storage.BlockManagerMaster: Registered BlockManager
>>> 14/09/19 20:31:08 INFO spark.HttpServer: Starting HTTP Server
>>> 14/09/19 20:31:08 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> 14/09/19 20:31:08 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:41116
>>> 14/09/19 20:31:08 INFO broadcast.HttpBroadcast: Broadcast server started at http://192.168.204.128:41116
>>> 14/09/19 20:31:08 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-10744709-bbeb-4d79-8bfe-d64d77799fb3
>>> 14/09/19 20:31:08 INFO spark.HttpServer: Starting HTTP Server
>>> 14/09/19 20:31:08 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> 14/09/19 20:31:08 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:59137
>>> 14/09/19 20:31:09 INFO server.Server: jetty-8.y.z-SNAPSHOT
>>> 14/09/19 20:31:09 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
>>> 14/09/19 20:31:09 INFO ui.SparkUI: Started SparkUI at http://Hadoop.Master:4040
>>> 14/09/19 20:31:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR /usr/mahout-1.0-SNAPSHOT/math-scala/target/mahout-math-scala_2.10-1.0-SNAPSHOT.jar at http://192.168.204.128:59137/jars/mahout-math-scala_2.10-1.0-SNAPSHOT.jar with timestamp 1411129870562
>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR /usr/mahout-1.0-SNAPSHOT/mrlegacy/target/mahout-mrlegacy-1.0-SNAPSHOT.jar at http://192.168.204.128:59137/jars/mahout-mrlegacy-1.0-SNAPSHOT.jar with timestamp 1411129870588
>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR /usr/mahout-1.0-SNAPSHOT/math/target/mahout-math-1.0-SNAPSHOT.jar at http://192.168.204.128:59137/jars/mahout-math-1.0-SNAPSHOT.jar with timestamp 1411129870612
>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR /usr/mahout-1.0-SNAPSHOT/spark/target/mahout-spark_2.10-1.0-SNAPSHOT.jar at http://192.168.204.128:59137/jars/mahout-spark_2.10-1.0-SNAPSHOT.jar with timestamp 1411129870618
>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR /usr/mahout-1.0-SNAPSHOT/math-scala/target/mahout-math-scala_2.10-1.0-SNAPSHOT.jar at http://192.168.204.128:59137/jars/mahout-math-scala_2.10-1.0-SNAPSHOT.jar with timestamp 1411129870620
>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR /usr/mahout-1.0-SNAPSHOT/mrlegacy/target/mahout-mrlegacy-1.0-SNAPSHOT.jar at http://192.168.204.128:59137/jars/mahout-mrlegacy-1.0-SNAPSHOT.jar with timestamp 1411129870631
>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR /usr/mahout-1.0-SNAPSHOT/math/target/mahout-math-1.0-SNAPSHOT.jar at http://192.168.204.128:59137/jars/mahout-math-1.0-SNAPSHOT.jar with timestamp 1411129870644
>>> 14/09/19 20:31:10 INFO spark.SparkContext: Added JAR /usr/mahout-1.0-SNAPSHOT/spark/target/mahout-spark_2.10-1.0-SNAPSHOT.jar at http://192.168.204.128:59137/jars/mahout-spark_2.10-1.0-SNAPSHOT.jar with timestamp 1411129870647
>>> 14/09/19 20:31:10 INFO client.AppClient$ClientActor: Connecting to master spark://Hadoop.Master:7077...
>>> 14/09/19 20:31:13 INFO storage.MemoryStore: ensureFreeSpace(86126) called with curMem=0, maxMem=2491102003
>>> 14/09/19 20:31:13 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 84.1 KB, free 2.3 GB)
>>> 14/09/19 20:31:13 INFO mapred.FileInputFormat: Total input paths to process : 1
>>> 14/09/19 20:31:13 INFO spark.SparkContext: Starting job: collect at TextDelimitedReaderWriter.scala:74
>>> 14/09/19 20:31:13 INFO scheduler.DAGScheduler: Registering RDD 7 (distinct at TextDelimitedReaderWriter.scala:74)
>>> 14/09/19 20:31:13 INFO scheduler.DAGScheduler: Got job 0 (collect at TextDelimitedReaderWriter.scala:74) with 2 output partitions (allowLocal=false)
>>> 14/09/19 20:31:13 INFO scheduler.DAGScheduler: Final stage: Stage 0(collect at TextDelimitedReaderWriter.scala:74)
>>> 14/09/19 20:31:13 INFO scheduler.DAGScheduler: Parents of final stage: List(Stage 1)
>>> 14/09/19 20:31:13 INFO scheduler.DAGScheduler: Missing parents: List(Stage 1)
>>> 14/09/19 20:31:14 INFO scheduler.DAGScheduler: Submitting Stage 1 (MapPartitionsRDD[7] at distinct at TextDelimitedReaderWriter.scala:74), which has no missing parents
>>> 14/09/19 20:31:14 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 1 (MapPartitionsRDD[7] at distinct at TextDelimitedReaderWriter.scala:74)
>>> 14/09/19 20:31:14 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
>>> 14/09/19 20:31:29 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>>> 14/09/19 20:31:30 INFO client.AppClient$ClientActor: Connecting to master spark://Hadoop.Master:7077...
>>> 14/09/19 20:31:44 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>>> 14/09/19 20:31:50 INFO client.AppClient$ClientActor: Connecting to master spark://Hadoop.Master:7077...
>>> 14/09/19 20:31:59 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>>> 14/09/19 20:32:10 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
>>> 14/09/19 20:32:10 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
>>> 14/09/19 20:32:10 INFO scheduler.TaskSchedulerImpl: Cancelling stage 1
>>> 14/09/19 20:32:10 INFO scheduler.DAGScheduler: Failed to run collect at TextDelimitedReaderWriter.scala:74
>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
>>>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>>>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>         at scala.Option.foreach(Option.scala:236)
>>>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>>>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
>>> 14/09/19 20:32:10 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
>>> ----------------------------------
>>>
>>> Thanks.
>>>
>>> On Sep 27, 2014, at 01:05, Pat Ferrel wrote:
>>>
>>>> Any luck with this?
>>>>
>>>> If not, could you send a full stack trace and check on the cluster machines for other logs that might help?
>>>>
>>>> On Sep 25, 2014, at 6:34 AM, Pat Ferrel wrote:
>>>>
>>>> Looks like a Spark error as far as I can tell. This error is very generic and indicates that the job was not accepted for execution, so Spark may be configured wrong. This looks like a question for the Spark people.
>>>>
>>>> My Spark sanity check:
>>>>
>>>> 1) In the Spark UI at http://Hadoop.Master:8080 does everything look correct?
>>>> 2) Have you tested your Spark *cluster* with one of their examples? Have you run *any non-Mahout* code on the cluster to check that it is configured properly?
>>>> 3) Are you using exactly the same Spark and Hadoop locally as on the cluster?
>>>> 4) Did you launch both local and cluster jobs from the same cluster machine, with the only difference being the master URL (local[2] vs. spark://Hadoop.Master:7077)?
>>>>
>>>> 14/09/22 04:12:47 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>>>> 14/09/22 04:12:49 INFO client.AppClient$ClientActor: Connecting to master spark://Hadoop.Master:7077...
>>>>
>>>> On Sep 24, 2014, at 8:18 PM, pol wrote:
>>>>
>>>> Hi Pat,
>>>> The dataset is the same, and it is very small, just for testing. Is this a bug?
>>>>
>>>> On Sep 25, 2014, at 02:57, Pat Ferrel wrote:
>>>>
>>>>> Are you using different data sets on the local and cluster runs?
>>>>>
>>>>> Try increasing Spark memory with -sem; I use -sem 6g for the epinions data set.
>>>>>
>>>>> The ID dictionaries are kept in memory on each cluster machine, so a large number of user or item IDs will need more memory.
>>>>>
>>>>> On Sep 24, 2014, at 9:31 AM, pol wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm sure the Spark standalone cluster itself launches fine, but spark-itemsimilarity can't run on it.
>>>>>
>>>>> Launching on 'local' works:
>>>>> mahout spark-itemsimilarity -i /user/root/test/input/data.txt -o /user/root/test/output -os -ma local[2] -f1 purchase -f2 view -ic 2 -fc 1 -sem 1g
>>>>>
>>>>> but launching on a standalone cluster gives an error:
>>>>> mahout spark-itemsimilarity -i /user/root/test/input/data.txt -o /user/root/test/output -os -ma spark://Hadoop.Master:7077 -f1 purchase -f2 view -ic 2 -fc 1 -sem 1g
>>>>> ------------
>>>>> 14/09/22 04:12:47 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>>>>> 14/09/22 04:12:49 INFO client.AppClient$ClientActor: Connecting to master spark://Hadoop.Master:7077...
>>>>> 14/09/22 04:13:02 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>>>>> 14/09/22 04:13:09 INFO client.AppClient$ClientActor: Connecting to master spark://Hadoop.Master:7077...
>>>>> 14/09/22 04:13:17 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
>>>>> 14/09/22 04:13:29 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
>>>>> 14/09/22 04:13:29 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
>>>>> 14/09/22 04:13:29 INFO scheduler.TaskSchedulerImpl: Cancelling stage 1
>>>>> 14/09/22 04:13:29 INFO scheduler.DAGScheduler: Failed to run collect at TextDelimitedReaderWriter.scala:74
>>>>> Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
>>>>>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1044)
>>>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1028)
>>>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1026)
>>>>>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>>>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>>>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1026)
>>>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:634)
>>>>>         at scala.Option.foreach(Option.scala:236)
>>>>>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:634)
>>>>>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1229)
>>>>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>>>>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>>>>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>>>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>>>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>>>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>>>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>>>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>>> ------------
>>>>>
>>>>> Thanks.