spark-user mailing list archives

From Christopher Nguyen <...@adatao.com>
Subject Re: WARN ClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
Date Tue, 14 Jan 2014 17:52:03 GMT
Aureliano, this sort of jar-hell is something we have to deal with, whether
in Spark or elsewhere. How would you propose we fix this with Spark? Do you
mean that Spark's own scaffolding caused you to pull in both Protobuf 2.4
and 2.5? Or do you mean the error message should have been more helpful?
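
A rough way to check for this kind of clash is to scan the classpath for
class files that ship in more than one jar. The Python sketch below is only
an illustration and not part of Spark: the function name
find_duplicate_classes and the ':' separator are assumptions made for the
example, and it only inspects .jar entries, not directories. Jars are
reported in classpath order, and the first one listed is typically the copy
a standard JVM application class loader resolves.

```python
import zipfile
from collections import defaultdict


def find_duplicate_classes(classpath, sep=":"):
    """Map each .class entry found in more than one jar to the jars shipping it.

    The jars in each list keep their classpath order, so the first one is the
    copy that would normally shadow the others.
    """
    owners = defaultdict(list)
    for entry in classpath.split(sep):
        # Skip empty entries, directories, missing files, and non-jar files.
        if not entry.endswith(".jar") or not zipfile.is_zipfile(entry):
            continue
        with zipfile.ZipFile(entry) as jar:
            # set() guards against an archive listing the same name twice.
            for name in set(jar.namelist()):
                if name.endswith(".class"):
                    owners[name].append(entry)
    # Keep only the classes that appear in at least two jars.
    return {name: jars for name, jars in owners.items() if len(jars) > 1}
```

For a classpath such as my_app.jar:spark_assembly.jar, any
com/google/protobuf classes present in both jars would be expected to show
up in the result with my_app.jar listed first.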

Sent while mobile. Pls excuse typos etc.
On Jan 14, 2014 9:27 AM, "Aureliano Buendia" <buendia360@gmail.com> wrote:

>
>
>
> On Tue, Jan 14, 2014 at 5:07 PM, Archit Thakur <archit279thakur@gmail.com> wrote:
>
>> How much memory are you setting for the executor JVM?
>> This problem occurs when there is either a communication problem between
>> the Master and Worker or not enough memory left. E.g., you specified 75G
>> for your executor and your machine has only 70G of memory.
>>
>
> This was not a memory problem. This could be considered a Spark bug.
>
> Here is what happened: my app was using protobuf 2.5, while Spark has a
> protobuf 2.4 dependency, and the classpath was like this:
>
> my_app.jar:spark_assembly.jar:..
>
> This caused Spark (or a dependency, probably Hadoop) to use protobuf 2.5,
> giving that misleading 'ensure that workers are registered and have
> sufficient memory' error.
>
> Reproducing this error is easy: just download protobuf 2.5 and put it at
> the beginning of the classpath for any app, and you should get that error.
>
>
>>
>>
>> On Thu, Jan 9, 2014 at 11:27 PM, Aureliano Buendia <buendia360@gmail.com> wrote:
>>
>>> The java command worked when I set SPARK_HOME and SPARK_EXAMPLES_JAR
>>> values.
>>>
>>> There are many issues regarding the Initial job has not accepted any
>>> resources... error though:
>>>
>>>    - When I put my assembly jar *before*
>>>    spark-assembly_2.9.3-0.8.1-incubating-hadoop1.0.4.jar, this error happens.
>>>    When I move my jar after the spark-assembly, it works fine.
>>>    In my case, I need to put my jar before spark-assembly, as my jar
>>>    uses protobuf 2.5 and spark-assembly comes with protobuf 2.4.
>>>    - Sometimes when this error happens, the whole cluster server must be
>>>    restarted, or even the run-example script won't work. It took me a while
>>>    to find this out, making debugging very time consuming.
>>>    - The error message is absolutely irrelevant.
>>>
>>> I guess the problem is somewhere in the SparkContext jar delivery
>>> part.
>>>
>>>
>>> On Thu, Jan 9, 2014 at 4:17 PM, Aureliano Buendia <buendia360@gmail.com> wrote:
>>>
>>>>
>>>>
>>>>
>>>> On Thu, Jan 9, 2014 at 5:01 AM, Matei Zaharia <matei.zaharia@gmail.com> wrote:
>>>>
>>>>> Just follow the docs at
>>>>> http://spark.incubator.apache.org/docs/latest/quick-start.html#a-standalone-app-in-scala
>>>>> for how to run an application. Spark is designed so that you can simply run
>>>>> your application *without* any scripts whatsoever, and submit your JAR to
>>>>> the SparkContext constructor, which will distribute it. You can launch your
>>>>> application with “scala”, “java”, or whatever tool you’d prefer.
>>>>>
>>>>
>>>> I'm afraid what you said about 'simply run your application *without*
>>>> any scripts whatsoever' does not apply to spark at the moment, and it
>>>> simply does not work.
>>>>
>>>> Try this simple Pi calculation on a standard spark-ec2 instance:
>>>>
>>>> java -cp
>>>> /root/spark/examples/target/spark-examples_2.9.3-0.8.1-incubating.jar:/root/spark/assembltarget/scala-2.9.3/spark-assembly_2.9.3-0.8.1-incubating-hadoop1.0.4.jar
>>>> org.apache.spark.examples.SparkPi `cat spark-ec2/cluster-url`
>>>>
>>>> And you'll get the error:
>>>>
>>>> WARN cluster.ClusterScheduler: Initial job has not accepted any
>>>> resources; check your cluster UI to ensure that workers are registered and
>>>> have sufficient memory
>>>>
>>>> While the script way works:
>>>>
>>>> spark/run-example org.apache.spark.examples.SparkPi `cat
>>>> spark-ec2/cluster-url`
>>>>
>>>> What am I missing in the above java command?
>>>>
>>>>
>>>>>
>>>>> Matei
>>>>>
>>>>> On Jan 8, 2014, at 8:26 PM, Aureliano Buendia <buendia360@gmail.com>
>>>>> wrote:
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jan 9, 2014 at 4:11 AM, Matei Zaharia <matei.zaharia@gmail.com
>>>>> > wrote:
>>>>>
>>>>>> Oh, you shouldn’t use spark-class for your own classes. Just build
>>>>>> your job separately and submit it by running it with “java” and creating a
>>>>>> SparkContext in it. spark-class is designed to run classes internal to the
>>>>>> Spark project.
>>>>>>
>>>>>
>>>>> Really? Apparently Eugen runs his jobs by:
>>>>>
>>>>>
>>>>> $SPARK_HOME/spark-class SPARK_CLASSPATH=PathToYour.jar com.myproject.MyJob
>>>>>
>>>>> , as he instructed me here <http://mail-archives.apache.org/mod_mbox/spark-user/201401.mbox/browser>
>>>>> to do this.
>>>>>
>>>>> I have to say that while the Spark documentation is not sparse, it does
>>>>> not cover enough, and as you can see the community is confused.
>>>>>
>>>>> Are Spark users supposed to create something like run-example for
>>>>> their own jobs?
>>>>>
>>>>>
>>>>>>
>>>>>> Matei
>>>>>>
>>>>>> On Jan 8, 2014, at 8:06 PM, Aureliano Buendia <buendia360@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jan 9, 2014 at 3:59 AM, Matei Zaharia <
>>>>>> matei.zaharia@gmail.com> wrote:
>>>>>>
>>>>>>> Have you looked at the cluster UI, and do you see any workers
>>>>>>> registered there, and your application under running applications? Maybe
>>>>>>> you typed in the wrong master URL or something like that.
>>>>>>>
>>>>>>
>>>>>> No, it's automated: cat spark-ec2/cluster-url
>>>>>>
>>>>>> I think the problem might be caused by the spark-class script. It seems
>>>>>> to assign too much memory.
>>>>>>
>>>>>> I forgot the fact that run-example doesn't use spark-class.
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Matei
>>>>>>>
>>>>>>> On Jan 8, 2014, at 7:07 PM, Aureliano Buendia <buendia360@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> The strange thing is that spark examples work fine, but when I
>>>>>>> include a spark example in my jar and deploy it, I get this error for the
>>>>>>> very same example:
>>>>>>>
>>>>>>> WARN ClusterScheduler: Initial job has not accepted any resources;
>>>>>>> check your cluster UI to ensure that workers are registered and have
>>>>>>> sufficient memory
>>>>>>>
>>>>>>> My jar is deployed to master and then to workers by
>>>>>>> spark-ec2/copy-dir. Why would including the example in my jar cause this
>>>>>>> error?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jan 9, 2014 at 12:41 AM, Aureliano Buendia <
>>>>>>> buendia360@gmail.com> wrote:
>>>>>>>
>>>>>>>> Could someone explain how SPARK_MEM, SPARK_WORKER_MEMORY and
>>>>>>>> spark.executor.memory should be related so that this unhelpful error
>>>>>>>> doesn't occur?
>>>>>>>>
>>>>>>>> Maybe there are more env and Java config variables about memory that
>>>>>>>> I'm missing.
>>>>>>>>
>>>>>>>> By the way, that bit of the error asking to check the web UI is
>>>>>>>> just redundant. The UI is of no help.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jan 8, 2014 at 4:31 PM, Aureliano Buendia <
>>>>>>>> buendia360@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> My spark cluster is not able to run a job due to this warning:
>>>>>>>>>
>>>>>>>>> WARN ClusterScheduler: Initial job has not accepted any resources;
>>>>>>>>> check your cluster UI to ensure that workers are registered and have
>>>>>>>>> sufficient memory
>>>>>>>>>
>>>>>>>>> The workers have this status:
>>>>>>>>>
>>>>>>>>> ALIVE    2 (0 Used)    6.3 GB (0.0 B Used)
>>>>>>>>>
>>>>>>>>> So there must be plenty of memory available despite the warning
>>>>>>>>> message. I'm using the default spark config; is there a config
>>>>>>>>> parameter that needs changing for this to work?
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
