mesos-dev mailing list archives

From Matei Zaharia <ma...@eecs.berkeley.edu>
Subject Re: One more error question
Date Sat, 28 Jan 2012 08:14:31 GMT
This is very likely because the AMI has an older version of Mesos than the one you built
against. We should make a new AMI.
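
One way to check for this kind of mismatch on a node is to ask the JVM where it loaded the Mesos protobuf class from and whether that class has the method named in the stack trace. The following is only a rough sketch: the class name `MesosVersionCheck` and both helper methods are made up for illustration, and it assumes the Mesos jar in question is on the classpath when it is run.

```java
import java.security.CodeSource;

public class MesosVersionCheck {
    /** True if the class can be loaded and exposes a public no-argument method with this name. */
    public static boolean hasNoArgMethod(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            cls.getMethod(methodName); // throws NoSuchMethodException if absent
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    /** Reports the jar or directory a class was loaded from, if the JVM exposes it. */
    public static String loadedFrom(String className) {
        try {
            CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown location)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(class not found on classpath)";
        }
    }

    public static void main(String[] args) {
        // Class and method names are taken from the NoSuchMethodError in the quoted log.
        String cls = "org.apache.mesos.Protos$Resource";
        System.out.println(cls + " loaded from: " + loadedFrom(cls));
        System.out.println("has getScalar(): " + hasNoArgMethod(cls, "getScalar"));
    }
}
```

Note that `getMethod` matches only on name and parameter types, while the `NoSuchMethodError` descriptor also includes the return type, so a `true` result here does not fully rule out a mismatch. The reported jar location is usually the more useful clue.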

The -d git option in the script seems to be broken too, so we should fix that. In theory it
should work, but I think it broke when we switched the location of the repo (and maybe its
internal structure too).

Matei

On Jan 27, 2012, at 9:36 PM, Matthew Rathbone wrote:

> When I spin up Mesos using the EC2 scripts and redeploy both HDFS and Hadoop using Cloudera's distribution, I see this error when I try to start the JobTracker:
> 
> 12/01/28 05:23:28 INFO util.HostsFileReader: Setting the includes file to 
> 12/01/28 05:23:28 INFO util.HostsFileReader: Setting the excludes file to 
> 12/01/28 05:23:28 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list
> 12/01/28 05:23:28 INFO mapred.JobTracker: Decommissioning 0 nodes
> 12/01/28 05:23:28 INFO mapred.FrameworkScheduler: Got resource offer value: "201201280508-0-5"
> 
> Exception in thread "Thread-20" java.lang.NoSuchMethodError: org.apache.mesos.Protos$Resource.getScalar()Lorg/apache/mesos/Protos$Value$Scalar;
> at org.apache.hadoop.mapred.FrameworkScheduler.getResource(FrameworkScheduler.java:176)
> at org.apache.hadoop.mapred.FrameworkScheduler.getResource(FrameworkScheduler.java:183)
> at org.apache.hadoop.mapred.FrameworkScheduler.resourceOffers(FrameworkScheduler.java:203)
> 
> 
> It seems to be stopping the job tracker from starting new tasks.
> 
> I was wondering if this is a version conflict between the Mesos I've built against (trunk) and the version of Mesos used on the AMI? It seems to come from the generated protobuf library.
> 
> 
> 
> To try to solve this, I attempted to spin up a cluster passing -d git (to have the latest code pulled from git), but then I get a string of crazy Python exceptions:
> 
> rsync error: unexplained error (code 255) at /SourceCache/rsync/rsync-40/rsync/io.c(452) [sender=2.6.9]
> Traceback (most recent call last):
>  File "./mesos_ec2.py", line 541, in <module>
>    main()
>  File "./mesos_ec2.py", line 450, in main
>    setup_cluster(conn, master_nodes, slave_nodes, zoo_nodes, opts, True)
>  File "./mesos_ec2.py", line 304, in setup_cluster
>    deploy_files(conn, "deploy." + opts.os, opts, master_nodes, slave_nodes, zoo_nodes)
>  File "./mesos_ec2.py", line 415, in deploy_files
>    subprocess.check_call(command, shell=True)
>  File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/subprocess.py", line 462, in check_call
>    raise CalledProcessError(retcode, cmd)
> subprocess.CalledProcessError: Command 'rsync -rv -e 'ssh -o StrictHostKeyChecking=no -i /Users/matthew/id-foursquare' '/var/folders/CK/CKzwG+5sFuSjDMUTvdmWfk+++TI/-Tmp-/tmpFmfdmB/' 'root@ec2.amazonaws.com:/'' returned non-zero exit status 255
> 
> 
> 
> 
> Do you think version conflicts are the likely reason for this failure?
> 
> -- 
> Matthew Rathbone
> Foursquare | Software Engineer | Server Engineering Team
> matthew@foursquare.com | @rathboma (http://twitter.com/rathboma) | 4sq (http://foursquare.com/rathboma)
> 
> 

