mesos-dev mailing list archives

From Vinod Kone <vinodk...@gmail.com>
Subject Fwd: Issues when running Hadoop on Mesos
Date Mon, 22 Jul 2013 20:28:32 GMT
---------- Forwarded message ----------
From: 夏俊鸾 <xiajunluan@gmail.com>
Date: Mon, Jul 22, 2013 at 6:48 AM
Subject: Issues when running Hadoop on Mesos
To: vinodkone@gmail.com


Hi Vinod,

Sorry for emailing you directly with Mesos questions; the Mesos mailing
list (mesos-dev-subscribe@incubator.apache.org) does not seem to be
available right now.

I downloaded Mesos from trunk (I would like to support
hadoop-2.0.0-cdh4.1.2), built it (./configure && make && make install),
and ran make hadoop-2.0.0-mr1-cdh4.1.2, which launches a JobTracker and a
wordcount test application automatically. Up to that point everything
seemed OK.

I then configured core-site.xml, hdfs-site.xml, and mapred-site.xml to run
Hadoop on a Mesos cluster. The details are below.

========core-site.xml============
<property>
  <name>io.native.lib.available</name>
  <value>true</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://10.0.2.19:9000</value>
</property>

==========mapred-site.xml===========
<property>
  <name>mapred.job.tracker</name>
  <value>10.0.2.19:54311</value>
</property>
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.MesosScheduler</value>
</property>
<property>
  <name>mapred.mesos.taskScheduler</name>
  <value>org.apache.hadoop.mapred.JobQueueTaskScheduler</value>
</property>
<property>
  <name>mapred.mesos.master</name>
  <value>10.0.2.19:5050</value>
</property>
#
# Make sure to uncomment the 'mapred.mesos.executor' property,
# when running the Hadoop JobTracker on a real Mesos cluster.
# NOTE: You need to MANUALLY upload the Mesos executor bundle
# to the location that is set as the value of this property.
<property>
  <name>mapred.mesos.executor</name>
  <value>hdfs://10.0.2.19:9000/hadoop.tar.gz</value>
</property>

# The properties below indicate the amount of resources
# that are allocated to a Hadoop slot (i.e., map/reduce task) by Mesos.
<property>
  <name>mapred.mesos.slot.cpus</name>
  <value>0.2</value>
</property>
<property>
  <name>mapred.mesos.slot.disk</name>
  <!-- The value is in MB. -->
  <value>1024</value>
</property>
<property>
  <name>mapred.mesos.slot.mem</name>
  <!-- Note that this is the total memory required for
       JVM overhead (256 MB) and the heap (-Xmx) of the task.
       The value is in MB. -->
  <value>512</value>
</property>
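
As a sanity check on the settings above, a short script can read the
property fragments and report which Mesos-related keys are set. This is only
a sketch: the REQUIRED list is an assumption made for illustration (it is not
taken from the Mesos documentation), the function names are hypothetical, and
the fragment is wrapped in a <configuration> root because the snippets above
omit it.

```python
import xml.etree.ElementTree as ET

# Assumption: keys that this particular setup relies on (illustrative only).
REQUIRED = [
    "mapred.jobtracker.taskScheduler",
    "mapred.mesos.master",
    "mapred.mesos.executor",
]


def read_properties(fragment):
    """Parse Hadoop-style <property> elements into a name -> value dict."""
    # The pasted snippets have no root element, so add one before parsing.
    root = ET.fromstring("<configuration>" + fragment + "</configuration>")
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def missing_keys(props):
    """Return the REQUIRED keys that are absent or empty."""
    return [key for key in REQUIRED if not props.get(key)]


SAMPLE = """
<property>
  <name>mapred.mesos.master</name>
  <value>10.0.2.19:5050</value>
</property>
<property>
  <name>mapred.mesos.executor</name>
  <value>hdfs://10.0.2.19:9000/hadoop.tar.gz</value>
</property>
"""
```

Calling missing_keys(read_properties(SAMPLE)) lists whatever the sample
leaves out; running it against the real mapred-site.xml contents would show
whether the executor property is actually picked up as uncommented XML.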

I then launched the JobTracker (./bin/hadoop jobtracker) and the wordcount
application manually, but got the following errors:

============word count ==================
[andrew@sr419 hadoop-2.0.0-mr1-cdh4.1.2]$ ./bin/hadoop jar hadoop-examples-2.0.0-mr1-cdh4.1.2.jar wordcount /user/andrew/tmp out
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/andrew/incubator-mesos/hadoop/hadoop-2.0.0-mr1-cdh4.1.2/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/andrew/incubator-mesos/hadoop/hadoop-2.0.0-mr1-cdh4.1.2/build/ivy/lib/Hadoop/common/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
13/07/22 20:33:43 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/22 20:33:43 INFO input.FileInputFormat: Total input paths to process : 1
13/07/22 20:33:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/07/22 20:33:43 INFO mapred.JobClient: Running job: job_201307222033_0002
13/07/22 20:33:44 INFO mapred.JobClient:  map 0% reduce 0% // the wordcount job seems to stay pending here

============job tracker (TASK_LOST repeats in a loop)===================

13/07/22 20:33:43 INFO mapred.MesosScheduler: Satisfied map and reduce slots needed.
13/07/22 20:33:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/07/22 20:33:43 INFO mapred.MesosScheduler: Added job job_201307222033_0002
13/07/22 20:33:43 INFO mapred.JobTracker: Job job_201307222033_0002 added successfully for user 'andrew' to queue 'default'
13/07/22 20:33:43 INFO mapred.JobTracker: Initializing job_201307222033_0002
13/07/22 20:33:43 INFO mapred.JobInProgress: Initializing job_201307222033_0002
13/07/22 20:33:43 INFO mapred.AuditLogger: USER=andrew IP=10.0.2.19 OPERATION=SUBMIT_JOB TARGET=job_201307222033_0002 RESULT=SUCCESS
13/07/22 20:33:43 INFO mapred.JobInProgress: jobToken generated and stored with users keys in /tmp/hadoop-andrew/mapred/system/job_201307222033_0002/jobToken
13/07/22 20:33:43 INFO mapred.JobInProgress: Input size for job job_201307222033_0002 = 4010. Number of splits = 1
13/07/22 20:33:43 INFO net.NetworkTopology: Adding a new node: /default-rack/sr419
13/07/22 20:33:43 INFO mapred.JobInProgress: tip:task_201307222033_0002_m_000000 has split on node:/default-rack/sr419
13/07/22 20:33:43 INFO mapred.JobInProgress: job_201307222033_0002 LOCALITY_WAIT_FACTOR=1.0
13/07/22 20:33:43 INFO mapred.JobInProgress: Job job_201307222033_0002 initialized successfully with 1 map tasks and 1 reduce tasks.
13/07/22 20:33:48 INFO mapred.MesosScheduler: JobTracker Status
      Pending Map Tasks: 1
   Pending Reduce Tasks: 1
         Idle Map Slots: 0
      Idle Reduce Slots: 0
     Inactive Map Slots: 0 (launched but no hearbeat yet)
  Inactive Reduce Slots: 0 (launched but no hearbeat yet)
       Needed Map Slots: 1
    Needed Reduce Slots: 1
13/07/22 20:33:48 INFO mapred.MesosScheduler: Launching task Task_Tracker_0 on http://sr419:31000
13/07/22 20:33:48 INFO mapred.MesosScheduler: Satisfied map and reduce slots needed.
13/07/22 20:33:48 INFO mapred.MesosScheduler: Status update of Task_Tracker_0 to TASK_LOST with message Executor terminated
13/07/22 20:33:48 INFO mapred.MesosScheduler: Removing terminated TaskTracker: http://sr419:31000
13/07/22 20:33:49 INFO mapred.MesosScheduler: JobTracker Status
      Pending Map Tasks: 1
   Pending Reduce Tasks: 1
         Idle Map Slots: 0
      Idle Reduce Slots: 0
     Inactive Map Slots: 0 (launched but no hearbeat yet)
  Inactive Reduce Slots: 0 (launched but no hearbeat yet)
       Needed Map Slots: 1
    Needed Reduce Slots: 1
13/07/22 20:33:49 INFO mapred.MesosScheduler: Launching task Task_Tracker_1 on http://sr419:31000
13/07/22 20:33:49 INFO mapred.MesosScheduler: Satisfied map and reduce slots needed.
13/07/22 20:33:49 INFO mapred.MesosScheduler: Status update of Task_Tracker_1 to TASK_LOST with message Executor terminated
13/07/22 20:33:49 INFO mapred.MesosScheduler: Removing terminated TaskTracker: http://sr419:31000
13/07/22 20:33:50 INFO mapred.MesosScheduler: JobTracker Status
      Pending Map Tasks: 1
   Pending Reduce Tasks: 1
         Idle Map Slots: 0
      Idle Reduce Slots: 0
     Inactive Map Slots: 0 (launched but no hearbeat yet)
  Inactive Reduce Slots: 0 (launched but no hearbeat yet)
       Needed Map Slots: 1
    Needed Reduce Slots: 1
13/07/22 20:33:50 INFO mapred.MesosScheduler: Launching task Task_Tracker_2 on http://sr419:31000
13/07/22 20:33:50 INFO mapred.MesosScheduler: Satisfied map and reduce slots needed.
13/07/22 20:33:50 INFO mapred.MesosScheduler: Status update of Task_Tracker_2 to TASK_LOST with message Executor terminated
13/07/22 20:33:50 INFO mapred.MesosScheduler: Removing terminated TaskTracker: http://sr419:31000
13/07/22 20:33:51 INFO mapred.MesosScheduler: JobTracker Status

=============mesos-slave.INFO===================
Registered with master master@10.0.2.19:5050; given slave ID 201307222033-318898186-5050-19972-0
I0722 20:33:48.378780 20034 slave.cpp:739] Got assigned task Task_Tracker_0 for framework 201307222033-318898186-5050-19972-0000
I0722 20:33:48.379360 20034 slave.cpp:837] Launching task Task_Tracker_0 for framework 201307222033-318898186-5050-19972-0000
I0722 20:33:48.380995 20034 paths.hpp:303] Created executor directory '/var/run/mesos/slaves/201307222033-318898186-5050-19972-0/frameworks/201307222033-318898186-5050-19972-0000/executors/executor_Task_Tracker_0/runs/114ae051-f03a-4728-af0d-6caeb1d3240a'
I0722 20:33:48.381255 20034 slave.cpp:948] Queuing task 'Task_Tracker_0' for executor executor_Task_Tracker_0 of framework '201307222033-318898186-5050-19972-0000
I0722 20:33:48.381343 20026 process_isolator.cpp:99] Launching executor_Task_Tracker_0 (cd hadoop-* && ./bin/mesos-executor) in /var/run/mesos/slaves/201307222033-318898186-5050-19972-0/frameworks/201307222033-318898186-5050-19972-0000/executors/executor_Task_Tracker_0/runs/114ae051-f03a-4728-af0d-6caeb1d3240a with resources cpus=1; mem=1280' for framework 201307222033-318898186-5050-19972-0000
I0722 20:33:48.381484 20015 slave.cpp:511] Successfully attached file '/var/run/mesos/slaves/201307222033-318898186-5050-19972-0/frameworks/201307222033-318898186-5050-19972-0000/executors/executor_Task_Tracker_0/runs/114ae051-f03a-4728-af0d-6caeb1d3240a'
I0722 20:33:48.382462 20026 process_isolator.cpp:161] Forked executor at 20434
I0722 20:33:48.479176 20035 process_isolator.cpp:461] Telling slave of terminated executor 'executor_Task_Tracker_0' of framework 201307222033-318898186-5050-19972-0000
I0722 20:33:48.479310 20015 slave.cpp:2060] Executor 'executor_Task_Tracker_0' of framework 201307222033-318898186-5050-19972-0000 has exited with status 255
I0722 20:33:48.480988 20015 slave.cpp:1692] Handling status update TASK_LOST (UUID: 61050093-911f-47ad-a7df-bebffd2a753a) for task Task_Tracker_0 of framework 201307222033-318898186-5050-19972-0000 from @ 0.0.0.0:0
I0722 20:33:48.481205 20025 status_update_manager.cpp:290] Received status update TASK_LOST (UUID: 61050093-911f-47ad-a7df-bebffd2a753a) for task Task_Tracker_0 of framework 201307222033-318898186-5050-19972-0000 with checkpoint=false
I0722 20:33:48.481266 20025 status_update_manager.cpp:450] Creating StatusUpdate stream for task Task_Tracker_0 of framework 201307222033-318898186-5050-19972-0000
I0722 20:33:48.481461 20025 status_update_manager.cpp:336] Forwarding status update TASK_LOST (UUID: 61050093-911f-47ad-a7df-bebffd2a753a) for task Task_Tracker_0 of framework 201307222033-318898186-5050-19972-0000 to master@10.0.2.19:5050
I0722 20:33:48.481613 20025 slave.cpp:1809] Sending acknowledgement for status update TASK_LOST (UUID: 61050093-911f-47ad-a7df-bebffd2a753a) for task Task_Tracker_0 of framework 201307222033-318898186-5050-19972-0000 to @ 0.0.0.0:0
I0722 20:33:48.485322 20030 status_update_manager.cpp:360] Received status update acknowledgement 61050093-911f-47ad-a7df-bebffd2a753a for task Task_Tracker_0 of framework 201307222033-318898186-5050-19972-0000
I0722 20:33:48.485424 20030 status_update_manager.cpp:481] Cleaning up status update stream for task Task_Tracker_0 of framework 201307222033-318898186-5050-19972-0000
I0722 20:33:48.479262 20035 process_isolator.cpp:259] Performing killtree operation on 20434
Failed to stop 20434: No such process
  Children of 20434: {  }
Signaled 20434
I0722 20:33:48.505930 20035 process_isolator.cpp:287] Asked to update resources for an unknown/killed executor 'executor_Task_Tracker_0' of framework 201307222033-318898186-5050-19972-0000
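
The executor's exit status is easy to miss in a log this long. A small script
can pull out every executor exit record. This is a sketch that assumes the
slave writes the "has exited with status N" line exactly as it appears above;
executor_exits is a hypothetical helper name, and whitespace is collapsed
first because mail clients wrap long log lines.

```python
import re

# Matches the slave.cpp exit record as it appears in the log above.
EXIT_RE = re.compile(
    r"Executor '([^']+)' of framework (\S+) has exited with status (\d+)"
)


def executor_exits(log_text):
    """Return (executor_id, framework_id, exit_status) for each exit record."""
    # Collapse newlines and runs of spaces so wrapped records match again.
    flat = " ".join(log_text.split())
    return [
        (m.group(1), m.group(2), int(m.group(3)))
        for m in EXIT_RE.finditer(flat)
    ]


# The exit record from the log above, still wrapped the way the mail shows it.
SAMPLE = """I0722 20:33:48.479310 20015 slave.cpp:2060] Executor
'executor_Task_Tracker_0' of framework
201307222033-318898186-5050-19972-0000 has exited with status 255
"""

for executor_id, framework_id, status in executor_exits(SAMPLE):
    print(executor_id, framework_id, status)
```

A nonzero status within about 100 ms of the fork, as here, suggests the
launch command itself failed rather than the TaskTracker running and then
crashing, although the log alone does not say why.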

===========log in /tmp for 'executor_Task_Tracker_0' is empty==========

I have been stuck on these issues for several days and have not been able
to resolve them. One point I would like to highlight: I am not sure how to
set the "mapred.mesos.executor" property. Must the file be named
hadoop.tar.gz? The template confused me. Could you help me analyze these
issues? Thank you in advance.
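
One thing that can be checked locally is whether the bundle unpacks the way
the launch command in the slave log expects, since that command is
"cd hadoop-* && ./bin/mesos-executor". The sketch below assumes the slave
unpacks the archive into the executor directory before running the command;
check_names and check_bundle are hypothetical helper names, and the rules
they apply are inferred from that one log line, not from Mesos documentation.

```python
import fnmatch
import tarfile


def check_names(names):
    """Return a list of layout problems for the given archive member paths."""
    # Normalise a leading "./" so "./hadoop-x/bin" and "hadoop-x/bin" agree.
    cleaned = [n[2:] if n.startswith("./") else n for n in names]
    tops = sorted({n.split("/", 1)[0] for n in cleaned if n})
    matches = [t for t in tops if fnmatch.fnmatch(t, "hadoop-*")]

    problems = []
    if not matches:
        problems.append("no top-level directory matching hadoop-*")
    elif len(matches) > 1:
        problems.append(
            "several top-level entries match hadoop-*: " + ", ".join(matches)
        )
    for top in matches:
        if top + "/bin/mesos-executor" not in cleaned:
            problems.append(top + "/bin/mesos-executor is missing")
    return problems


def check_bundle(path):
    """Open a .tar.gz bundle and check its member paths."""
    with tarfile.open(path, "r:gz") as tar:
        return check_names(tar.getnames())
```

For example, check_bundle("hadoop.tar.gz") on the archive that was uploaded
to HDFS returns an empty list when the layout matches the launch command.
Under these assumptions the archive's file name would matter less than the
name of the directory inside it.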

regards,
Andrew
