hadoop-mapreduce-user mailing list archives

From sudhakara st <sudhakara...@gmail.com>
Subject Re: Distributing the code to multiple nodes
Date Wed, 15 Jan 2014 13:53:57 GMT
Hello Ashish

WARN
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
Node : l1-DEV05:1004 does not have sufficient resource for request :
{Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
Location: *, Relax Locality: true} node total capability : <memory:1024,
vCores:8>

The ResourceManager is trying to allocate a 2 GB container, but the node only has 1 GB available.
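One way out (a sketch, not a definitive fix; the property names are from the Hadoop 2.x defaults, and the values are examples you would tune to what your nodes can actually spare) is to either raise the memory the NodeManager advertises in yarn-site.xml, or shrink the per-task container request in mapred-site.xml so it fits within the node's 1 GB:

```xml
<!-- yarn-site.xml: total memory (MB) this NodeManager offers to containers.
     Example value; must not exceed what the node can really spare. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>

<!-- mapred-site.xml: alternatively, lower the per-task container request
     so a 2 GB ask is never made against a 1 GB node. Example values. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>512</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>512</value>
</property>
```

Either change requires restarting the NodeManagers (and resubmitting the job) to take effect.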


On Wed, Jan 15, 2014 at 7:07 PM, Ashish Jain <ashjain2@gmail.com> wrote:

> I tried that but somehow my map reduce jobs do not execute at all once I
> set it to yarn
>
>
> On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <nirmal.kumar@impetus.co.in>wrote:
>
>>  Surely you don’t have to set **mapreduce.jobtracker.address** in
>> mapred-site.xml
>>
>>
>>
>> In mapred-site.xml you just have to mention:
>>
>> <property>
>>
>> <name>mapreduce.framework.name</name>
>>
>> <value>yarn</value>
>>
>> </property>
>>
>>
>>
>> -Nirmal
>>
>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>> *Sent:* Wednesday, January 15, 2014 6:44 PM
>>
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Distributing the code to multiple nodes
>>
>>
>>
>> I think this is the problem. I have not set
>> "mapreduce.jobtracker.address" in my mapred-site.xml, and by default it is
>> set to local. Now the question is how to set it to remote. The documentation
>> says I need to specify the host:port of the job tracker for this. As we
>> know, hadoop 2.2.0 is completely overhauled and there is no concept of a task
>> tracker or job tracker; instead there are now a resource manager and node
>> manager. So in this case what do I set as "mapreduce.jobtracker.address"?
>> Do I set it to resourceManagerHost:resourceManagerPort?
>>
>> --Ashish
>>
>>
>>
>> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>>
>>  Hi Sudhakar,
>>
>> Indeed there was a typo; the complete command is as follows (the main
>> class is omitted since my manifest has an entry for it):
>> ./hadoop jar wordCount.jar /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> Next I killed the datanode on 10.12.11.210 and I see the following
>> messages in the log files. It looks like the namenode is still trying to
>> assign the complete task to one single node, and since it does not find
>> the complete data set on one node, it is complaining.
>>
>>
>> 2014-01-15 16:38:26,894 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,348 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1dev-211:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,871 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-dev06:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:27,897 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,349 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1dev-211:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,874 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-dev06:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>> 2014-01-15 16:38:28,900 WARN
>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>> vCores:8>
>>
>>   --Ashish
>>
>>
>>
>> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <sudhakara.st@gmail.com>
>> wrote:
>>
>>   Hello Ashish
>>
>>
>>
>> 2) Run the example again using the command
>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>>   Unless it is a typo, the command should be:
>> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> One more thing to try: stop the datanode process on 10.12.11.210 and run
>> the job.
>>
>>
>>
>> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>>
>>     Hello Sudhakara,
>>
>> Thanks for your suggestion. However, once I change the mapreduce framework
>> to yarn, my map reduce jobs do not get executed at all. They seem to be
>> waiting on some thread indefinitely. Here is what I have done:
>>
>> 1) Set the mapreduce framework to yarn in mapred-site.xml
>> <property>
>>  <name>mapreduce.framework.name</name>
>>  <value>yarn</value>
>> </property>
>>
>> 2) Run the example again using the command
>>
>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> The jobs are just stuck and do not move further.
>>
>>   I also tried the following, and it complains of a FileNotFoundException
>> and some security exception:
>>
>> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
>> file:///opt/ApacheHadoop/out/
>>
>> Below is the status of the job from the hadoop application console. The
>> progress bar does not move at all.
>>
>>
>>
>> ID: application_1389771586883_0002<http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
>> User: root
>> Name: wordcount
>> Application Type: MAPREDUCE
>> Queue: default
>> StartTime: Wed, 15 Jan 2014 07:52:04 GMT
>> FinishTime: N/A
>> State: ACCEPTED
>> FinalStatus: UNDEFINED
>> Progress / Tracking UI: UNASSIGNED <http://10.12.11.210:8088/cluster/apps>
>>
>>
>>
>> Please advise on what I should do.
>>
>> --Ashish
>>
>>
>>
>> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <sudhakara.st@gmail.com>
>> wrote:
>>
>>   Hello Ashish
>>
>> It seems the job is running in the local job runner (LocalJobRunner),
>> reading the local file system. Can you try giving the full URI paths of
>> the input and output?
>>
>> like
>>
>> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
>> file:///home/input/  file:///home/output/
>>
>>
>>
>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>>
>>   German,
>>
>> This does not seem to be helping. I tried to use the FairScheduler as my
>> scheduler, but the behavior remains the same. I could see the
>> fairscheduler log getting continuous heartbeats from both the other nodes,
>> but it is still not distributing the work to the other nodes. What I did
>> next was start 3 jobs simultaneously so that maybe some part of one of the
>> jobs would be distributed to the other nodes. However, still only one node
>> is being used :(((. What is going wrong? Can someone help?
>>
>> Sample of the fairscheduler log:
>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>
>> My data is distributed as blocks to the other nodes. The host with IP
>> 10.12.11.210 has all the data and is the one serving all the
>> requests.
>>
>> Total number of blocks: 8
>> 1073741866:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741867:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741868:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741869:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741870:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741871:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741872:         10.12.11.211:50010    View Block Info
>> 10.12.11.210:50010    View Block Info
>> 1073741873:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>>
>>
>>
>> Someone please advise on how to go about this.
>>
>> --Ashish
>>
>>
>>
>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>>
>>  Thanks for all these suggestions. Somehow I do not have access to the
>> servers today; I will try the suggestions on Monday and let you
>> know how it goes.
>>
>> --Ashish
>>
>>
>>
>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>> german.fl@samsung.com> wrote:
>>
>>  Ashish
>>
>> Could this be related to the scheduler you are using and its settings?
>>
>>
>>
>> In lab environments, when running a single type of job, I often use the
>> FairScheduler (the YARN default in 2.2.0 is the CapacityScheduler) and it
>> does a good job distributing the load.
>>
>>
>>
>> You could give that a try (
>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>> )
>>
>>
>>
>> I think just changing yarn-site.xml as follows could demonstrate this
>> theory (note that how the jobs are scheduled depends on resources such as
>> memory on the nodes, and you would need to set up yarn-site.xml accordingly).
>>
>>
>>
>> <property>
>>
>>   <name>yarn.resourcemanager.scheduler.class</name>
>>
>>
>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>
>> </property>
>>
>>
>>
>> Regards
>>
>> ./g
>>
>>
>>
>>
>>
>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>> *Sent:* Thursday, January 09, 2014 6:46 AM
>> *To:* user@hadoop.apache.org
>> *Subject:* Re: Distributing the code to multiple nodes
>>
>>
>>
>> Another point to add here: 10.12.11.210 is the host which has everything
>> running, including a slave datanode. Data was also distributed to this host,
>> as was the jar file. The following are running on 10.12.11.210:
>>
>> 7966 DataNode
>> 8480 NodeManager
>> 8353 ResourceManager
>> 8141 SecondaryNameNode
>> 7834 NameNode
>>
>>
>>
>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>>
>> The logs were updated only when I copied the data. After copying the data
>> there have been no updates to the log files.
>>
>>
>>
>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <chris.mawata@gmail.com>
>> wrote:
>>
>> Do the logs on the three nodes contain anything interesting?
>> Chris
>>
>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <ashjain2@gmail.com> wrote:
>>
>> Here is the block info for the record I distributed. As can be seen, only
>> 10.12.11.210 has all the data and this is the node serving all the
>> requests. Replicas are available on 209 and 211 as well.
>>
>> 1073741857:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741858:         10.12.11.210:50010    View Block Info
>> 10.12.11.211:50010    View Block Info
>> 1073741859:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741860:         10.12.11.210:50010    View Block Info
>> 10.12.11.211:50010    View Block Info
>> 1073741861:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741862:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741863:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>> 1073741864:         10.12.11.210:50010    View Block Info
>> 10.12.11.209:50010    View Block Info
>>
>> --Ashish
>>
>>
>>
>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>>
>> Hello Chris,
>>
>> I now have a cluster with 3 nodes and a replication factor of 2. When I
>> distribute a file I can see that replicas of the data are available on
>> other nodes. However, when I run a map reduce job, again only one node is
>> serving all the requests :(. Can you or anyone please provide some more
>> input?
>>
>> Thanks
>> Ashish
>>
>>
>>
>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <chris.mawata@gmail.com>
>> wrote:
>>
>> 2 nodes and replication factor of 2 results in a replica of each block
>> present on each node. This would allow the possibility that a single node
>> would do the work and yet be data local.  It will probably happen if that
>> single node has the needed capacity.  More nodes than the replication
>> factor are needed to force distribution of the processing.
>> Chris
>>
>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <ashjain2@gmail.com> wrote:
>>
>> Guys,
>>
>> I am sure that only one node is being used. I just now ran the job again
>> and could see the CPU usage going high on only one server, while the other
>> server's CPU usage remains constant; hence the other node is not being used.
>> Can someone help me debug this issue?
>>
>> ++Ashish
>>
>>
>>
>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>>
>> Hello All,
>>
>> I have a 2 node hadoop cluster running with a replication factor of 2. I
>> have a file of around 1 GB which, when copied to HDFS, is replicated to
>> both nodes. Looking at the block info, I can see the file has been
>> subdivided into 8 blocks, each of size 128 MB. I use this file as input to
>> run the word count program. Somehow I feel only one node is doing all the
>> work and the code is not distributed to the other node. How can I make
>> sure the code is distributed to both nodes? Also, is there a log or GUI
>> which can be used to check this?
>>
>> Please note I am using the latest stable release that is 2.2.0.
>>
>> ++Ashish
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>   --
>>
>>
>> Regards,
>> ...Sudhakara.st
>>
>>
>>
>>
>>
>>
>>   --
>>
>>
>> Regards,
>> ...Sudhakara.st
>>
>>
>>
>>
>>
>>
>> ------------------------------
>>
>>
>>
>>
>>
>>
>> NOTE: This message may contain information that is confidential,
>> proprietary, privileged or otherwise protected by law. The message is
>> intended solely for the named addressee. If received in error, please
>> destroy and notify the sender. Any use of this email is prohibited when
>> received in error. Impetus does not represent, warrant and/or guarantee,
>> that the integrity of this communication has been maintained nor that the
>> communication is free of errors, virus, interception or interference.
>>
>
>


-- 

Regards,
...Sudhakara.st
