hadoop-common-user mailing list archives

From: Ashish Jain <ashja...@gmail.com>
Subject: Re: Distributing the code to multiple nodes
Date: Wed, 15 Jan 2014 13:52:50 GMT
My execution is stuck at this position indefinitely:

[root@l1-dev06 bin]# ./hadoop jar /opt/ApacheHadoop/wordCount.jar
/opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/OUT56
14/01/15 19:35:12 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/01/15 19:35:13 INFO client.RMProxy: Connecting to ResourceManager at /
10.12.11.210:1003
14/01/15 19:35:13 INFO client.RMProxy: Connecting to ResourceManager at /
10.12.11.210:1003
14/01/15 19:35:13 WARN mapreduce.JobSubmitter: Hadoop command-line option
parsing not performed. Implement the Tool interface and execute your
application with ToolRunner to remedy this.
14/01/15 19:35:14 INFO mapred.FileInputFormat: Total input paths to process
: 1
14/01/15 19:35:14 INFO mapreduce.JobSubmitter: number of splits:8
14/01/15 19:35:14 INFO Configuration.deprecation: user.name is deprecated.
Instead, use mapreduce.job.user.name
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.jar is deprecated.
Instead, use mapreduce.job.jar
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.output.value.class
is deprecated. Instead, use mapreduce.job.output.value.class
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.job.name is
deprecated. Instead, use mapreduce.job.name
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.input.dir is
deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.output.dir is
deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.map.tasks is
deprecated. Instead, use mapreduce.job.maps
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.output.key.class
is deprecated. Instead, use mapreduce.job.output.key.class
14/01/15 19:35:14 INFO Configuration.deprecation: mapred.working.dir is
deprecated. Instead, use mapreduce.job.working.dir
14/01/15 19:35:14 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_1389794591210_0001
14/01/15 19:35:15 INFO impl.YarnClientImpl: Submitted application
application_1389794591210_0001 to ResourceManager at /10.12.11.210:1003
14/01/15 19:35:15 INFO mapreduce.Job: The url to track the job:
http://l1-dev06:8088/proxy/application_1389794591210_0001/
14/01/15 19:35:15 INFO mapreduce.Job: Running job: job_1389794591210_0001
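
The JobSubmitter warning above ("Hadoop command-line option parsing not
performed") means generic options such as -Dmapreduce.framework.name=yarn are
ignored unless the driver implements the Tool interface. A minimal sketch of
such a driver follows; it is illustrative only (the class name is made up and
the mapper/reducer wiring is omitted), not the actual code inside wordCount.jar:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver class; plug in your own Mapper/Reducer with
// job.setMapperClass(...) and job.setReducerClass(...).
public class WordCountDriver extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    // getConf() already contains any -D options parsed by ToolRunner.
    Job job = Job.getInstance(getConf(), "wordcount");
    job.setJarByClass(WordCountDriver.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner parses the generic options (-D, -conf, -fs, ...) before run().
    System.exit(ToolRunner.run(new Configuration(), new WordCountDriver(), args));
  }
}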



On Wed, Jan 15, 2014 at 7:20 PM, Ashish Jain <ashjain2@gmail.com> wrote:

> I just now tried it again, and I see the following messages popping up in the
> log file:
>
> 2014-01-15 19:37:38,221 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1dev-211:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 19:37:38,621 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-dev06:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
>
> Do I need to increase the RAM allocation for the slave nodes?
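>
> The warnings say each container request asks for <memory:2048> while each
> node's total capability is only <memory:1024>, so the scheduler can never
> place a container. A sketch of the usual knobs, with illustrative values that
> are assumptions rather than settings taken from this cluster: raise the memory
> each NodeManager advertises in yarn-site.xml, and/or lower the per-container
> requests in mapred-site.xml.
>
> <!-- yarn-site.xml: memory each NodeManager offers to YARN (example value) -->
> <property>
>   <name>yarn.nodemanager.resource.memory-mb</name>
>   <value>4096</value>
> </property>
>
> <!-- mapred-site.xml: per-container requests (example values) -->
> <property>
>   <name>mapreduce.map.memory.mb</name>
>   <value>1024</value>
> </property>
> <property>
>   <name>mapreduce.reduce.memory.mb</name>
>   <value>1024</value>
> </property>
> <property>
>   <name>yarn.app.mapreduce.am.resource.mb</name>
>   <value>1024</value>
> </property>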
>
>
>
> On Wed, Jan 15, 2014 at 7:07 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>
>> I tried that, but somehow my MapReduce jobs do not execute at all once I
>> set it to yarn.
>>
>>
>> On Wed, Jan 15, 2014 at 7:00 PM, Nirmal Kumar <nirmal.kumar@impetus.co.in
>> > wrote:
>>
>>> Surely you don’t have to set *mapreduce.jobtracker.address* in
>>> mapred-site.xml.
>>>
>>>
>>>
>>> In mapred-site.xml you just have to mention:
>>>
>>> <property>
>>>   <name>mapreduce.framework.name</name>
>>>   <value>yarn</value>
>>> </property>
>>>
>>>
>>>
>>> -Nirmal
>>>
>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>> *Sent:* Wednesday, January 15, 2014 6:44 PM
>>>
>>> *To:* user@hadoop.apache.org
>>> *Subject:* Re: Distributing the code to multiple nodes
>>>
>>>
>>>
>>> I think this is the problem. I have not set
>>> "mapreduce.jobtracker.address" in my mapred-site.xml, and by default it is
>>> set to local. Now the question is how to point it at a remote address. The
>>> documentation says I need to specify the host:port of the job tracker for
>>> this. As we know, Hadoop 2.2.0 is completely overhauled and there is no
>>> concept of a task tracker and job tracker; instead there is now a resource
>>> manager and node manager. So in this case what do I set as
>>> "mapreduce.jobtracker.address"? Do I set it to
>>> resourceManagerHost:resourceManagerPort?
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>>>
>>>  Hi Sudhakar,
>>>
>>> Indeed there was a typo. The complete command is as follows, except for the
>>> main class, since my manifest has an entry for the main class:
>>> ./hadoop jar wordCount.jar /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> Next I killed the datanode on 10.12.11.210 and I see the following
>>> messages in the log files. It looks like the namenode is still trying to
>>> assign the complete task to one single node, and since it does not find the
>>> complete data set on one node it is complaining.
>>>
>>>
>>> 2014-01-15 16:38:26,894 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,348 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1dev-211:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,871 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-dev06:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:27,897 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,349 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1dev-211:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,874 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-dev06:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>> 2014-01-15 16:38:28,900 WARN
>>> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
>>> Node : l1-DEV05:1004 does not have sufficient resource for request :
>>> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
>>> Location: *, Relax Locality: true} node total capability : <memory:1024,
>>> vCores:8>
>>>
>>>   --Ashish
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <sudhakara.st@gmail.com>
>>> wrote:
>>>
>>>   Hello Ashish
>>>
>>>
>>>
>>> 2) Run the example again using the command
>>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> Unless it was a typo, the command should be
>>> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> One more thing to try: just stop the datanode process on 10.12.11.210 and
>>> run the job.
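>>>
>>> If the daemons were started with the standard scripts, something like the
>>> following on 10.12.11.210 should stop just the datanode (run from the
>>> Hadoop sbin directory; the path is an assumption about your layout):
>>>
>>> ./hadoop-daemon.sh stop datanode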
>>>
>>>
>>>
>>> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>>>
>>>     Hello Sudhakara,
>>>
>>> Thanks for your suggestion. However, once I change the MapReduce
>>> framework to yarn, my MapReduce jobs do not get executed at all. It seems
>>> they are waiting on some thread indefinitely. Here is what I have done:
>>>
>>> 1) Set the mapreduce framework to yarn in mapred-site.xml
>>> <property>
>>>  <name>mapreduce.framework.name</name>
>>>  <value>yarn</value>
>>> </property>
>>>
>>> 2) Run the example again using the command
>>>
>>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> The jobs are just stuck and do not move further.
>>>
>>> I also tried the following, and it complains of a FileNotFoundException
>>> and some security exception:
>>>
>>> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
>>> file:///opt/ApacheHadoop/out/
>>>
>>> Below is the status of the job from the Hadoop application console. The
>>> progress bar does not move at all.
>>>
>>>
>>>
>>> ID: application_1389771586883_0002<http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
>>> User: root
>>> Name: wordcount
>>> Application Type: MAPREDUCE
>>> Queue: default
>>> StartTime: Wed, 15 Jan 2014 07:52:04 GMT
>>> FinishTime: N/A
>>> State: ACCEPTED
>>> FinalStatus: UNDEFINED
>>> Progress:
>>> Tracking UI: UNASSIGNE <http://10.12.11.210:8088/cluster/apps>
>>>
>>>
>>>
>>> Please advise on what I should do.
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <sudhakara.st@gmail.com>
>>> wrote:
>>>
>>>   Hello Ashish
>>>
>>> It seems the job is running in the local job runner (LocalJobRunner) and
>>> reading the local file system. Can you try giving the full URI paths of the
>>> input and output?
>>>
>>> like
>>>
>>> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
>>> file:///home/input/  file:///home/output/
>>>
>>>
>>>
>>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>>>
>>>   German,
>>>
>>> This does not seem to be helping. I tried to use the FairScheduler as my
>>> scheduler, but the behavior remains the same. I could see the FairScheduler
>>> log getting continuous heartbeats from both of the other nodes, but it is
>>> still not distributing the work to the other nodes. What I did next was
>>> start 3 jobs simultaneously so that maybe some part of one of the jobs would
>>> be distributed to the other nodes. However, still only one node is being
>>> used :(((. What is it that is going wrong? Can someone help?
>>>
>>> Sample of the FairScheduler log:
>>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>>
>>> My data is distributed as blocks to the other nodes. The host with IP
>>> 10.12.11.210 has all the data, and this is the one which is serving all the
>>> requests.
>>>
>>> Total number of blocks: 8
>>> 1073741866:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741867:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741868:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741869:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741870:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741871:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741872:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741873:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>>
>>>
>>>
>>> Someone please advise on how to go about this.
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <ashjain2@gmail.com>
>>> wrote:
>>>
>>> Thanks for all these suggestions. Somehow I do not have access to the
>>> servers today; I will try the suggestions on Monday and will let you
>>> know how it goes.
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>>> german.fl@samsung.com> wrote:
>>>
>>>  Ashish
>>>
>>> Could this be related to the scheduler you are using and its settings?
>>>
>>>
>>>
>>> In lab environments, when running a single type of job, I often use the
>>> FairScheduler (the YARN default in 2.2.0 is the CapacityScheduler), and it
>>> does a good job of distributing the load.
>>>
>>>
>>>
>>> You could give that a try (
>>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>> )
>>>
>>>
>>>
>>> I think just changing yarn-site.xml as follows could demonstrate this
>>> theory (note that how the jobs are scheduled depends on resources such as
>>> memory on the nodes, and you would need to set up yarn-site.xml accordingly).
>>>
>>>
>>>
>>> <property>
>>>   <name>yarn.resourcemanager.scheduler.class</name>
>>>   <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>> </property>
>>>
>>>
>>>
>>> Regards
>>>
>>> ./g
>>>
>>>
>>>
>>>
>>>
>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>> *Sent:* Thursday, January 09, 2014 6:46 AM
>>> *To:* user@hadoop.apache.org
>>> *Subject:* Re: Distributing the code to multiple nodes
>>>
>>>
>>>
>>> Another point to add here: 10.12.11.210 is the host which has everything
>>> running, including a slave datanode. Data was also distributed to this host,
>>> as was the jar file. The following are running on 10.12.11.210:
>>>
>>> 7966 DataNode
>>> 8480 NodeManager
>>> 8353 ResourceManager
>>> 8141 SecondaryNameNode
>>> 7834 NameNode
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>>>
>>> The logs were updated only when I copied the data. After copying the data
>>> there have been no updates to the log files.
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <chris.mawata@gmail.com>
>>> wrote:
>>>
>>> Do the logs on the three nodes contain anything interesting?
>>> Chris
>>>
>>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <ashjain2@gmail.com> wrote:
>>>
>>> Here is the block info for the file I distributed. As can be seen, only
>>> 10.12.11.210 has all the data, and this is the node which is serving all the
>>> requests. Replicas are available on 209 as well as 210.
>>>
>>> 1073741857:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741858:         10.12.11.210:50010    View Block Info
>>> 10.12.11.211:50010    View Block Info
>>> 1073741859:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741860:         10.12.11.210:50010    View Block Info
>>> 10.12.11.211:50010    View Block Info
>>> 1073741861:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741862:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741863:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741864:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>>
>>> --Ashish
>>>
>>>
>>>
>>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>>>
>>> Hello Chris,
>>>
>>> I now have a cluster with 3 nodes and a replication factor of 2. When I
>>> distribute a file, I can see that there are replicas of the data available
>>> on other nodes. However, when I run a MapReduce job, again only one node is
>>> serving all the requests :(. Can you or anyone please provide some more
>>> input?
>>>
>>> Thanks
>>> Ashish
>>>
>>>
>>>
>>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <chris.mawata@gmail.com>
>>> wrote:
>>>
>>> 2 nodes and replication factor of 2 results in a replica of each block
>>> present on each node. This would allow the possibility that a single node
>>> would do the work and yet be data local.  It will probably happen if that
>>> single node has the needed capacity.  More nodes than the replication
>>> factor are needed to force distribution of the processing.
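>>>
>>> One quick way to check where the blocks and their replicas actually sit is
>>> the standard fsck tool (the path below is the input file used earlier in
>>> this thread; adjust as needed):
>>>
>>> ./hdfs fsck /opt/ApacheHadoop/temp/worker.log -files -blocks -locations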
>>> Chris
>>>
>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <ashjain2@gmail.com> wrote:
>>>
>>> Guys,
>>>
>>> I am sure that only one node is being used. I just now ran the job
>>> again and could see the CPU usage going high on only one server; the other
>>> server's CPU usage remains constant, which means the other node is not
>>> being used. Can someone help me debug this issue?
>>>
>>> ++Ashish
>>>
>>>
>>>
>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>>>
>>> Hello All,
>>>
>>> I have a 2-node Hadoop cluster running with a replication factor of 2. I
>>> have a file of around 1 GB which, when copied to HDFS, is replicated to
>>> both of the nodes. Looking at the block info, I can see the file has been
>>> subdivided into 8 parts, which means it has been split into 8 blocks, each
>>> of size 128 MB. I use this file as input to run the word count program.
>>> Somehow I feel only one node is doing all the work and the code is not
>>> distributed to the other node. How can I make sure the code is distributed
>>> to both nodes? Also, is there a log or GUI which can be used for this?
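>>>
>>> (One way to check this, assuming the standard YARN CLI and web UI in 2.2.0:
>>> the command below lists the NodeManagers registered with the
>>> ResourceManager, and the ResourceManager web UI on port 8088 shows how many
>>> containers each node runs while a job is active.)
>>>
>>> ./yarn node -list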
>>>
>>> Please note I am using the latest stable release, 2.2.0.
>>>
>>> ++Ashish
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>   --
>>>
>>>
>>> Regards,
>>> ...Sudhakara.st
>>>
>>>
>>>
>>>
>>>
>>>
>>>   --
>>>
>>>
>>> Regards,
>>> ...Sudhakara.st
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>
