hadoop-common-user mailing list archives

From Ashish Jain <ashja...@gmail.com>
Subject Re: Distributing the code to multiple nodes
Date Wed, 15 Jan 2014 13:13:31 GMT
I think this is the problem. I have not set "mapreduce.jobtracker.address"
in my mapred-site.xml, and by default it is set to local. Now the question
is how to point it at a remote cluster. The documentation says I need to
specify the host:port of the job tracker for this. As we know, Hadoop 2.2.0
is completely overhauled and there is no concept of a task tracker or a job
tracker; instead there are now a resource manager and node managers. So in
this case, what do I set "mapreduce.jobtracker.address" to? Do I set it to
resourceManagerHost:resourceManagerPort?
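
From my reading of the 2.2.0 docs, I suspect "mapreduce.jobtracker.address"
is simply not used under YARN, and the equivalent wiring looks roughly like
the sketch below (the property names are from the docs; the host name is
just my guess for our cluster, please correct me if I am wrong):

mapred-site.xml:
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>

yarn-site.xml:
  <property>
    <!-- assumed: the box where the resource manager runs -->
    <name>yarn.resourcemanager.hostname</name>
    <value>10.12.11.210</value>
  </property>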

--Ashish


On Wed, Jan 15, 2014 at 4:20 PM, Ashish Jain <ashjain2@gmail.com> wrote:

> Hi Sudhakar,
>
> Indeed there was a typo. The complete command is as follows, minus the
> main class, since my manifest has the entry for the main class:
> /hadoop jar wordCount.jar  /opt/ApacheHadoop/temp/worker.log
> /opt/ApacheHadoop/out/
>
> Next I killed the datanode on 10.12.11.210, and I see the following
> messages in the log files. It looks like the namenode is still trying to
> assign the complete task to one single node, and since it does not find
> the complete data set on one node it is complaining.
>
> 2014-01-15 16:38:26,894 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-DEV05:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:27,348 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1dev-211:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:27,871 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-dev06:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:27,897 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-DEV05:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:28,349 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1dev-211:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:28,874 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-dev06:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
> 2014-01-15 16:38:28,900 WARN
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
> Node : l1-DEV05:1004 does not have sufficient resource for request :
> {Priority: 0, Capability: <memory:2048, vCores:1>, # Containers: 1,
> Location: *, Relax Locality: true} node total capability : <memory:1024,
> vCores:8>
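>
> If I read these warnings correctly, each node is advertising only 1024 MB
> in total (<memory:1024, vCores:8>) while the request asks for 2048 MB, so
> no node can ever fit the container. A sketch of the settings I believe
> govern this (the values below are illustrative guesses, not something I
> have tested yet):
>
> yarn-site.xml:
>   <property>
>     <!-- raise the memory each node manager offers to YARN -->
>     <name>yarn.nodemanager.resource.memory-mb</name>
>     <value>4096</value>
>   </property>
>
> mapred-site.xml:
>   <property>
>     <!-- or shrink what the MR application master asks for -->
>     <name>yarn.app.mapreduce.am.resource.mb</name>
>     <value>1024</value>
>   </property>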
>
>
> --Ashish
>
>
> On Wed, Jan 15, 2014 at 3:59 PM, sudhakara st <sudhakara.st@gmail.com> wrote:
>
>> Hello Ashish
>>
>>
>> 2) Run the example again using the command
>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>>
>> Unless it is a typo, the command should be:
>> ./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
>> /opt/ApacheHadoop/out/
>>
>> One more thing to try: just stop the datanode process on 10.12.11.210
>> and run the job.
>>
>>
>>
>>
>> On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>>
>>> Hello Sudhakara,
>>>
>>> Thanks for your suggestion. However, once I change the mapreduce
>>> framework to yarn, my map reduce jobs do not get executed at all. It
>>> seems to be waiting on some thread indefinitely. Here is what I have
>>> done:
>>>
>>> 1) Set the mapreduce framework to yarn in mapred-site.xml
>>> <property>
>>>  <name>mapreduce.framework.name</name>
>>>  <value>yarn</value>
>>> </property>
>>> 2) Run the example again using the command
>>> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
>>> /opt/ApacheHadoop/out/
>>>
>>> The jobs are just stuck and do not move further.
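>>>
>>> As far as I understand, the cluster state can be inspected with the
>>> stock YARN CLI; these commands are from the docs, not anything specific
>>> to my setup:
>>>
>>> ./yarn node -list          # lists the node managers that registered
>>> ./yarn application -list   # shows each job's state, e.g. ACCEPTED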
>>>
>>>
>>> I also tried the following, and it complains of a FileNotFoundException
>>> and some security exception:
>>>
>>> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
>>> file:///opt/ApacheHadoop/out/
>>>
>>> Below is the status of the job from the Hadoop application console. The
>>> progress bar does not move at all.
>>>
>>> ID:               application_1389771586883_0002
>>>                   <http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
>>> User:             root
>>> Name:             wordcount
>>> Application Type: MAPREDUCE
>>> Queue:            default
>>> StartTime:        Wed, 15 Jan 2014 07:52:04 GMT
>>> FinishTime:       N/A
>>> State:            ACCEPTED
>>> FinalStatus:      UNDEFINED
>>> Progress:         (not moving)
>>> Tracking UI:      UNASSIGNED
>>> Please advise: what should I do?
>>>
>>> --Ashish
>>>
>>>
>>> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <sudhakara.st@gmail.com> wrote:
>>>
>>>> Hello Ashish
>>>> It seems the job is running in the local job runner (LocalJobRunner),
>>>> reading from the local file system. Can you try giving the full URI
>>>> paths of the input and output? Like:
>>>> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
>>>> file:///home/input/  file:///home/output/
>>>>
>>>>
>>>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>>>>
>>>>> German,
>>>>>
>>>>> This does not seem to be helping. I tried to use the FairScheduler as
>>>>> my resource manager's scheduler, but the behavior remains the same. I
>>>>> can see the fairscheduler log getting continuous heartbeats from both
>>>>> the other nodes, but it is still not distributing the work to them.
>>>>> What I did next was start 3 jobs simultaneously, so that maybe some
>>>>> part of one of the jobs would be distributed to the other nodes.
>>>>> However, still only one node is being used :(((. What is going wrong?
>>>>> Can someone help?
>>>>>
>>>>> Sample of the fairscheduler log:
>>>>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>>>>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>>>>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>>>>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>>>>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>>>>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>>>>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>>>>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>>>>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>>>>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>>>>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>>>>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>>>>
>>>>> My data is distributed as blocks to the other nodes. The host with IP
>>>>> 10.12.11.210 has all the data, and this is the one which is serving
>>>>> all the requests.
>>>>>
>>>>> Total number of blocks: 8
>>>>> 1073741866:         10.12.11.211:50010    View Block Info
>>>>> 10.12.11.210:50010    View Block Info
>>>>> 1073741867:         10.12.11.211:50010    View Block Info
>>>>> 10.12.11.210:50010    View Block Info
>>>>> 1073741868:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.209:50010    View Block Info
>>>>> 1073741869:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.209:50010    View Block Info
>>>>> 1073741870:         10.12.11.211:50010    View Block Info
>>>>> 10.12.11.210:50010    View Block Info
>>>>> 1073741871:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.209:50010    View Block Info
>>>>> 1073741872:         10.12.11.211:50010    View Block Info
>>>>> 10.12.11.210:50010    View Block Info
>>>>> 1073741873:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.209:50010    View Block Info
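>>>>>
>>>>> (This listing is from the namenode web UI; I believe the same
>>>>> information can be pulled from the command line with something like
>>>>> the following, where the path is our input file:
>>>>>
>>>>> ./hdfs fsck /opt/ApacheHadoop/temp/worker.log -files -blocks -locations
>>>>> )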
>>>>>
>>>>> Someone please advise on how to go about this.
>>>>>
>>>>> --Ashish
>>>>>
>>>>>
>>>>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>>>>>
>>>>>> Thanks for all these suggestions. Somehow I do not have access to
>>>>>> the servers today; I will try the suggestions on Monday and will let
>>>>>> you know how it goes.
>>>>>>
>>>>>> --Ashish
>>>>>>
>>>>>>
>>>>>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>>>>>> german.fl@samsung.com> wrote:
>>>>>>
>>>>>>> Ashish
>>>>>>>
>>>>>>> Could this be related to the scheduler you are using and its
>>>>>>> settings?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> In lab environments, when running a single type of job, I often use
>>>>>>> the FairScheduler (the YARN default in 2.2.0 is the
>>>>>>> CapacityScheduler), and it does a good job distributing the load.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> You could give that a try (
>>>>>>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>>>>>> )
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I think just changing yarn-site.xml as follows could demonstrate
>>>>>>> this theory (note that how the jobs are scheduled depends on
>>>>>>> resources such as memory on the nodes, and you would need to set up
>>>>>>> yarn-site.xml accordingly).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> <property>
>>>>>>>
>>>>>>>   <name>yarn.resourcemanager.scheduler.class</name>
>>>>>>>
>>>>>>>
>>>>>>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>>>>>>
>>>>>>> </property>
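>>>>>>>
>>>>>>> (You would then restart YARN so the new scheduler takes effect; if
>>>>>>> you use the stock scripts, something like:
>>>>>>>
>>>>>>> $HADOOP_HOME/sbin/stop-yarn.sh && $HADOOP_HOME/sbin/start-yarn.sh
>>>>>>> )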
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> ./g
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>>>>>> *Sent:* Thursday, January 09, 2014 6:46 AM
>>>>>>> *To:* user@hadoop.apache.org
>>>>>>> *Subject:* Re: Distributing the code to multiple nodes
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Another point to add here: 10.12.11.210 is the host which has
>>>>>>> everything running, including a slave datanode. Data was also
>>>>>>> distributed to this host, as was the jar file. The following are
>>>>>>> running on 10.12.11.210:
>>>>>>>
>>>>>>> 7966 DataNode
>>>>>>> 8480 NodeManager
>>>>>>> 8353 ResourceManager
>>>>>>> 8141 SecondaryNameNode
>>>>>>> 7834 NameNode
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <ashjain2@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Logs were updated only when I copied the data. After copying the
>>>>>>> data there have been no updates to the log files.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <chris.mawata@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Do the logs on the three nodes contain anything interesting?
>>>>>>> Chris
>>>>>>>
>>>>>>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <ashjain2@gmail.com> wrote:
>>>>>>>
>>>>>>> Here is the block info for the record I distributed. As can be
>>>>>>> seen, only 10.12.11.210 has all the data, and this is the node which
>>>>>>> is serving all the requests. Replicas are available on 209 as well
>>>>>>> as on 210.
>>>>>>>
>>>>>>> 1073741857:         10.12.11.210:50010    View Block Info
>>>>>>> 10.12.11.209:50010    View Block Info
>>>>>>> 1073741858:         10.12.11.210:50010    View Block Info
>>>>>>> 10.12.11.211:50010    View Block Info
>>>>>>> 1073741859:         10.12.11.210:50010    View Block Info
>>>>>>> 10.12.11.209:50010    View Block Info
>>>>>>> 1073741860:         10.12.11.210:50010    View Block Info
>>>>>>> 10.12.11.211:50010    View Block Info
>>>>>>> 1073741861:         10.12.11.210:50010    View Block Info
>>>>>>> 10.12.11.209:50010    View Block Info
>>>>>>> 1073741862:         10.12.11.210:50010    View Block Info
>>>>>>> 10.12.11.209:50010    View Block Info
>>>>>>> 1073741863:         10.12.11.210:50010    View Block Info
>>>>>>> 10.12.11.209:50010    View Block Info
>>>>>>> 1073741864:         10.12.11.210:50010    View Block Info
>>>>>>> 10.12.11.209:50010    View Block Info
>>>>>>>
>>>>>>> --Ashish
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <ashjain2@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hello Chris,
>>>>>>>
>>>>>>> I now have a cluster with 3 nodes and a replication factor of 2.
>>>>>>> When I distribute a file I can see that there are replicas of the
>>>>>>> data available on other nodes. However, when I run a map reduce job,
>>>>>>> again only one node is serving all the requests :(. Can you or
>>>>>>> anyone please provide some more input?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Ashish
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <chris.mawata@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> 2 nodes and a replication factor of 2 result in a replica of each
>>>>>>> block being present on each node. This allows the possibility that
>>>>>>> a single node does the work and yet stays data-local. It will
>>>>>>> probably happen if that single node has the needed capacity. More
>>>>>>> nodes than the replication factor are needed to force distribution
>>>>>>> of the processing.
>>>>>>> Chris
>>>>>>>
>>>>>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <ashjain2@gmail.com> wrote:
>>>>>>>
>>>>>>> Guys,
>>>>>>>
>>>>>>> I am sure that only one node is being used. I just now ran the job
>>>>>>> again and could see the CPU usage going high for only one server;
>>>>>>> the other server's CPU usage remains constant, which means the other
>>>>>>> node is not being used. Can someone help me debug this issue?
>>>>>>>
>>>>>>> ++Ashish
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <ashjain2@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Hello All,
>>>>>>>
>>>>>>> I have a 2 node hadoop cluster running with a replication factor
>>>>>>> of 2. I have a file of size around 1 GB which, when copied to HDFS,
>>>>>>> is replicated to both the nodes. Looking at the block info I can see
>>>>>>> the file has been subdivided into 8 blocks, each of size 128 MB. I
>>>>>>> use this file as input to run the word count program. Somehow I feel
>>>>>>> only one node is doing all the work and the code is not distributed
>>>>>>> to the other node. How can I make sure the code is distributed to
>>>>>>> both the nodes? Also, is there a log or GUI which can be used for
>>>>>>> this?
>>>>>>>
>>>>>>> Please note I am using the latest stable release, 2.2.0.
>>>>>>>
>>>>>>> ++Ashish
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Regards,
>>>> ...Sudhakara.st
>>>>
>>>>
>>>
>>>
>>
>>
>> --
>>
>> Regards,
>> ...Sudhakara.st
>>
>>
>
>
