hadoop-mapreduce-user mailing list archives

From sudhakara st <sudhakara...@gmail.com>
Subject Re: Distributing the code to multiple nodes
Date Wed, 15 Jan 2014 10:29:24 GMT
Hello Ashish

2) Run the example again using the command
./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
/opt/ApacheHadoop/out/


Unless it is a typo, the command should be
./hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log
/opt/ApacheHadoop/out/

One more thing to try: just stop the datanode process on 10.12.11.210 and run
the job.
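
Something like the following should do it, assuming the 2.2.0 tarball is
unpacked under /opt/ApacheHadoop (adjust the paths to your install):

# on 10.12.11.210: stop only the DataNode daemon
/opt/ApacheHadoop/sbin/hadoop-daemon.sh stop datanode

# confirm with jps that the DataNode JVM is gone, then resubmit the job
jps
/opt/ApacheHadoop/bin/hadoop jar wordCount.jar WordCount /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/

# check that the job went to the ResourceManager and not the LocalJobRunner
/opt/ApacheHadoop/bin/yarn application -list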




On Wed, Jan 15, 2014 at 2:07 PM, Ashish Jain <ashjain2@gmail.com> wrote:

> Hello Sudhakara,
>
> Thanks for your suggestion. However, once I change the mapreduce framework
> to yarn, my map reduce jobs do not get executed at all. It seems to be
> waiting on some thread indefinitely. Here is what I have done:
>
> 1) Set the mapreduce framework to yarn in mapred-site.xml
> <property>
>  <name>mapreduce.framework.name</name>
>  <value>yarn</value>
> </property>
> 2) Run the example again using the command
> ./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log
> /opt/ApacheHadoop/out/
>
> The jobs are just stuck and do not move further.
>
>
> I also tried the following, and it complains of a FileNotFoundException and
> some security exception
>
> ./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log
> file:///opt/ApacheHadoop/out/
>
> Below is the status of the job from the Hadoop application console. The
> progress bar does not move at all.
>
> ID: application_1389771586883_0002 <http://10.12.11.210:8088/cluster/app/application_1389771586883_0002>
> User: root
> Name: wordcount
> Application Type: MAPREDUCE
> Queue: default
> StartTime: Wed, 15 Jan 2014 07:52:04 GMT
> FinishTime: N/A
> State: ACCEPTED
> FinalStatus: UNDEFINED
> Progress: (bar not moving)
> Tracking UI: UNASSIGNED <http://10.12.11.210:8088/cluster/apps#>
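
A note on the status above: an application that stays in ACCEPTED with the
tracking UI UNASSIGNED usually means the ResourceManager has not yet been able
to allocate a container for the MapReduce ApplicationMaster. Assuming the 2.2.0
yarn CLI is available, a quick check of what the ResourceManager can actually
schedule on:

./yarn node -list
./yarn application -status application_1389771586883_0002

If no NodeManagers are listed, or they advertise very little memory, the
yarn-site.xml on every worker typically needs entries along these lines (the
memory value here is only an illustration, size it for your machines):

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>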
> Please advise on what I should do.
>
> --Ashish
>
>
> On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <sudhakara.st@gmail.com>wrote:
>
>> Hello Ashish
>> It seems the job is running in the local job runner (LocalJobRunner), reading
>> the local file system. Can you try giving the full URI paths of the input
>> and output?
>> Like:
>> $hadoop jar program.jar   ProgramName -Dmapreduce.framework.name=yarn
>> file:///home/input/  file:///home/output/
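
A note on the command above: the -D generic option is only picked up if
ProgramName goes through ToolRunner/GenericOptionsParser; otherwise
mapreduce.framework.name has to be set in mapred-site.xml. If the input already
sits in HDFS, fully qualified HDFS URIs work the same way, for example
(hdfs://namenode-host:9000 is just a placeholder for whatever fs.defaultFS
points to):

$hadoop jar program.jar ProgramName -Dmapreduce.framework.name=yarn
hdfs://namenode-host:9000/input/ hdfs://namenode-host:9000/output/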
>>
>>
>> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>>
>>> German,
>>>
>>> This does not seem to be helping. I tried to use the FairScheduler as my
>>> resource manager but the behavior remains the same. I can see the
>>> fairscheduler log getting continuous heartbeats from both of the other nodes,
>>> but it is still not distributing the work to them. What I did next
>>> was to start 3 jobs simultaneously so that maybe some part of one of the
>>> jobs would be distributed to the other nodes. However, still only one node is
>>> being used :(((. What is it that is going wrong? Can someone help?
>>>
>>> Sample of the fairscheduler log:
>>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>>
>>> My data is distributed as blocks to the other nodes. The host with IP
>>> 10.12.11.210 has all the data, and this is the one which is serving all the
>>> requests.
>>>
>>> Total number of blocks: 8
>>> 1073741866:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741867:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741868:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741869:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741870:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741871:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>> 1073741872:         10.12.11.211:50010    View Block Info
>>> 10.12.11.210:50010    View Block Info
>>> 1073741873:         10.12.11.210:50010    View Block Info
>>> 10.12.11.209:50010    View Block Info
>>>
>>> Someone please advise on how to go about this.
>>>
>>> --Ashish
>>>
>>>
>>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <ashjain2@gmail.com>wrote:
>>>
>>>> Thanks for all these suggestions. Somehow I do not have access to the
>>>> servers today and will try the suggestions made on monday and will let you
>>>> know how it goes.
>>>>
>>>> --Ashish
>>>>
>>>>
>>>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>>>> german.fl@samsung.com> wrote:
>>>>
>>>>> Ashish
>>>>>
>>>>> Could this be related to the scheduler you are using and its settings?
>>>>>
>>>>>
>>>>>
>>>>> In lab environments, when running a single type of job, I often use
>>>>> FairScheduler (the YARN default in 2.2.0 is CapacityScheduler) and it does
>>>>> a good job distributing the load.
>>>>>
>>>>>
>>>>>
>>>>> You could give that a try (
>>>>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>>>> )
>>>>>
>>>>>
>>>>>
>>>>> I think just changing yarn-site.xml as follows could demonstrate this
>>>>> theory (note that how the jobs are scheduled depends on resources such as
>>>>> memory on the nodes, and you would need to set up yarn-site.xml accordingly).
>>>>>
>>>>>
>>>>>
>>>>> <property>
>>>>>
>>>>>   <name>yarn.resourcemanager.scheduler.class</name>
>>>>>
>>>>>
>>>>> <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>>>>
>>>>> </property>
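
(After restarting the ResourceManager with that setting, the scheduler page of
the web UI, typically http://<resourcemanager-host>:8088/cluster/scheduler,
shows which scheduler is actually in effect and how the queues are being used.)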
>>>>>
>>>>>
>>>>>
>>>>> Regards
>>>>>
>>>>> ./g
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *From:* Ashish Jain [mailto:ashjain2@gmail.com]
>>>>> *Sent:* Thursday, January 09, 2014 6:46 AM
>>>>> *To:* user@hadoop.apache.org
>>>>> *Subject:* Re: Distributing the code to multiple nodes
>>>>>
>>>>>
>>>>>
>>>>> Another point to add here: 10.12.11.210 is the host which has
>>>>> everything running, including a slave datanode. Data was also distributed to
>>>>> this host, as well as the jar file. The following are running on 10.12.11.210:
>>>>>
>>>>> 7966 DataNode
>>>>> 8480 NodeManager
>>>>> 8353 ResourceManager
>>>>> 8141 SecondaryNameNode
>>>>> 7834 NameNode
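
(For comparison, jps on each of the other two hosts should show at least a
DataNode and a NodeManager; if the NodeManager process is missing on a host,
the ResourceManager has nothing to schedule containers on there.)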
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <ashjain2@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Logs were updated only when I copied the data. After copying the data
>>>>> there have been no updates to the log files.
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <chris.mawata@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Do the logs on the three nodes contain anything interesting?
>>>>> Chris
>>>>>
>>>>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <ashjain2@gmail.com> wrote:
>>>>>
>>>>> Here is the block info for the file I distributed. As can be seen,
>>>>> only 10.12.11.210 has all the data, and this is the node which is serving
>>>>> all the requests. Replicas are available on 209 as well as 211.
>>>>>
>>>>> 1073741857:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.209:50010    View Block Info
>>>>> 1073741858:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.211:50010    View Block Info
>>>>> 1073741859:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.209:50010    View Block Info
>>>>> 1073741860:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.211:50010    View Block Info
>>>>> 1073741861:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.209:50010    View Block Info
>>>>> 1073741862:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.209:50010    View Block Info
>>>>> 1073741863:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.209:50010    View Block Info
>>>>> 1073741864:         10.12.11.210:50010    View Block Info
>>>>> 10.12.11.209:50010    View Block Info
>>>>>
>>>>> --Ashish
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <ashjain2@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hello Chris,
>>>>>
>>>>> I now have a cluster with 3 nodes and a replication factor of 2. When
>>>>> I distribute a file I can see that there are replicas of the data available on
>>>>> the other nodes. However, when I run a map reduce job, again only one node is
>>>>> serving all the requests :(. Can you or anyone please provide some more
>>>>> input?
>>>>>
>>>>> Thanks
>>>>> Ashish
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <chris.mawata@gmail.com>
>>>>> wrote:
>>>>>
>>>>> 2 nodes and replication factor of 2 results in a replica of each block
>>>>> present on each node. This would allow the possibility that a single node
>>>>> would do the work and yet be data local.  It will probably happen if that
>>>>> single node has the needed capacity.  More nodes than the replication
>>>>> factor are needed to force distribution of the processing.
>>>>> Chris
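
(In the concrete numbers of this thread: the ~1 GB input splits into 8 blocks
of 128 MB, and with a replication factor of 2 on a 2-node cluster every node
holds a copy of all 8 blocks, so all 8 map tasks can run data-local on
whichever single node has free capacity.)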
>>>>>
>>>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <ashjain2@gmail.com> wrote:
>>>>>
>>>>> Guys,
>>>>>
>>>>> I am sure that only one node is being used. I just know ran the job
>>>>> again and could see that CPU usage only for one server going high other
>>>>> server CPU usage remains constant and hence it means other node is not
>>>>> being used. Can someone help me to debug this issue?
>>>>>
>>>>> ++Ashish
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <ashjain2@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hello All,
>>>>>
>>>>> I have a 2 node hadoop cluster running with a replication factor of 2.
>>>>> I have a file of size around 1 GB which, when copied to HDFS, is replicated
>>>>> to both nodes. Looking at the block info I can see the file has been
>>>>> subdivided into 8 blocks, each of size 128 MB. I use this file as input to run
>>>>> the word count program. Somehow I feel only one node is doing all the work
>>>>> and the code is not distributed to the other node. How can I make sure the code
>>>>> is distributed to both nodes? Also, is there a log or GUI which can be used
>>>>> to check this?
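
(Assuming default ports, the ResourceManager web UI at
http://<resourcemanager-host>:8088/cluster lists every application, and
drilling into a running MapReduce application shows which nodes its tasks are
placed on; on the command line, ./yarn node -list shows the NodeManagers that
have registered.)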
>>>>>
>>>>> Please note I am using the latest stable release that is 2.2.0.
>>>>>
>>>>> ++Ashish
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>>
>> Regards,
>> ...Sudhakara.st
>>
>>
>
>


-- 

Regards,
...Sudhakara.st
