From: Ashish Jain <ashjain2@gmail.com>
To: user@hadoop.apache.org
Date: Wed, 15 Jan 2014 14:07:36 +0530
Subject: Re: Distributing the code to multiple nodes

Hello Sudhakara,

Thanks for your suggestion. However, once I change the MapReduce framework to yarn, my MapReduce jobs do not get executed at all; the job seems to be waiting on some thread indefinitely. Here is what I have done:

1) Set the MapReduce framework to yarn in mapred-site.xml:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

2) Ran the example again using the command:

./hadoop dfs wordCount.jar /opt/ApacheHadoop/temp/worker.log /opt/ApacheHadoop/out/

The job just gets stuck and does not move any further.

I also tried the following, and it complains of a FileNotFoundException and some security exception:

./hadoop dfs wordCount.jar file:///opt/ApacheHadoop/temp/worker.log file:///opt/ApacheHadoop/out/

Below is the status of the job from the Hadoop application console. The progress bar does not move at all.

ID                              User  Name       Application Type  Queue    StartTime                      FinishTime  State     FinalStatus  Progress  Tracking UI
application_1389771586883_0002  root  wordcount  MAPREDUCE         default  Wed, 15 Jan 2014 07:52:04 GMT  N/A         ACCEPTED  UNDEFINED              UNASSIGNED
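
For anyone trying to reproduce this, a minimal sketch of inspecting the stuck application from the command line (assuming the stock yarn script from the same Hadoop 2.2.0 bin directory; the application ID is the one from the table above):

# Print state, final status and diagnostics for the stuck application
./yarn application -status application_1389771586883_0002
# Confirm that the NodeManagers have actually registered with the ResourceManager
./yarn node -list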
Please advise on what I should do.

--Ashish

On Tue, Jan 14, 2014 at 5:48 PM, sudhakara st <sudhakara.st@gmail.com> wrote:

> Hello Ashish,
>
> It seems the job is running in the local job runner (LocalJobRunner), reading the
> local file system. Can you try giving the full URI paths of the input and
> output, like:
>
> $hadoop jar program.jar ProgramName -Dmapreduce.framework.name=yarn file:///home/input/ file:///home/output/
>
> On Mon, Jan 13, 2014 at 3:02 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>
>> German,
>>
>> This does not seem to be helping. I tried to use the FairScheduler as my
>> resource manager, but the behavior remains the same. I can see the
>> FairScheduler log getting continuous heartbeats from both of the other nodes,
>> but it is still not distributing the work to them. What I did next was
>> start 3 jobs simultaneously, so that maybe some part of one of the jobs
>> would be distributed to the other nodes. However, still only one node is
>> being used :(((. What is going wrong here? Can someone help?
>>
>> Sample of the FairScheduler log:
>>
>> 2014-01-13 15:13:54,293 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:54,953 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:54,988 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:55,295 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:55,956 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:55,993 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:56,297 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:56,960 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:56,997 HEARTBEAT       l1-DEV05
>> 2014-01-13 15:13:57,299 HEARTBEAT       l1dev-211
>> 2014-01-13 15:13:57,964 HEARTBEAT       l1-dev06
>> 2014-01-13 15:13:58,001 HEARTBEAT       l1-DEV05
>>
>> My data is distributed as blocks to the other nodes. The host with IP
>> 10.12.11.210 has all the data, and it is the one serving all the requests.
>>
>> Total number of blocks: 8
>> 1073741866:  10.12.11.211:50010   10.12.11.210:50010
>> 1073741867:  10.12.11.211:50010   10.12.11.210:50010
>> 1073741868:  10.12.11.210:50010   10.12.11.209:50010
>> 1073741869:  10.12.11.210:50010   10.12.11.209:50010
>> 1073741870:  10.12.11.211:50010   10.12.11.210:50010
>> 1073741871:  10.12.11.210:50010   10.12.11.209:50010
>> 1073741872:  10.12.11.211:50010   10.12.11.210:50010
>> 1073741873:  10.12.11.210:50010   10.12.11.209:50010
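>>
>> As a cross-check of the NameNode web UI, the same block placement can be
>> listed with fsck; a minimal sketch, assuming the input file sits at the same
>> HDFS path that was passed to the job:
>>
>> # Lists every block of the file and the DataNodes holding each replica
>> hdfs fsck /opt/ApacheHadoop/temp/worker.log -files -blocks -locations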
>>
>> Someone please advise on how to go about this.
>>
>> --Ashish
>>
>> On Fri, Jan 10, 2014 at 12:58 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>>
>>> Thanks for all these suggestions. Somehow I do not have access to the
>>> servers today; I will try the suggestions on Monday and let you know how
>>> it goes.
>>>
>>> --Ashish
>>>
>>> On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <
>>> german.fl@samsung.com> wrote:
>>>
>>>> Ashish,
>>>>
>>>> Could this be related to the scheduler you are using and its settings?
>>>>
>>>> In lab environments, when running a single type of job, I often use the
>>>> FairScheduler (the YARN default in 2.2.0 is the CapacityScheduler), and
>>>> it does a good job of distributing the load.
>>>>
>>>> You could give that a try (
>>>> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
>>>> )
>>>>
>>>> I think just changing yarn-site.xml as follows could demonstrate this
>>>> theory (note that how the jobs are scheduled depends on resources such as
>>>> memory on the nodes, and you would need to set up yarn-site.xml
>>>> accordingly):
>>>>
>>>> <property>
>>>>   <name>yarn.resourcemanager.scheduler.class</name>
>>>>   <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>>>> </property>
>>>>
>>>> Regards
>>>> ./g
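>>>>
>>>> A minimal sketch of picking the change up, assuming the daemons are run
>>>> with the stock scripts from the Hadoop sbin directory (the ResourceManager
>>>> has to be restarted before the new scheduler class takes effect):
>>>>
>>>> ./yarn-daemon.sh stop resourcemanager
>>>> ./yarn-daemon.sh start resourcemanager
>>>>
>>>> The active scheduler then shows up in the RM web UI under
>>>> http://<resourcemanager-host>:8088/cluster/scheduler.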
>>>>
>>>> From: Ashish Jain [mailto:ashjain2@gmail.com]
>>>> Sent: Thursday, January 09, 2014 6:46 AM
>>>> To: user@hadoop.apache.org
>>>> Subject: Re: Distributing the code to multiple nodes
>>>>
>>>> Another point to add here: 10.12.11.210 is the host which has everything
>>>> running, including a slave DataNode. Data was also distributed to this
>>>> host, as well as the jar file. The following are running on 10.12.11.210:
>>>>
>>>> 7966 DataNode
>>>> 8480 NodeManager
>>>> 8353 ResourceManager
>>>> 8141 SecondaryNameNode
>>>> 7834 NameNode
>>>>
>>>> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>>>>
>>>> The logs were updated only when I copied the data. After copying the data
>>>> there have been no updates to the log files.
>>>>
>>>> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata <chris.mawata@gmail.com> wrote:
>>>>
>>>> Do the logs on the three nodes contain anything interesting?
>>>> Chris
>>>>
>>>> On Jan 9, 2014 3:47 AM, "Ashish Jain" <ashjain2@gmail.com> wrote:
>>>>
>>>> Here is the block info for the file I distributed. As can be seen, only
>>>> 10.12.11.210 has all the data, and this is the node which is serving all
>>>> the requests. Replicas are available on 209 as well as on 210.
>>>>
>>>> 1073741857:  10.12.11.210:50010   10.12.11.209:50010
>>>> 1073741858:  10.12.11.210:50010   10.12.11.211:50010
>>>> 1073741859:  10.12.11.210:50010   10.12.11.209:50010
>>>> 1073741860:  10.12.11.210:50010   10.12.11.211:50010
>>>> 1073741861:  10.12.11.210:50010   10.12.11.209:50010
>>>> 1073741862:  10.12.11.210:50010   10.12.11.209:50010
>>>> 1073741863:  10.12.11.210:50010   10.12.11.209:50010
>>>> 1073741864:  10.12.11.210:50010   10.12.11.209:50010
>>>>
>>>> --Ashish
>>>>
>>>> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>>>>
>>>> Hello Chris,
>>>>
>>>> I now have a cluster with 3 nodes and a replication factor of 2. When I
>>>> distribute a file, I can see that replicas of the data are available on the
>>>> other nodes. However, when I run a MapReduce job, again only one node is
>>>> serving all the requests :(. Can you or anyone else please provide some
>>>> more input?
>>>>
>>>> Thanks
>>>> Ashish
>>>>
>>>> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata <chris.mawata@gmail.com> wrote:
>>>>
>>>> 2 nodes and a replication factor of 2 result in a replica of each block
>>>> being present on each node. This allows the possibility that a single node
>>>> does the work and is still data-local. It will probably happen if that
>>>> single node has the needed capacity. More nodes than the replication
>>>> factor are needed to force distribution of the processing.
>>>> Chris
>>>>
>>>> On Jan 8, 2014 7:35 AM, "Ashish Jain" <ashjain2@gmail.com> wrote:
>>>>
>>>> Guys,
>>>>
>>>> I am sure that only one node is being used. I just ran the job again and
>>>> could see the CPU usage going high on only one server; the other servers'
>>>> CPU usage remains constant, which means the other nodes are not being
>>>> used. Can someone help me debug this issue?
>>>>
>>>> ++Ashish
>>>>
>>>> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain <ashjain2@gmail.com> wrote:
>>>>
>>>> Hello All,
>>>>
>>>> I have a 2 node Hadoop cluster running with a replication factor of 2. I
>>>> have a file of size around 1 GB which, when copied to HDFS, is replicated
>>>> to both nodes. Looking at the block info, I can see that the file has been
>>>> subdivided into 8 blocks, each of size 128 MB. I use this file as input to
>>>> run the word count program. Somehow I feel only one node is doing all the
>>>> work and the code is not distributed to the other node. How can I make
>>>> sure the code is distributed to both nodes? Also, is there a log or GUI
>>>> which can be used to verify this?
>>>>
>>>> Please note I am using the latest stable release, that is 2.2.0.
>>>>
>>>> ++Ashish
>
> --
> Regards,
> ...Sudhakara.st