Subject: Re: Distributing the code to multiple nodes
From: Ashish Jain <ashjain2@gmail.com>
To: user@hadoop.apache.org
Date: Fri, 10 Jan 2014 12:58:15 +0530

Thanks for all these suggestions. Somehow I do not have access to the servers today; I will try the suggestions on Monday and let you know how it goes.

--Ashish

On Thu, Jan 9, 2014 at 7:53 PM, German Florez-Larrahondo <german.fl@samsung.com> wrote:

> Ashish,
>
> Could this be related to the scheduler you are using and its settings?
>
> In lab environments, when running a single type of job, I often use the FairScheduler (the YARN default in 2.2.0 is the CapacityScheduler) and it does a good job of distributing the load.
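> As a quick check (a minimal sketch; <resourcemanager-host> is a placeholder, and this assumes the ResourceManager web services are reachable on the default port 8088), you can confirm which scheduler is actually active before and after the change:
>
>   curl http://<resourcemanager-host>:8088/ws/v1/cluster/scheduler
>
> The scheduler page of the ResourceManager web UI (http://<resourcemanager-host>:8088/cluster/scheduler) shows the same information.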
> You could give that a try (https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html).
>
> I think just changing yarn-site.xml as follows could demonstrate this theory (note that how the jobs are scheduled depends on resources such as memory on the nodes, and you would need to set up yarn-site.xml accordingly):
>
>   <property>
>     <name>yarn.resourcemanager.scheduler.class</name>
>     <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
>   </property>
>
> Regards
> ./g
>
> From: Ashish Jain [mailto:ashjain2@gmail.com]
> Sent: Thursday, January 09, 2014 6:46 AM
> To: user@hadoop.apache.org
> Subject: Re: Distributing the code to multiple nodes
>
> Another point to add here: 10.12.11.210 is the host which has everything running, including a slave datanode. The data was also distributed to this host, as well as the jar file. The following are running on 10.12.11.210:
>
> 7966 DataNode
> 8480 NodeManager
> 8353 ResourceManager
> 8141 SecondaryNameNode
> 7834 NameNode
>
> On Thu, Jan 9, 2014 at 6:12 PM, Ashish Jain wrote:
>
> The logs were updated only when I copied the data. After copying the data there have been no updates to the log files.
>
> On Thu, Jan 9, 2014 at 5:08 PM, Chris Mawata wrote:
>
> Do the logs on the three nodes contain anything interesting?
> Chris
>
> On Jan 9, 2014 3:47 AM, "Ashish Jain" wrote:
>
> Here is the block info for the file I distributed. As can be seen, only 10.12.11.210 has all the data, and this is the node which is serving all the requests. The second replicas are spread across 209 and 211:
>
> 1073741857: 10.12.11.210:50010    10.12.11.209:50010
> 1073741858: 10.12.11.210:50010    10.12.11.211:50010
> 1073741859: 10.12.11.210:50010    10.12.11.209:50010
> 1073741860: 10.12.11.210:50010    10.12.11.211:50010
> 1073741861: 10.12.11.210:50010    10.12.11.209:50010
> 1073741862: 10.12.11.210:50010    10.12.11.209:50010
> 1073741863: 10.12.11.210:50010    10.12.11.209:50010
> 1073741864: 10.12.11.210:50010    10.12.11.209:50010
>
> --Ashish
>
> On Thu, Jan 9, 2014 at 2:11 PM, Ashish Jain wrote:
>
> Hello Chris,
>
> I now have a cluster with 3 nodes and a replication factor of 2. When I distribute a file, I can see that replicas of the data are available on the other nodes. However, when I run a map reduce job, again only one node serves all the requests :(. Can you or anyone please provide some more input?
>
> Thanks
> Ashish
>
> On Wed, Jan 8, 2014 at 7:16 PM, Chris Mawata wrote:
>
> 2 nodes and a replication factor of 2 result in a replica of each block being present on each node. This allows the possibility that a single node does all the work and is still data local. It will probably happen if that single node has the needed capacity. More nodes than the replication factor are needed to force distribution of the processing.
> Chris
>
> On Jan 8, 2014 7:35 AM, "Ashish Jain" wrote:
>
> Guys,
>
> I am sure that only one node is being used. I just now ran the job again and could see that CPU usage goes up only on one server while the other server's CPU usage remains constant, which means the other node is not being used. Can someone help me debug this issue?
>
> ++Ashish
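> To see whether the second NodeManager is actually being handed containers (a rough sketch, assuming the stock 2.2.0 yarn CLI is on the path), you can compare the per-node container counts while the job is running:
>
>   yarn node -list
>   yarn application -list
>
> yarn node -list prints one line per NodeManager, including a running-container count; if that count stays at 0 on the other nodes while the job executes, all containers are landing on 10.12.11.210. The ResourceManager web UI on port 8088 shows the same breakdown per application.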
> On Wed, Jan 8, 2014 at 5:04 PM, Ashish Jain wrote:
>
> Hello All,
>
> I have a 2 node Hadoop cluster running with a replication factor of 2. I have a file of around 1 GB which, when copied to HDFS, is replicated to both nodes. Looking at the block info I can see the file has been subdivided into 8 blocks of 128 MB each. I use this file as input to run the word count program. Somehow I feel only one node is doing all the work and the code is not distributed to the other node. How can I make sure the code is distributed to both nodes? Also, is there a log or GUI which can be used to verify this?
>
> Please note I am using the latest stable release, that is 2.2.0.
>
> ++Ashish
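> For reference, a minimal way to check both things from the command line (a sketch; /input/file.txt is a placeholder path, and the examples jar location assumes the standard 2.2.0 tarball layout):
>
>   hdfs fsck /input/file.txt -files -blocks -locations
>   hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input/file.txt /output
>
> fsck lists the datanodes holding a replica of every block, and the ResourceManager web UI on port 8088 shows which nodes the map containers of the wordcount job were started on.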