Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Message-ID: <DUB130-W1564D3D161674AABB008E089B30@phx.gbl>
Content-Type: multipart/alternative;
	boundary="_3219edaf-3ee2-4ea3-a585-6517c7d91c68_"
From: yves callaert <yves_callaert@hotmail.com>
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Subject: RE: Monitoring dashboard for Hadoop?
Date: Thu, 4 Jun 2015 06:20:00 +0000
Importance: Normal
In-Reply-To: <040a01d09e43$d95554f0$8bfffed0$@mac.com>
References: <040a01d09e43$d95554f0$8bfffed0$@mac.com>
MIME-Version: 1.0

--_3219edaf-3ee2-4ea3-a585-6517c7d91c68_
Content-Type: text/plain; charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable

Hi,
Depending on the Hadoop version you are running, there are a few ways to
monitor jobs. You can use Hue (a Cloudera project), which includes job
monitoring, or you can follow jobs in the YARN ResourceManager web UI.
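
The ResourceManager also exposes the same job information over a REST
endpoint (/ws/v1/cluster/apps), which can be handy for a quick script.
Below is a minimal Python sketch, not a definitive tool. The host name is
a placeholder and port 8088 is only the usual default, so both are
assumptions to adjust for your cluster; the helper names are made up for
this example.

```python
import json
from collections import Counter
from urllib.request import urlopen

# Assumption: placeholder host; 8088 is the usual default RM web port.
RM_URL = "http://resourcemanager.example.com:8088"


def fetch_apps(rm_url=RM_URL):
    """Fetch the application list from the ResourceManager REST API."""
    with urlopen(rm_url + "/ws/v1/cluster/apps", timeout=10) as resp:
        return json.load(resp)


def summarize_apps(payload):
    """Return (count of apps per state, list of (id, state, progress))."""
    # The "apps" field is null when the cluster has no applications.
    apps = (payload.get("apps") or {}).get("app", [])
    counts = Counter(app["state"] for app in apps)
    rows = [(a["id"], a["state"], a.get("progress", 0.0)) for a in apps]
    return counts, rows
```

Calling summarize_apps(fetch_apps()) against a live ResourceManager gives
the number of applications per state (RUNNING, FINISHED, and so on) and
the progress of each one.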

Nodes can be monitored through Apache Ambari (https://ambari.apache.org/)
or Cloudera Manager (only available for Cloudera distributions).

As far as I know, the HDFS replication process cannot be configured to
favour particular nodes. Blocks need to be distributed evenly so that the
load is spread evenly across the cluster.
If a block replica gets corrupted, this will be visible in the logs, but
the NameNode corrects the problem automatically by creating a new replica
of the block from a healthy copy.
Normally you will have a replication factor of 3, but you can change this
if you want the data to be spread across more nodes.
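
To make the last point concrete: the replication factor of existing files
can be changed with "hdfs dfs -setrep", and "hdfs fsck" reports where the
replicas of each block ended up. The Python sketch below only builds and
runs those two commands; it assumes the hdfs client is on the PATH, and
the helper names are invented for this example. Keep in mind that a
cluster with two DataNodes can hold at most two replicas of a block, so a
factor of 3 would leave blocks under-replicated there.

```python
import subprocess


def setrep_command(path, replication, wait=True):
    """Build an `hdfs dfs -setrep` command for one path."""
    if replication < 1:
        raise ValueError("replication factor must be at least 1")
    cmd = ["hdfs", "dfs", "-setrep"]
    if wait:
        cmd.append("-w")  # wait until the target replication is reached
    return cmd + [str(replication), path]


def fsck_command(path):
    """Build an `hdfs fsck` command that lists files, blocks, locations."""
    return ["hdfs", "fsck", path, "-files", "-blocks", "-locations"]


def run_hdfs(cmd):
    """Run a command built above; raises if the hdfs client fails."""
    return subprocess.run(cmd, check=True)
```

For example, run_hdfs(setrep_command("/user/example/input", 2)) followed
by run_hdfs(fsck_command("/user/example/input")) sets the factor to 2 and
then lists the block locations. The path is hypothetical.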

Hope this answers some questions.

With Regards,
Yves
From: caesarsamsi@mac.com
To: user@hadoop.apache.org
Subject: Monitoring dashboard for Hadoop?
Date: Wed, 3 Jun 2015 17:25:43 -0400

Hello,

I'm new to Hadoop and successfully built a fully distributed cluster of 3
nodes (1 master, 2 slaves) as a proof of concept. I have some questions
below.

Is there a dashboard to monitor the progress of a mapreduce computation?
1. I'm looking to ensure the computation gets allocated and uses the
   correct number of computation nodes
2. Monitor computation on the nodes (up/down/in-progress/completed)
3. If possible direct computation to specific group of nodes (depending
   on the computation priority).

Similarly for HDFS
1. Ensure data file gets replicated to the correct number of nodes
2. If possible prioritize data replication (i.e. replicate data files
   that are accessed frequently to nodes that have better hardware, so
   some sort of load balancing distribution)

Many Thanks, Caesar.

--_3219edaf-3ee2-4ea3-a585-6517c7d91c68_--