hadoop-mapreduce-user mailing list archives

From Jonathan Aquilina <jaquil...@eagleeyet.net>
Subject Re: How can I get the memory usage in Namenode and Datanode?
Date Sun, 22 Feb 2015 07:23:19 GMT
 

Hi Tim, 

I am not sure whether this will help improve overall cluster performance
for you, but I hope it gives you and others some ideas.

https://media.amazonwebservices.com/AWS_Amazon_EMR_Best_Practices.pdf 

---
Regards,
Jonathan Aquilina
Founder Eagle Eye T

On 2015-02-22 07:57, Tim Chou wrote: 

> Hi Jonathan, 
> 
> Very useful information. I will look at Ganglia.
> 
> However, I do not have administrative privileges on the cluster, so I don't know if
> I can install Ganglia on it.
> 
> Thank you for your information. 
> 
> Best, 
> Tim 
> 
> 2015-02-22 0:53 GMT-06:00 Jonathan Aquilina <jaquilina@eagleeyet.net>:
> 
> Where I work, we are setting up a transient (temporary) cluster on Amazon EMR.
> When I was reading up on how things work, the recommendation for monitoring was to use
> Ganglia to track memory usage, network usage, and so on. That way, depending on how things
> are set up (for example, pulling data directly into the cluster from an Amazon S3 bucket),
> you can make sure the network link stays saturated so that data flows constantly.
> 
> My suggestion is to look at Ganglia.
> 
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
> 
> On 2015-02-22 07:42, Fang Zhou wrote:
> 
> Hi Jonathan,
> 
> Thank you. 
> 
> The number of files affects memory usage on the Namenode.
> 
> I just want to find the actual memory usage of the Namenode.
> 
> The heap usage changes constantly, so I have no idea which value is the
> right one.
> 
> Thanks, 
> Tim 
> 
> On Feb 22, 2015, at 12:22 AM, Jonathan Aquilina <jaquilina@eagleeyet.net> wrote:

> 
> I am rather new to Hadoop, but couldn't the difference be due to how the files
> are split in terms of size?
> 
> ---
> Regards,
> Jonathan Aquilina
> Founder Eagle Eye T
> 
> On 2015-02-21 21:54, Fang Zhou wrote: 
> 
> Hi All,
> 
> I want to test the memory usage on Namenode and Datanode.
> 
> I tried jmap, jstat, /proc/<pid>/stat, top, ps aux, and the Hadoop web interface to
> check the memory.
> The values I get from them are different. I also found that the memory usage changes
> periodically.
> This is the first thing that confused me.
> 
> I thought that the more files are stored in the Namenode, the more memory the Namenode
> and the Datanodes would use.
> I also thought the memory used by the Namenode should be larger than the memory used by each
> Datanode.
> However, some results contradict these assumptions.
> For example, I tested the memory usage of the Namenode with 6000 and with 1000 files.
> According to jmap, the memory used with 6000 files is less than with 1000 files.
> I also found that the memory usage of a Datanode is larger than that of the Namenode.
> 
> I really don't know how to get the memory usage in Namenode and Datanode.
> 
> Can anyone give me some advice?
> 
> Thanks,
> Tim
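
One likely reason the tools in the original question disagree is that they measure different things. jmap and jstat report the Java heap, whose used size rises and falls with garbage collection. top, ps, and /proc report the resident memory of the whole process, which also covers non-heap JVM memory such as thread stacks and native buffers. The Python sketch below shows one way to put the two numbers side by side. It assumes the column names printed by `jstat -gc <pid>` and the `VmRSS` field of `/proc/<pid>/status`; the function names and the sample values are hypothetical and do not come from a real Namenode.

```python
# Hypothetical sample of `jstat -gc <pid>` output (sizes are in KB).
SAMPLE_JSTAT = """\
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT
1024.0 1024.0  0.0   512.0  8192.0   4096.0   20480.0    10240.0   4864.0 2400.0 512.0  256.0       5    0.050   1      0.020    0.070
"""

# Hypothetical excerpt of /proc/<pid>/status for the same process.
SAMPLE_STATUS = "Name:\tjava\nVmSize:\t 2097152 kB\nVmRSS:\t   65536 kB\n"


def heap_used_kb(jstat_gc_output: str) -> float:
    """Sum the 'used' heap columns of `jstat -gc` output, in KB."""
    rows = [ln.split() for ln in jstat_gc_output.splitlines() if ln.strip()]
    header, values = rows[0], rows[-1]
    row = dict(zip(header, values))
    # S0U/S1U: survivor spaces, EU: eden, OU: old generation.
    return sum(float(row[col]) for col in ("S0U", "S1U", "EU", "OU"))


def rss_kb(proc_status_text: str) -> int:
    """Return VmRSS (resident set size, in kB) from /proc/<pid>/status text."""
    for line in proc_status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("VmRSS not found")


if __name__ == "__main__":
    print("heap used (KB):", heap_used_kb(SAMPLE_JSTAT))
    print("process RSS (kB):", rss_kb(SAMPLE_STATUS))
```

Because the used heap includes garbage that has not been collected yet, a single reading can make a lightly loaded Namenode look larger than a heavily loaded one. Comparing readings taken shortly after a full collection (the FGC counter in jstat increases) may give a steadier picture.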
 