hadoop-mapreduce-user mailing list archives

From "shubham.gupta" <shubham.gu...@orkash.com>
Subject Map Job of Nutch creates huge amount of logs ( Nutch 2.3.1 + Hadoop 2.7.1 + Yarn)
Date Fri, 09 Sep 2016 12:27:43 GMT
Hey

I am running Nutch jobs on Hadoop with the fetcher.parse property set 
to true. While the job is running, map spills are created in the 
directory /home/hadoop/nodelogs/usercache/root/appcache.


The spills are created during the map job of the fetch phase. The files 
grow to as much as 17 GB and occupy over 90% of the datanode's disk 
space, after which the state of the datanode changes to UNHEALTHY. 
I therefore have to delete the logs periodically to keep the process 
running smoothly, but doing so sometimes interferes with the job and 
increases its completion time.

I have set logging to ERROR messages and above in mapred-site.xml, and 
I have changed mapred.userlog.limit.kb to 10240.
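
To make the disk growth described above easier to track, here is a minimal sketch of a script that reports the size of the appcache directory and the used fraction of the disk it sits on. The function names and the 0.90 threshold are assumptions for illustration (the threshold simply mirrors the "over 90%" figure above); the script only reports and does not delete anything.

```python
import os
import shutil

# Hypothetical threshold, mirroring the "over 90%" figure from the message.
WARN_FRACTION = 0.90


def dir_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # A running task may remove a spill file while we scan.
                pass
    return total


def disk_used_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


if __name__ == "__main__":
    # Directory taken from the message; adjust for your own node.
    appcache = "/home/hadoop/nodelogs/usercache/root/appcache"
    if os.path.isdir(appcache):
        size_gb = dir_size_bytes(appcache) / 1024 ** 3
        used = disk_used_fraction(appcache)
        print(f"appcache size: {size_gb:.2f} GiB, disk used: {used:.1%}")
        if used >= WARN_FRACTION:
            print("warning: disk usage is at or above the assumed threshold")
    else:
        print(f"directory not found: {appcache}")
```

Run from cron or a loop on each node, this would show how quickly the spills grow during the fetch phase without touching files that a running task may still need.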

Please suggest how this can be avoided so that Nutch functions 
properly.

-- 

Shubham Gupta



