hadoop-common-user mailing list archives

From Nathan Marz <nat...@rapleaf.com>
Subject Re: Turning off FileSystem statistics during MapReduce
Date Mon, 06 Oct 2008 19:13:58 GMT
We see this in map tasks, and only for incrementBytesRead (not for
incrementBytesWritten). The time is being spent on HDFS. This seems
to be because incrementBytesRead is called every time a record is
read, while incrementBytesWritten is called only when a buffer is
spilled. We would benefit a lot from being able to turn this off.
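
To illustrate the per-record versus per-spill difference, here is a minimal sketch of batching the read counter updates. This is not Hadoop's code: SharedStats, BatchedReadCounter, and the threshold value are hypothetical names made up for illustration, and whether batching would actually remove the overhead we measured is an assumption.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a shared statistics object.
// This is NOT Hadoop's FileSystem.Statistics, only an illustration.
final class SharedStats {
    private final AtomicLong bytesRead = new AtomicLong();

    void incrementBytesRead(long n) {
        bytesRead.addAndGet(n);
    }

    long getBytesRead() {
        return bytesRead.get();
    }
}

// Accumulates byte counts in a plain local field and pushes them to the
// shared counter only once a threshold is reached, similar in spirit to
// updating the written-bytes counter once per buffer spill.
final class BatchedReadCounter {
    private final SharedStats stats;
    private final long flushThreshold; // hypothetical tuning knob
    private long pending;              // bytes not yet reported
    int flushes;                       // number of shared-counter updates made

    BatchedReadCounter(SharedStats stats, long flushThreshold) {
        this.stats = stats;
        this.flushThreshold = flushThreshold;
    }

    // Called once per record; touches the shared counter only on a flush.
    void recordRead(long n) {
        pending += n;
        if (pending >= flushThreshold) {
            flush();
        }
    }

    // Must also be called when the reader is closed so no bytes are lost.
    void flush() {
        if (pending > 0) {
            stats.incrementBytesRead(pending);
            pending = 0;
            flushes++;
        }
    }
}
```

With a threshold of 1000 bytes and 100-byte records, the shared counter would be updated once every ten records instead of once per record. The trade-off is that the reported total lags behind the true value until the next flush.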

On Oct 3, 2008, at 6:19 PM, Arun C Murthy wrote:

> Nathan,
> On Oct 3, 2008, at 5:18 PM, Nathan Marz wrote:
>> Hello,
>> We have been doing some profiling of our MapReduce jobs, and we are  
>> seeing about 20% of the time of our jobs is spent calling  
>> "FileSystem$Statistics.incrementBytesRead" when we interact with  
>> the FileSystem. Is there a way to turn this stats-collection off?
> This is interesting... could you provide more details? Are you
> seeing this on Maps or Reduces? Which FileSystem exhibited this, i.e.
> HDFS or LocalFS? Any details about your application?
> To answer your original question - no, there isn't a way to disable  
> this. However, if this turns out to be a systemic problem we  
> definitely should consider having an option to allow users to switch  
> it off.
> So any information you can provide helps - thanks!
> Arun
>> Thanks,
>> Nathan Marz
>> Rapleaf
