I’m curious about profiling. I see some documentation for it (Hadoop 1.0.3 on AWS), but the references to JobConf appear to target the “old” API, and I’ve got everything running on the “new” API.

I’ve got a job that processes about 30 GB of compressed CSVs, and it’s taking over a day on three m1.medium boxes, which is longer than I expected, so I’d like to see where the time is being spent.

http://hadoop.apache.org/docs/r1.0.3/mapred_tutorial.html#Profiling
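For what it’s worth, my untested guess is that the JobConf-based properties from that page can be set on the new-API Job’s underlying Configuration instead. The property names and the hprof string below are taken from the 1.0.3 tutorial; the job name is just a placeholder. Is something like this the right idea?

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Sketch of a new-API driver, assuming the old JobConf profiling keys
// (mapred.task.profile et al., per the 1.0.3 docs) are honored when
// set directly on the Job's Configuration.
Configuration conf = new Configuration();
conf.setBoolean("mapred.task.profile", true);    // enable task profiling
conf.set("mapred.task.profile.maps", "0-2");     // profile the first 3 map tasks
conf.set("mapred.task.profile.reduces", "0-2");  // profile the first 3 reduce tasks
// Default hprof settings from the tutorial; %s is replaced with the
// per-task profile output file, which ends up alongside the task logs.
conf.set("mapred.task.profile.params",
    "-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s");
Job job = new Job(conf, "csv-processing");       // placeholder job name
```

(Configuration fragment only; I haven’t run this against a cluster.)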

I’ve never set up profiling of any kind before, so I don’t really know what to expect here.

Any pointers on setting up what’s suggested there? And am I correct in understanding that this doc is a little outdated?