hadoop-mapreduce-user mailing list archives

From John Lilley <john.lil...@redpoint.net>
Subject HTTP file server, map output, and other files
Date Thu, 23 May 2013 21:44:12 GMT
Thanks to previous kind answers and more reading in the elephant book ("Hadoop: The Definitive
Guide"), I now understand that mapper tasks place partitioned results into local files that are
served up to reducers via HTTP:

"The output file's partitions are made available to the reducers over HTTP. The maximum number
of worker threads used to serve the file partitions is controlled by the tasktracker.http.threads
property; this setting is per tasktracker, not per map task slot. The default of 40 may need
to be increased for large clusters running large jobs. In MapReduce 2, this property is not
applicable because the maximum number of threads used is set automatically based on the number
of processors on the machine. (MapReduce 2 uses Netty, which by default allows up to twice
as many threads as there are processors.)"
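
For reference, a minimal sketch of how the MapReduce 1 property quoted above would typically be
raised, assuming it is set in mapred-site.xml on each tasktracker node. The value 80 is only an
illustrative assumption, not a recommendation, and per the quote this setting has no effect in
MapReduce 2:

```xml
<!-- mapred-site.xml (MapReduce 1 only; ignored by MapReduce 2) -->
<configuration>
  <property>
    <!-- Worker threads serving map output partitions; per tasktracker, not per map slot -->
    <name>tasktracker.http.threads</name>
    <!-- Illustrative value only; the quoted default is 40 -->
    <value>80</value>
  </property>
</configuration>
```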

My question is: for a custom (non-MapReduce) application running under YARN, how would I set up
my application tasks' output data to be served over HTTP? Is there an API to control this, or
are there predefined local directories whose contents are served automatically? Once I am
finished with the temporary data, how do I request that the files be removed?

Thanks
John

