hadoop-user mailing list archives

From Kai Voigt...@123.org>
Subject Re: distributed cache
Date Sat, 22 Dec 2012 12:44:16 GMT

On 22.12.2012 at 13:03, Lin Ma <linlma@gmail.com> wrote:

> I want to confirm that when a mapper or reducer on a task node accesses a distributed
> cache file, the file resides on disk, not in memory. I just want to make sure the
> distributed cache file is not fully loaded into memory, where it would compete for
> memory with the mapper/reducer tasks. Is that correct?

Yes, you are correct. The JobTracker will put files for the distributed cache into HDFS with
a higher replication count (10 by default). Whenever a TaskTracker needs those files for a
task it is launching locally, it will fetch a copy to its local disk, so it won't need to
fetch them again for future tasks on this node. The files stay on local disk and are not
loaded into memory unless your task code reads them in. After a job is done, all local copies
and the HDFS copies of files in the distributed cache are cleaned up.
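
To illustrate the behaviour described above, here is a rough sketch in plain Python. It is
not Hadoop code and does not use the Hadoop API: NodeLocalCache, localize,
count_matching_lines and run_demo are hypothetical names made up for this example, and two
temporary directories stand in for HDFS and for a node's local disk. It only models two
points: the file is copied to the node once and reused by later tasks, and a task can stream
it from disk line by line instead of loading it fully into memory.

```python
import shutil
import tempfile
from pathlib import Path


class NodeLocalCache:
    """Toy model of one node's local copies of cache files (hypothetical, not Hadoop)."""

    def __init__(self, local_dir):
        self.local_dir = Path(local_dir)
        self.fetch_count = 0  # number of copies made from shared storage to this node

    def localize(self, shared_file):
        """Return a local path, copying from shared storage only on first use."""
        shared_file = Path(shared_file)
        local_copy = self.local_dir / shared_file.name
        if not local_copy.exists():
            shutil.copyfile(shared_file, local_copy)
            self.fetch_count += 1
        return local_copy


def count_matching_lines(local_path, needle):
    """Stream the file from disk; only one line is held in memory at a time."""
    matches = 0
    with open(local_path, encoding="utf-8") as handle:
        for line in handle:
            if needle in line:
                matches += 1
    return matches


def run_demo():
    """Run three simulated tasks on one node against the same cache file."""
    with tempfile.TemporaryDirectory() as shared, tempfile.TemporaryDirectory() as local:
        # "shared" stands in for HDFS, "local" for the node's local disk (assumption).
        shared_file = Path(shared) / "lookup.txt"
        shared_file.write_text("alpha\nbeta\nalpha beta\n", encoding="utf-8")

        cache = NodeLocalCache(local)
        results = [
            count_matching_lines(cache.localize(shared_file), "alpha")
            for _ in range(3)
        ]
        return results, cache.fetch_count
```

Calling run_demo() returns ([2, 2, 2], 1): all three simulated tasks read the file, but it
was copied to the node only once. Whether a real task keeps memory use low depends on how
your own mapper or reducer code reads the local file.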


Kai Voigt
