hadoop-user mailing list archives

From Lin Ma <lin...@gmail.com>
Subject Re: distributed cache
Date Sat, 22 Dec 2012 12:46:57 GMT
Thanks Kai. What is the purpose of using a higher replication count?

regards,
Lin

On Sat, Dec 22, 2012 at 8:44 PM, Kai Voigt <k@123.org> wrote:

> Hi,
>
> On 22.12.2012 at 13:03, Lin Ma <linlma@gmail.com> wrote:
>
> > I want to confirm that when a mapper or reducer on a task node accesses a
> > distributed cache file, the file resides on disk, not in memory. I just
> > want to make sure the distributed cache file is not fully loaded into
> > memory, where it would compete for memory with the mapper/reducer tasks.
> > Is that correct?
>
>
> Yes, you are correct. The JobTracker will put files for the distributed
> cache into HDFS with a higher replication count (10 by default). Whenever a
> TaskTracker needs those files for a task it is launching locally, it will
> fetch a copy to its local disk. So it won't need to do this again for
> future tasks on this node. After a job is done, all local copies and the
> HDFS copies of files in the distributed cache are cleaned up.
>
> Kai
>
> --
> Kai Voigt
> k@123.org
>
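
To illustrate the disk-versus-memory point in the quoted exchange: once the file has been copied to the node's local disk, whether it ends up in memory depends on how the task code reads it. Below is a minimal sketch, not taken from the thread, that scans a local file line by line so only one line is held at a time. The class name CachedFileLookup, the lookup helper, and the tab-separated "key<TAB>value" layout are all assumptions made for illustration. In a real Hadoop 1.x task the local path would typically come from DistributedCache.getLocalCacheFiles(conf); here a temporary file stands in for it so the example runs with the JDK alone.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class CachedFileLookup {

    // Hypothetical helper: scans a local file one line at a time and returns
    // the value stored for `key`, or null if the key is absent.
    // Assumes each line has the form "key<TAB>value".
    static String lookup(Path localFile, String key) throws IOException {
        try (BufferedReader reader =
                 Files.newBufferedReader(localFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab > 0 && line.substring(0, tab).equals(key)) {
                    return line.substring(tab + 1);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a file that has already been copied to local disk.
        Path localFile = Files.createTempFile("cache-demo", ".tsv");
        Files.write(localFile, Arrays.asList("a\t1", "b\t2"),
                    StandardCharsets.UTF_8);
        System.out.println(lookup(localFile, "b")); // prints 2
        Files.delete(localFile);
    }
}
```

A task that instead loads the whole file into a map (for example in a setup method) would trade memory for faster lookups. That is a choice made in the task code, not something the distributed cache does on its own.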
