hadoop-general mailing list archives

From: zhangguoping zhangguoping <zhangguopin...@gmail.com>
Subject: Distributed Cache
Date: Tue, 06 Jul 2010 08:53:45 GMT
From the book "Hadoop: The Definitive Guide", p. 242:
When you launch a job, Hadoop copies the files specified by the -files and
-archives options to the jobtracker’s filesystem (normally HDFS). Then, before
a task is run, the tasktracker copies the files from the jobtracker’s
filesystem to a local disk—the cache—so the task can access the files.
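The two-stage copy in the quoted passage can be sketched as a small in-memory model. Everything here is hypothetical: `SharedFS`, `TaskTracker`, `submit_job`, and the `/staging/<job_id>/` layout are illustrative names, not Hadoop classes or paths, and the rule that a tasktracker copies each file at most once per node and then reuses the local copy is an assumption of this sketch, not a statement about any particular Hadoop release.

```python
"""Toy model of the two-stage distributed-cache copy (all names hypothetical)."""


class SharedFS:
    """Stands in for the jobtracker's filesystem (normally HDFS)."""

    def __init__(self):
        self.files = {}  # path -> file contents


class TaskTracker:
    """One worker node with its own local cache directory."""

    def __init__(self, name, shared):
        self.name = name
        self.shared = shared
        self.cache = {}       # path -> node-local copy of the contents
        self.copies_made = 0  # number of pulls from the shared filesystem

    def localize(self, path):
        # Stage 2: copy from the shared filesystem only on a cache miss.
        # Assumption of this sketch: later tasks on the same node reuse
        # the local copy instead of fetching it again.
        if path not in self.cache:
            self.cache[path] = self.shared.files[path]
            self.copies_made += 1
        return self.cache[path]

    def run_task(self, cache_paths):
        # Before the task runs, make every cached file local; the task
        # then reads only from this node's cache.
        return {p: self.localize(p) for p in cache_paths}


def submit_job(client_files, shared, job_id):
    # Stage 1: copy the -files/-archives content from the client's local
    # disk into a job-specific directory on the shared filesystem.
    staged = []
    for name, data in client_files.items():
        path = f"/staging/{job_id}/{name}"
        shared.files[path] = data
        staged.append(path)
    return staged


if __name__ == "__main__":
    hdfs = SharedFS()
    paths = submit_job({"lookup.txt": b"a=1\n"}, hdfs, "job_0001")
    tt = TaskTracker("node1", hdfs)
    for _ in range(3):  # three tasks scheduled on the same node
        tt.run_task(paths)
    print(tt.copies_made)  # -> 1
```

Running it prints `1`: under the sketch's assumption, three tasks on one node cause only a single read from the shared filesystem, while each additional node would make its own one-time copy.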

I wonder why Hadoop copies the files to the jobtracker's filesystem.
Since the files are already in HDFS, they should already be available to the tasks.
Are there any particular considerations behind this design?
