hadoop-common-user mailing list archives

From Zooni Zooni <zoon...@gmail.com>
Subject Data locality in FTPFileSystem and RawLocalFilesystem
Date Fri, 15 Oct 2010 13:40:41 GMT

When RawLocalFileSystem or FTPFileSystem is used as the input of a
map-reduce job, how does the JobTracker apply its data locality logic,
i.e. how many map tasks does it start, and on which machines?

I would like to understand this for two scenarios:

Scenario 1: RawLocalFileSystem
   - Every data node has a local directory called /fooLocalBar, each
containing 10 files (200MB each) to be processed.

Scenario 2: FTPFileSystem
  - A common external machine has a directory called /fooRemoteBar that
contains 10 files (200MB each) to be processed.

