hadoop-common-dev mailing list archives

From Himanshu Vashishtha <vashishth...@gmail.com>
Subject data distribution
Date Sun, 29 Aug 2010 06:02:41 GMT
Can you please provide pointers (from your experience, plus the location in
the code) on the following:
a) For a given job, how is the data provided to a specific task tracker (TT)
for the MR computation? Consider the impractical scenario where a data node
is NOT a task node and vice versa. In that case I think the data should be
copied over to the TT. Who does that? The NN keeps an in-memory map from
chunks to locations; does the JT ask a TT to go to a specific location after
consulting the NN (based on availability/load)? Where is this in the code?
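
To make the question concrete, here is a toy sketch of the model I have in mind. It is not Hadoop code: the names `BlockMap` and `pick_replica`, the host names, and the "first replica if none is local" fallback are all assumptions made up for illustration. The real choice presumably also accounts for load and rack placement.

```python
# Toy model only: every name here is hypothetical, none is a Hadoop class.
from typing import Dict, List, Optional, Tuple

# What the NN is assumed to hold in memory: block id -> hosts holding a replica.
BlockMap = Dict[str, List[str]]


def pick_replica(block_map: BlockMap, block_id: str,
                 tt_host: str) -> Tuple[Optional[str], bool]:
    """Return (host to read from, is_local) for a task running on tt_host."""
    hosts = block_map.get(block_id, [])
    if not hosts:
        # Unknown block: nothing to read from.
        return None, False
    if tt_host in hosts:
        # The task tracker is also a data node holding this block.
        return tt_host, True
    # Assumed fallback: read remotely from the first listed replica.
    return hosts[0], False


block_map = {"blk_1": ["dn1", "dn2", "dn3"], "blk_2": ["dn2", "dn4"]}
print(pick_replica(block_map, "blk_1", "dn2"))  # local read
print(pick_replica(block_map, "blk_2", "tt9"))  # remote read, TT is not a DN
```

The second call is the scenario I am asking about: the TT holds no replica, so something has to decide which remote location it reads from.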

As per TW's book, there is some communication between the namenode and the
job tracker to decide which replicated chunk should be handled where, etc.
The TT then copies the data from the DN. What is the strategy for deciding
this assignment? Where is it in the code?

