Message-ID: <26866842.post@talk.nabble.com>
Date: Sun, 20 Dec 2009 12:09:15 -0800 (PST)
From: doopha shaf <doopha.shaf@gmail.com>
To: core-user@hadoop.apache.org
Subject: general question - how hadoop works


I'm trying to figure out how Hadoop actually achieves its speed. Assuming that
data locality is central to Hadoop's efficiency, how does the magic actually
happen, given that the map output still gets moved all over the network to
reach the reducers?

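For example, say I have 1 GB of logs spread evenly across 10 data nodes and,
for the sake of argument, I use the identity mapper, so the map output is as
large as the input. If the reducers run on those same 10 nodes and keys are
partitioned evenly, each record has only a 1-in-10 chance of already sitting
on its reducer's node, so about 90% of the data still needs to move across the
network. How does the network not become saturated this way?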
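
To make that arithmetic concrete, here is a minimal sketch in Python. It
assumes one reducer per data node, input spread evenly, and keys partitioned
uniformly; those are my simplifications for the sake of the example, not
something Hadoop guarantees. The function names are made up for illustration
and are not part of any Hadoop API.

```python
def shuffle_fraction(num_nodes: int) -> float:
    """Expected fraction of map output that must leave the node it was produced on.

    Assumes one reducer per node and uniform partitioning, so a record lands
    on a reducer co-located with its mapper with probability 1 / num_nodes.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return (num_nodes - 1) / num_nodes


def shuffle_bytes(total_bytes: float, num_nodes: int) -> float:
    """Expected total bytes crossing the network during the shuffle."""
    return total_bytes * shuffle_fraction(num_nodes)


def outbound_bytes_per_node(total_bytes: float, num_nodes: int) -> float:
    """Expected bytes each node sends out, assuming input is spread evenly."""
    return shuffle_bytes(total_bytes, num_nodes) / num_nodes


if __name__ == "__main__":
    one_gib = 1024 ** 3   # the 1 GB of logs from the example
    nodes = 10            # the 10 data nodes from the example
    mib = 1024 ** 2

    print(f"fraction shuffled: {shuffle_fraction(nodes):.0%}")
    print(f"total over network: {shuffle_bytes(one_gib, nodes) / mib:.1f} MiB")
    print(f"sent per node: {outbound_bytes_per_node(one_gib, nodes) / mib:.1f} MiB")
```

Under these assumptions the total crossing the network is close to the full
input size, but it is split across all 10 nodes rather than going through a
single link.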

What did I miss?
Thanks,
D.S.