hadoop-common-user mailing list archives

From Thibaut Britz <t.br...@netbreeze.ch>
Subject Reduce Performance (LocalJobRunner vs Hadoop Framework)
Date Mon, 17 Dec 2007 15:25:58 GMT


I'm running a few tests on a small data set: 150 MB of input data, which
produces 150,000 unique map output records and, in turn, 150,000 reduce
output records.

When I run the job locally (as a Java application from Eclipse, using the
LocalJobRunner), the reducer finishes in 0 seconds.
Running the same code on the same machine within the Hadoop framework (in
the Google Hadoop VMware image) always results in a reduce phase of over 10
seconds (1 reducer, measured from map 100% to reduce 100%). Running it
outside VMware on an Amazon EC2 cluster gives about the same result, even
with more nodes in the cluster.

Any ideas on what might cause the slowdown? Is this simply Hadoop
framework overhead that I have to live with?


(To recap: map input is about 150 MB; map output records = reduce output
records = 150,000.)
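A quick back-of-envelope check of these numbers, as a sketch under two assumptions: the local "0 seconds" means the run took under one second (whole-second resolution), and the cluster figure is taken as exactly 10 seconds, its stated lower bound. If the reduce code costs the same per record in both setups, the gap it leaves is a lower bound on time spent on something the local run does not do.

```python
# Back-of-envelope split of the observed reduce-phase times.
# Assumption: "0 seconds" locally means < 1 s (whole-second resolution).
# Assumption: "over 10 seconds" on the cluster is taken as exactly 10 s.

RECORDS = 150_000          # reduce output records reported in the post
CLUSTER_REDUCE_S = 10.0    # lower bound for the cluster reduce phase
LOCAL_REDUCE_MAX_S = 1.0   # upper bound implied by the "0 seconds" local run


def per_record_us(seconds: float, records: int) -> float:
    """Average time per record, in microseconds."""
    return seconds / records * 1e6


cluster_us = per_record_us(CLUSTER_REDUCE_S, RECORDS)
local_max_us = per_record_us(LOCAL_REDUCE_MAX_S, RECORDS)

# If the per-record cost of the reduce code is the same in both setups,
# at least this much of the cluster's reduce phase goes to work the
# local run does not do.
unexplained_s = CLUSTER_REDUCE_S - LOCAL_REDUCE_MAX_S

print(f"cluster: {cluster_us:.1f} us/record")
print(f"local:   < {local_max_us:.1f} us/record")
print(f"at least {unexplained_s:.0f} s not accounted for by the local run")
```

With these inputs it reports about 66.7 us/record on the cluster versus under 6.7 us/record locally, leaving at least 9 seconds unaccounted for.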

View this message in context: http://www.nabble.com/Reduce-Performance-%28LocalJobRunner-vs-Hadoop-Framework%29-tp14372547p14372547.html
Sent from the Hadoop Users mailing list archive at Nabble.com.
