hadoop-common-user mailing list archives

From Rob Stewart <robstewar...@googlemail.com>
Subject Hadoop Scalability - A Case Study: Concordance
Date Mon, 20 Dec 2010 02:17:29 GMT
Hi All,

I recently entered a Hadoop implementation for the SICSA Multicore
Challenge, held last week:

The aim was to implement the concordance application in whichever
language or framework you felt was best. We ended up comparing a wide
variety, including Erlang, Parallel Haskell, Java with Fork/Join, and
OpenMP, amongst others.

Whilst most implementations gave very low runtimes for very small
inputs, Hadoop was not able (and is not designed) to do so. Where the
Hadoop implementation shone was in how it scaled with input size. I
have written a summary of my implementation and optimizations, with a
link to the complete set of slides I presented, at this


Perhaps the highlight of these results is (running on 16 nodes):
Benchmark 1
Input File: Bible.txt - 801,541 words
Runtime: 36 seconds

Benchmark 2
Input File: ascii100MB.txt - 18,030,005 words
Runtime: 65 seconds

That is, the input size grew by a factor of 22.5, but the runtime
grew by a factor of just 1.8.
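
For anyone who wants to check the arithmetic, here is a minimal Python
sketch that recomputes both ratios from the word counts and runtimes
quoted above. The variable names are my own and purely illustrative;
the throughput figures are derived from the same four numbers and are
not separate measurements.

```python
# Figures quoted in the two benchmarks above (16 nodes).
bible_words = 801_541
bible_seconds = 36
ascii_words = 18_030_005
ascii_seconds = 65

# Growth factors between Benchmark 1 and Benchmark 2.
input_growth = ascii_words / bible_words
runtime_growth = ascii_seconds / bible_seconds

# Derived throughput (words per second) for each run.
bible_throughput = bible_words / bible_seconds
ascii_throughput = ascii_words / ascii_seconds

print(f"input grew {input_growth:.1f}x, runtime grew {runtime_growth:.1f}x")
print(f"throughput: {bible_throughput:,.0f} -> {ascii_throughput:,.0f} words/s")
```

The gap between the two factors is consistent with a large fixed job
startup cost dominating the small run, though two data points are not
enough to separate startup cost from per-word cost.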

Feedback would be welcome. It was interesting to see that some of the
shared-memory implementations were not able to process the 100MB file
without out-of-memory errors. This was not a problem for Hadoop.

There is a plan to hold another Multicore Challenge in May 2011. If
anyone wants to make any inquiries, I suggest you get in touch with
the facilitator, Hans-Wolfgang Loidl, who's named at the bottom of this


Rob Stewart
