hadoop-mapreduce-user mailing list archives

From Steve Lewis <lordjoe2...@gmail.com>
Subject How does performance scale with the size of the data?
Date Thu, 01 Jul 2010 05:15:08 GMT
Assume we have a medium-size cluster, say 20 nodes, that is used for one job
and cannot change in size.
Assume we are sorting a large data set. As we increase the size of the data
being sorted, say from 100 GB to 1,000 GB to 10,000 GB, does the time for the
sort scale as N or as N log N? I have heard both answers, with N log N coming
largely from folks less familiar with Hadoop and N from others with more
experience, but I am skeptical. Has anyone run tests and can contribute real
data?
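To make the two hypotheses concrete, here is a rough back-of-the-envelope sketch (a toy model, not measured data) that computes the predicted sort time relative to the 100 GB run under each model. It assumes hypothetical 100-byte records and decimal gigabytes; neither comes from the question, and the model ignores fixed job startup, shuffle and disk I/O costs on a real cluster.

```python
import math

# Hypothetical record size; the question does not specify one.
RECORD_BYTES = 100
# Decimal gigabytes, another assumption made only for this sketch.
BYTES_PER_GB = 10**9


def records(size_gb: float) -> float:
    """Number of fixed-size records in a data set of size_gb gigabytes."""
    return size_gb * BYTES_PER_GB / RECORD_BYTES


def linear_ratio(size_gb: float, base_gb: float) -> float:
    """Predicted time relative to the baseline if time grows as N."""
    return records(size_gb) / records(base_gb)


def nlogn_ratio(size_gb: float, base_gb: float) -> float:
    """Predicted time relative to the baseline if time grows as N log N."""
    n, n0 = records(size_gb), records(base_gb)
    return (n * math.log2(n)) / (n0 * math.log2(n0))


if __name__ == "__main__":
    base = 100  # baseline data set size in GB
    for size in (100, 1000, 10000):
        print(f"{size:>6} GB  N: {linear_ratio(size, base):7.1f}x  "
              f"N log N: {nlogn_ratio(size, base):7.1f}x")
```

Under these assumptions it reports 10.0x vs 11.1x at 1,000 GB and 100.0x vs 122.2x at 10,000 GB. Because the extra factor is only log N / log N0, choosing a different record size shifts the N log N column modestly, so only real measurements can say which model a given cluster follows.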

Steven M. Lewis PhD
Institute for Systems Biology
Seattle WA
