hadoop-common-user mailing list archives

From Shantian Purkad <shantian_pur...@yahoo.com>
Subject Linear scalability question
Date Tue, 07 Jun 2011 22:53:26 GMT

I have a question on the linear scalability of Hadoop.

We have a situation where we have to do reduce-side joins on two big tables (10+ TB). This
causes a lot of data to be transferred over the network, and the network is becoming a bottleneck.
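To show why a reduce-side (repartition) join moves so much data, here is a minimal pure-Python simulation, not Hadoop code. It tags each record with its source table, hash-partitions it by join key to a reducer, and counts the records whose reducer sits on a different node than the record. The placement rule (record i lives on node i % num_nodes, reducer r runs on node r) and the helper names `partition` and `reduce_side_join` are hypothetical assumptions made for illustration.

```python
from collections import defaultdict
from zlib import crc32


def partition(key, num_reducers):
    # Deterministic hash partitioning, similar in spirit to Hadoop's default
    # HashPartitioner; crc32 keeps results stable across Python runs.
    return crc32(str(key).encode()) % num_reducers


def reduce_side_join(left, right, num_nodes):
    """Simulate a reduce-side join and count records shuffled across nodes.

    Hypothetical assumptions: record i of each table is stored on node
    i % num_nodes, and there is one reducer per node (reducer r on node r).
    """
    # reducer id -> join key -> list of (table tag, value)
    buckets = defaultdict(lambda: defaultdict(list))
    remote = 0
    for tag, table in (("L", left), ("R", right)):
        for i, (key, value) in enumerate(table):
            src_node = i % num_nodes
            dst_reducer = partition(key, num_nodes)
            if dst_reducer != src_node:
                remote += 1  # this record must cross the network in the shuffle
            buckets[dst_reducer][key].append((tag, value))

    # Reduce phase: within each key group, pair every left value with every right value.
    joined = []
    for groups in buckets.values():
        for key, tagged in groups.items():
            lefts = [v for t, v in tagged if t == "L"]
            rights = [v for t, v in tagged if t == "R"]
            joined.extend((key, lv, rv) for lv in lefts for rv in rights)
    return joined, remote
```

With keys spread evenly over N nodes, roughly (N - 1)/N of all records of both tables end up on a remote reducer, so shuffle volume grows with the combined size of the two tables rather than with the size of the join result.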

In a few years these tables will hold 100+ TB of data, and the reduce-side joins will require
transferring even more data over the network. Since network bandwidth is limited and cannot be
increased simply by adding more nodes, Hadoop will no longer scale linearly in this case.
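The scaling argument can be made concrete with a rough lower-bound model of shuffle time. This is a simplified sketch under hypothetical assumptions: the shuffle is limited by whichever is smaller, the sum of the nodes' NIC bandwidth or a fixed core (bisection) bandwidth, and disk, CPU, and protocol overheads are ignored. The function name and every number below are illustrative, not measurements from this cluster.

```python
def shuffle_seconds(shuffle_tb, nodes, nic_gbps, bisection_gbps):
    """Lower bound on shuffle time (seconds) under a simplified network model.

    Hypothetical assumptions: throughput is capped by the smaller of the
    aggregate NIC bandwidth (nodes * nic_gbps) and a core bisection bandwidth;
    all other overheads are ignored.
    """
    shuffle_bits = shuffle_tb * 1e12 * 8  # TB -> bits
    effective_gbps = min(nodes * nic_gbps, bisection_gbps)
    return shuffle_bits / (effective_gbps * 1e9)


# Fixed 40 Gbps core: 10x the data on 10x the nodes takes 10x as long.
print(shuffle_seconds(10, 100, 1, 40))    # 2000.0
print(shuffle_seconds(100, 1000, 1, 40))  # 20000.0

# Core bandwidth that grows with node count: shuffle time stays flat.
print(shuffle_seconds(10, 100, 1, 100))    # 800.0
print(shuffle_seconds(100, 1000, 1, 1000)) # 800.0
```

In this model, the concern in the paragraph above corresponds to the first case, where core bandwidth stays fixed while data and node count grow.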

Is my understanding correct? Am I missing anything here? How do people address this kind
of bottleneck?

Thanks and Regards,
