hadoop-common-user mailing list archives

From "Cagdas Gerede" <cagdas.ger...@gmail.com>
Subject Re: Master Heap Size and Master Startup Time vs. Number of Blocks
Date Fri, 02 May 2008 21:21:26 GMT
> But you could do all this with larger blocks as well.  Having a large
> size only says that a block CAN be that long, not that it MUST be that

No, you cannot.

Imagine a streaming server where users send data generated in real time to
your server, and each file is no more than 100MB. Let's assume a user does not
have more than 10MB of local cache space, so the user cannot hold more than
10MB of data while generating it. The user caches the data and streams it to
your server. As each chunk of data accumulates, your server writes that chunk
to Hadoop, gets confirmation from Hadoop, and sends an ack to the user so that
the user can delete that data from the cache (because the data is persisted).
This way you make the system tolerant to the failure of your servers.
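
To make the flow concrete, here is a rough sketch in Python of the
persist-then-ack loop described above. It is only an illustration under
stated assumptions: RelayServer, Client, and the persist callback are
hypothetical names, and persist merely stands in for a durable write to
Hadoop. It is not the Hadoop API.

```python
from typing import Callable, Dict

MB = 1024 * 1024


class RelayServer:
    """Hypothetical relay server: persists a chunk, then acks the sender."""

    def __init__(self, persist: Callable[[str, int, bytes], bool],
                 chunk_limit: int = 10 * MB):
        # persist(file_id, index, data) is an assumed stand-in for a durable
        # write; it returns True only once the chunk is safely stored.
        self.persist = persist
        self.chunk_limit = chunk_limit

    def receive_chunk(self, file_id: str, index: int, data: bytes) -> bool:
        if len(data) > self.chunk_limit:
            raise ValueError("chunk is larger than the client's cache budget")
        # The return value is the ack: True means the chunk is persisted.
        return self.persist(file_id, index, data)


class Client:
    """Hypothetical client that keeps a chunk cached until it is acked."""

    def __init__(self, server: RelayServer):
        self.server = server
        self.cache: Dict[int, bytes] = {}  # unacknowledged chunks by index

    def send(self, file_id: str, index: int, data: bytes) -> bool:
        self.cache[index] = data
        acked = self.server.receive_chunk(file_id, index, data)
        if acked:
            # Safe to free local cache space: the data is persisted.
            del self.cache[index]
        return acked
```

The point of the sketch is that the client can only free cache space after
each small chunk is confirmed, so the ack has to come per chunk, not per file.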

How would you do the same thing with a block size of 100MB?
What am I missing?


On Fri, May 2, 2008 at 1:20 PM, Ted Dunning <tdunning@veoh.com> wrote:

> Also, you said that the average size was ~ 40MB (20 x 2MB blocks).  If that
> is so, then you should be able to radically decrease the number of blocks
> with a larger block size.
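
The block-count arithmetic in the quoted text can be sketched as follows.
This is a minimal illustration, assuming a file occupies
ceil(file size / block size) blocks with a possibly partial last block. The
blocks_needed helper is a hypothetical name, and the 64MB and 100MB block
sizes are example values only.

```python
MB = 1024 * 1024


def blocks_needed(file_size: int, block_size: int) -> int:
    """Blocks a file occupies, assuming the last block may be partial."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("sizes must be non-negative and block_size positive")
    # Integer ceiling division, avoiding floating point.
    return -(-file_size // block_size)


# The ~40MB average file from the quoted text, at different block sizes.
for block_mb in (2, 64, 100):
    print(block_mb, "MB blocks:", blocks_needed(40 * MB, block_mb * MB))
```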
