hadoop-common-user mailing list archives

From "M. C. Srivas" <mcsri...@gmail.com>
Subject Re: Ideal file size
Date Wed, 06 Jun 2012 16:48:34 GMT
There are more factors to consider than just the size of the file.  How long
can you wait before you *have to* process the data?  5 minutes? 5 hours? 5
days?  If you want good timeliness, you need to roll over faster.  The
longer you wait:

1.  the lower the load on the NN,
2.  but the poorer the timeliness,
3.  and the greater the chance of lost data (i.e., the data is not saved until
the file is closed and rolled over, unless you want to sync() after every
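
The trade-off in the list above can be sketched as a roll-over rule that
closes the file when it is either large enough or old enough. This is only
an illustration: RollPolicy, should_roll and files_per_day are hypothetical
names, not part of Hadoop, and the thresholds are assumed values you would
tune for your own cluster.

```python
from dataclasses import dataclass


@dataclass
class RollPolicy:
    """Hypothetical rule: roll when the file is big enough OR old enough."""
    max_bytes: int     # size threshold (assumed value, tune for your cluster)
    max_age_s: float   # longest you can wait before the data must be processed

    def should_roll(self, bytes_written: int, opened_at: float, now: float) -> bool:
        too_big = bytes_written >= self.max_bytes
        too_old = (now - opened_at) >= self.max_age_s
        return too_big or too_old


def files_per_day(roll_interval_s: float, writers: int = 1) -> float:
    """Files created per day when each writer rolls purely on age."""
    return writers * 86_400 / roll_interval_s
```

For example, a single writer rolling every 5 minutes creates
files_per_day(300) = 288 files a day, while ten writers rolling hourly
create files_per_day(3600, writers=10) = 240. With a rule like this, the
age threshold is also roughly how much unsynced data is at risk if a
writer dies before the file is closed.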

On Wed, Jun 6, 2012 at 7:00 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:

> We have continuous flow of data into the sequence file. I am wondering what
> would be the ideal file size before file gets rolled over. I know too many
> small files are not good but could someone tell me what would be the ideal
> size such that it doesn't overload NameNode.
