apex-dev mailing list archives

From Munagala Ramanath <...@datatorrent.com>
Subject Re: AbstractFileOutputOperator maxLength roll over handling
Date Fri, 11 Dec 2015 15:13:25 GMT
I guess we don't need to worry about the case where the tuple size itself is
larger than the HDFS block size :-)


On Fri, Dec 11, 2015 at 12:37 AM, Yogi Devendra <yogidevendra@apache.org>
wrote:

> Hi,
> I am using AbstractFileOutputOperator in my application for writing
> incoming tuples into a file on HDFS.
> Since there could be failover scenarios, I am using
> fileOutputOperator.setMaxLength() to roll over the files after a
> specified length, on the assumption that rolled-over files recover faster
> from a failure (recovery is needed only for the last part of the
> file and not for the entire file).
> The use case does not dictate a specific value for maxLength. Hence, I
> would prefer the rolled-over file size to be equal to the HDFS block
> size (say 64 MB).
> With the current implementation of AbstractFileOutputOperator, the actual
> size of a rolled-over file would be slightly greater than 64 MB. This is
> because the check for file size (for roll over) happens only after the
> incoming tuple has been written to the file.
> I believe that files slightly greater than 64 MB would result in 2 entries
> on the NameNode. This can be avoided if we flip the sequence: first check
> whether the file size plus the incoming tuple would exceed the limit, and
> if so, roll over to a new file *before* writing the incoming tuple.
> Do you think this improvement should be considered? If yes, I will
> create a JIRA and work on it.
> Also, does this code change break backward compatibility? The
> signature of the API remains the same, but there is a slight change in
> the semantics. Thus, I wanted to get feedback from the community.
> ~ Yogi
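
For readers of the archive, the two orderings discussed in the quoted message can be compared with a small stand-alone model. This is only a sketch under stated assumptions, not the operator's actual code: the class RollOverSketch, the method rollSizes, and the use of a strict "greater than" comparison are all hypothetical and chosen for illustration. It also shows the oversized-tuple case from the reply: a single tuple larger than the limit still ends up in a file that exceeds it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model of the two roll-over orderings.
// It does not use or reproduce AbstractFileOutputOperator.
final class RollOverSketch {

    // Returns the sizes of the files produced for the given tuple sizes.
    // checkBeforeWrite = false: write first, then roll if the limit is exceeded.
    // checkBeforeWrite = true:  roll first if the tuple would exceed the limit.
    static List<Long> rollSizes(long maxLength, int[] tupleSizes, boolean checkBeforeWrite) {
        List<Long> files = new ArrayList<>();
        long current = 0; // bytes in the currently open file
        for (int size : tupleSizes) {
            // Proposed order: roll before the write, but never roll an empty
            // file, so a tuple larger than maxLength is still written.
            if (checkBeforeWrite && current > 0 && current + size > maxLength) {
                files.add(current);
                current = 0;
            }
            current += size; // stands in for writing the tuple
            // Order described in the thread: the size check follows the write.
            if (!checkBeforeWrite && current > maxLength) {
                files.add(current);
                current = 0;
            }
        }
        if (current > 0) {
            files.add(current); // the file still open at the end
        }
        return files;
    }

    public static void main(String[] args) {
        int[] tuples = {40, 40, 40, 40, 40};
        System.out.println(rollSizes(100, tuples, false)); // [120, 80]
        System.out.println(rollSizes(100, tuples, true));  // [80, 80, 40]
    }
}
```

With a limit of 100 and five 40-byte tuples, the write-then-check order closes a 120-byte file, while the check-then-write order keeps every file at or below the limit at the cost of one extra, smaller file.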
