apex-dev mailing list archives

From Priyanka Gugale <priya...@datatorrent.com>
Subject Re: GenericFileOutputOperator doesn't work for all Hadoop file systems
Date Thu, 25 Aug 2016 05:47:11 GMT
I would suggest that we override "openStream" in GenericFileOutputOperator, as
proposed in option (2), and handle "append" differently for file systems
that don't support it. Alternatively, we could create concrete classes for
each file system that doesn't support append and override the required
methods there.

-1 for modifying the abstract class to handle unsupported operations.
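The suggestion above (leave AbstractFileOutputOperator unchanged and let the subclass decide how to reopen a file) can be pictured with a minimal sketch. The class names, the `OpenStrategy` enum and the set of URI schemes without append are hypothetical; the real operator returns Hadoop streams, not a strategy label.

```java
import java.net.URI;
import java.util.Set;

// Hypothetical labels for how openStream() would reopen a file.
enum OpenStrategy { CREATE, APPEND, RENAME_COPY_RECREATE }

// Hypothetical stand-in for AbstractFileOutputOperator: left unchanged, always appends.
abstract class BaseFileOutputSketch {
  protected OpenStrategy openStream(URI file, boolean append) {
    return append ? OpenStrategy.APPEND : OpenStrategy.CREATE;
  }
}

// Hypothetical stand-in for GenericFileOutputOperator: only the subclass knows
// that some file systems cannot append.
class GenericFileOutputSketch extends BaseFileOutputSketch {
  // Assumption: schemes whose file system rejects append() (the thread names FTP and S3).
  private static final Set<String> NO_APPEND_SCHEMES = Set.of("ftp", "s3n", "s3a");

  @Override
  protected OpenStrategy openStream(URI file, boolean append) {
    String scheme = file.getScheme(); // null for scheme-less paths; Set.of rejects null lookups
    if (append && scheme != null && NO_APPEND_SCHEMES.contains(scheme)) {
      return OpenStrategy.RENAME_COPY_RECREATE;
    }
    return super.openStream(file, append);
  }
}
```

Because the check lives only in the subclass, the base class never needs to know which operations a file system lacks.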

-Priyanka

On Wed, Aug 24, 2016 at 6:21 PM, Chaitanya Chebolu <
chaitanya@datatorrent.com> wrote:

> Hi All,
>
>     GenericFileOutputOperator, which is in the Malhar repository, works only
> for some file systems. GenericFileOutputOperator extends
> AbstractFileOutputOperator.
>
> Reason: the openStream() method in AbstractFileOutputOperator calls the
> file system's append() operation, but not all file systems support append.
> FTP and S3, for example, do not.
>
>   If GenericFileOutputOperator is used with a file system that doesn't
> support append() and the operator goes down and comes back, reopening the
> partially written file fails because the file system throws a "Not Supported"
> exception.
>
> Solution: call the following method instead of fs.append():
>
>
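> import java.io.IOException;
> import org.apache.commons.io.IOUtils;
> import org.apache.hadoop.fs.FSDataInputStream;
> import org.apache.hadoop.fs.FSDataOutputStream;
> import org.apache.hadoop.fs.Path;
>
> // Inside AbstractFileOutputOperator (or a subclass): fs, rename() and flush()
> // are the operator's own file system field and helper methods.
> protected FSDataOutputStream openStreamForNonAppendFS(Path filepath) throws IOException
> {
>   // Move the existing file aside so it can be re-created under its original name
>   Path appendTmpFile = new Path(filepath + "_APPENDING");
>   rename(filepath, appendTmpFile);
>
>   // Re-create the file and copy back the bytes written before the failure
>   FSDataOutputStream fsOut = fs.create(filepath);
>   try (FSDataInputStream fsIn = fs.open(appendTmpFile)) {
>     IOUtils.copy(fsIn, fsOut);
>   }
>   flush(fsOut);
>
>   // Drop the temporary copy; fsOut stays open so new data follows the old data
>   fs.delete(appendTmpFile, false);
>   return fsOut;
> }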
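To try the rename, copy and re-create steps without a Hadoop cluster, here is a rough local-disk analogue written against java.nio.file instead of Hadoop's FileSystem. The class name `NonAppendReopen` is hypothetical; the `_APPENDING` suffix is taken from the method above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class NonAppendReopen {
  // Reopens an existing file for writing without calling append:
  // move it aside, re-create it, copy the old bytes back, and return the
  // still-open stream so the caller continues writing after the old data.
  static OutputStream openStreamForNonAppendFs(Path file) throws IOException {
    Path tmp = file.resolveSibling(file.getFileName() + "_APPENDING");
    Files.move(file, tmp, StandardCopyOption.REPLACE_EXISTING);

    OutputStream out = Files.newOutputStream(file); // CREATE, TRUNCATE_EXISTING, WRITE
    try (InputStream in = Files.newInputStream(tmp)) {
      in.transferTo(out); // Java 9+
    } catch (IOException e) {
      out.close(); // do not leak the new stream; the old data is still in tmp
      throw e;
    }
    out.flush();
    Files.delete(tmp); // the input stream is already closed at this point
    return out;
  }
}
```

Closing the input stream before deleting the temporary file avoids leaking it and keeps the delete working on platforms that refuse to remove open files.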
>
>
> Below are the options to fix this issue.
>
> (1) Fix it in AbstractFileOutputOperator: catch the "Not Supported"
> exception and call openStreamForNonAppendFS() instead.
>
> (2) Fix it in GenericFileOutputOperator, using the same approach as (1).
>
> (3) Create a new operator that extends AbstractFileOutputOperator and
> overrides the openStream() method. This new operator would be used only for
> file systems that do not support the append operation.
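Option (1) amounts to a try-append-then-fall-back pattern. A minimal sketch, assuming a hypothetical in-memory file system whose append() fails with IOException("Not supported") in the way the thread describes for FTP and S3; `MemFs` and `FallbackOpenSketch` are illustrative names, not Malhar or Hadoop APIs.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory file system; append() can be switched off to mimic FTP/S3.
class MemFs {
  private final Map<String, ByteArrayOutputStream> files = new HashMap<>();
  private final boolean supportsAppend;

  MemFs(boolean supportsAppend) {
    this.supportsAppend = supportsAppend;
  }

  ByteArrayOutputStream create(String path) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    files.put(path, out); // replaces any existing file
    return out;
  }

  ByteArrayOutputStream append(String path) throws IOException {
    if (!supportsAppend) {
      throw new IOException("Not supported");
    }
    return files.get(path);
  }

  byte[] read(String path) {
    return files.get(path).toByteArray();
  }
}

// Option (1): the base operator tries append() first and falls back on failure.
class FallbackOpenSketch {
  private final MemFs fs;
  boolean usedFallback;

  FallbackOpenSketch(MemFs fs) {
    this.fs = fs;
  }

  ByteArrayOutputStream openStream(String path) throws IOException {
    try {
      return fs.append(path);
    } catch (IOException e) {
      // Broad catch: any append failure triggers the copy-based fallback
      usedFallback = true;
      return openStreamForNonAppendFs(path);
    }
  }

  // Re-create the file and copy the earlier bytes into it so writing can continue
  private ByteArrayOutputStream openStreamForNonAppendFs(String path) {
    byte[] previous = fs.read(path);
    ByteArrayOutputStream out = fs.create(path);
    out.write(previous, 0, previous.length);
    return out;
  }
}
```

Here the catch block sits in the base class, so every subclass inherits the fallback; that is the change to the abstract class the reply above votes -1 on. Under option (3) the same fallback body would instead live only in the dedicated operator's openStream().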
>
> Please share your thoughts and vote on the above approaches.
>
> Regards,
> Chaitanya
>
