apex-dev mailing list archives

From Priyanka Gugale <priya...@datatorrent.com>
Subject AbstractFileOutputOperator to be used with ftp and s3 file System
Date Tue, 03 Nov 2015 06:51:44 GMT
Hi,

AbstractFileOutputOperator is used to write output files. The operator has
a method "getFSInstance" that initializes the file system. One can override
this method to initialize any file system that extends Hadoop's
FileSystem. In our implementation we have overridden "getFSInstance" to
initialize FTPFileSystem.

The file loader code in the setup method of AbstractFileOutputOperator opens
the file in append mode when the file is already present. The issue is that
FTPFileSystem doesn't support the append operation.

Possible solutions to the problem:
1. Override the append method in FTPFileSystem.
    - This would be tricky, as the file system itself doesn't support the
operation. Other file systems, such as S3, don't support append either.
2. Avoid using functions like "append" which are not supported by some
implementations of Hadoop FileSystem.
3. Move the file loading logic (currently in the setup method) into functions
that a subclass can override, so the subclass can load files without calls
like append that the user's chosen file system doesn't support.
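
To make option 3 concrete, here is a rough sketch of the idea using only the
JDK. It does not use the real Hadoop or Malhar classes: MiniFs, NoAppendFs,
Writer, RollingWriter and openStream are hypothetical stand-ins, and rolling
over to a numbered part file is just one possible fallback, not a proposal
for what the operator should do.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class AppendFallbackSketch {

  /** Hypothetical stand-in for the few FileSystem calls relevant here. */
  interface MiniFs {
    boolean exists(String path) throws IOException;
    OutputStream create(String path) throws IOException; // creates or truncates
    OutputStream append(String path) throws IOException; // may be unsupported
  }

  /** In-memory file system that rejects append (assumed FTP/S3-like behaviour). */
  static class NoAppendFs implements MiniFs {
    final Map<String, ByteArrayOutputStream> files = new HashMap<>();

    public boolean exists(String path) {
      return files.containsKey(path);
    }

    public OutputStream create(String path) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      files.put(path, out);
      return out;
    }

    public OutputStream append(String path) throws IOException {
      throw new IOException("Not supported");
    }

    String read(String path) {
      return new String(files.get(path).toByteArray(), StandardCharsets.UTF_8);
    }
  }

  /** Hypothetical operator: the file loading logic sits in an overridable hook. */
  static class Writer {
    protected final MiniFs fs;

    Writer(MiniFs fs) {
      this.fs = fs;
    }

    /** Behaviour described above: append when the file is already present. */
    protected OutputStream openStream(String path) throws IOException {
      return fs.exists(path) ? fs.append(path) : fs.create(path);
    }

    void write(String path, String text) throws IOException {
      try (OutputStream out = openStream(path)) {
        out.write(text.getBytes(StandardCharsets.UTF_8));
      }
    }
  }

  /** Subclass for file systems without append: never calls append. */
  static class RollingWriter extends Writer {
    RollingWriter(MiniFs fs) {
      super(fs);
    }

    @Override
    protected OutputStream openStream(String path) throws IOException {
      // Pick the first unused name: path, path.1, path.2, ...
      String candidate = path;
      for (int part = 1; fs.exists(candidate); part++) {
        candidate = path + "." + part;
      }
      return fs.create(candidate);
    }
  }
}
```

With this split, Writer fails on NoAppendFs as soon as the target file
exists, while RollingWriter keeps working because only the hook changed and
the rest of the write path is untouched.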

-Priyanka
