apex-dev mailing list archives

From "Yogi Devendra (JIRA)" <j...@apache.org>
Subject [jira] [Created] (APEXMALHAR-2009) concrete operator for writing to HDFS file
Date Mon, 07 Mar 2016 11:53:40 GMT
Yogi Devendra created APEXMALHAR-2009:
-----------------------------------------

             Summary: concrete operator for writing to HDFS file
                 Key: APEXMALHAR-2009
                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2009
             Project: Apache Apex Malhar
          Issue Type: Task
            Reporter: Yogi Devendra
            Assignee: Yogi Devendra


Currently, for writing to an HDFS file, we have AbstractFileOutputOperator in the Malhar library.

It has the following abstract methods:
1. protected abstract String getFileName(INPUT tuple)
2. protected abstract byte[] getBytesForTuple(INPUT tuple)

These methods are kept generic to give flexibility to app developers. However, someone who
is new to Apex would look for a ready-made implementation instead of extending the abstract one.

Thus, I am proposing to add a concrete operator, HDFSOutputOperator, to Malhar. The aim of this
operator is to provide a ready-to-use implementation for the most frequent use cases.

Here are my key observations on the most frequent use cases:
------------------------------------------------------------------------------

1. Writing tuples of type byte[] or String.
2. All tuples on a particular stream end up in the same output file.
3. The app developer may want to add a custom tuple separator (e.g. a newline character)
between tuples.
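
To make the three observations concrete, here is a minimal sketch of what such an operator could look like for String tuples. It is only an illustration under stated assumptions: FileOutputSketch is a stand-in that mirrors the two abstract methods quoted above and is not the real Malhar AbstractFileOutputOperator. The class name StringFileOutputSketch, its constructor parameters, and the choice of UTF-8 are likewise hypothetical, not the proposed HDFSOutputOperator API.

```java
import java.nio.charset.StandardCharsets;

// Stand-in for illustration only: mirrors the two abstract methods quoted
// in this message. It is NOT the real Malhar AbstractFileOutputOperator.
abstract class FileOutputSketch<INPUT> {
    protected abstract String getFileName(INPUT tuple);
    protected abstract byte[] getBytesForTuple(INPUT tuple);
}

// Hypothetical concrete operator for String tuples (observation 1).
class StringFileOutputSketch extends FileOutputSketch<String> {
    private final String fileName;        // single output file (observation 2)
    private final String tupleSeparator;  // e.g. "\n" (observation 3)

    StringFileOutputSketch(String fileName, String tupleSeparator) {
        this.fileName = fileName;
        this.tupleSeparator = tupleSeparator;
    }

    @Override
    protected String getFileName(String tuple) {
        // Every tuple on the stream goes to the same file,
        // so the tuple itself is ignored here.
        return fileName;
    }

    @Override
    protected byte[] getBytesForTuple(String tuple) {
        // Append the configured separator after each tuple.
        // UTF-8 is an assumption of this sketch.
        return (tuple + tupleSeparator).getBytes(StandardCharsets.UTF_8);
    }
}
```

With this shape, an app developer would only supply a file name and a separator, for example new StringFileOutputSketch("out.txt", "\n"), rather than extending the abstract class.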



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
