apex-dev mailing list archives

From "Yogi Devendra (JIRA)" <j...@apache.org>
Subject [jira] [Created] (APEXMALHAR-2009) concrete operator for writing to HDFS file
Date Mon, 07 Mar 2016 11:53:40 GMT
Yogi Devendra created APEXMALHAR-2009:

             Summary: concrete operator for writing to HDFS file
                 Key: APEXMALHAR-2009
                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2009
             Project: Apache Apex Malhar
          Issue Type: Task
            Reporter: Yogi Devendra
            Assignee: Yogi Devendra

Currently, for writing to HDFS files, we have AbstractFileOutputOperator in the Malhar library.

It has the following abstract methods:
1. protected abstract String getFileName(INPUT tuple)
2. protected abstract byte[] getBytesForTuple(INPUT tuple)

These methods are kept generic to give flexibility to app developers. But someone who
is new to Apex would look for a ready-made implementation instead of extending the abstract one.

Thus, I am proposing to add a concrete operator, HDFSOutputOperator, to Malhar. The aim of this
operator would be to serve as a ready-to-use operator for the most frequent use cases.

Here are my key observations on the most frequent use cases:

1. Writing tuples of type byte[] or String.
2. All tuples on a particular stream end up in the same output file.
3. App developer may want to add some custom tuple separator (e.g. newline character) between
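
The observations above can be sketched in code. This is only an illustration, not the proposed HDFSOutputOperator itself: FileOutputSketch is a hypothetical in-memory stand-in for AbstractFileOutputOperator that keeps just the two abstract methods quoted earlier, and SingleFileBytesOutputSketch, its constructor parameters, and the choice to append the separator after every tuple are all assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for AbstractFileOutputOperator (NOT the Malhar class):
// it keeps the two abstract methods and buffers each "file" in memory
// instead of writing to HDFS.
abstract class FileOutputSketch<INPUT> {
    private final Map<String, ByteArrayOutputStream> files = new LinkedHashMap<>();

    protected abstract String getFileName(INPUT tuple);

    protected abstract byte[] getBytesForTuple(INPUT tuple);

    // Route the tuple's bytes to the buffer of the file chosen for it.
    public void processTuple(INPUT tuple) {
        byte[] bytes = getBytesForTuple(tuple);
        files.computeIfAbsent(getFileName(tuple), k -> new ByteArrayOutputStream())
             .write(bytes, 0, bytes.length);
    }

    // Return everything written so far to the given file (empty if none).
    public byte[] contentsOf(String fileName) {
        ByteArrayOutputStream out = files.get(fileName);
        return out == null ? new byte[0] : out.toByteArray();
    }
}

// Concrete sketch covering the frequent use case: byte[] tuples, one output
// file for the whole stream, and a configurable separator.
class SingleFileBytesOutputSketch extends FileOutputSketch<byte[]> {
    private final String fileName;
    private final byte[] separator;

    SingleFileBytesOutputSketch(String fileName, String separator) {
        this.fileName = fileName;
        this.separator = separator.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    protected String getFileName(byte[] tuple) {
        return fileName; // every tuple goes to the same file
    }

    @Override
    protected byte[] getBytesForTuple(byte[] tuple) {
        // Assumption: the separator is appended after each tuple.
        byte[] out = Arrays.copyOf(tuple, tuple.length + separator.length);
        System.arraycopy(separator, 0, out, tuple.length, separator.length);
        return out;
    }
}
```

With this sketch, an app developer would only supply a file name and a separator, for example new SingleFileBytesOutputSketch("out.txt", "\n"), and would not need to extend anything. A String variant could convert each tuple to bytes and then reuse the same logic.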

