apex-dev mailing list archives

From "Yogi Devendra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXMALHAR-2009) concrete operator for writing to HDFS file
Date Mon, 07 Mar 2016 11:55:40 GMT

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182921#comment-15182921
] 

Yogi Devendra commented on APEXMALHAR-2009:
-------------------------------------------

[Yogi]

Chandni,

I think you are referring to the FileWriter operator in
https://github.com/tweise/apex-samples/blob/master/exactly-once/src/main/java/com/example/myapexapp/AtomicFileOutputApp.java

I looked at the code, and it can serve as a good starting point.
I suggest you contribute your code as-is to Malhar.

Your commit will be my starting point. I will then modify it to suit
the other frequent use cases discussed above.

I also have a variant of the concrete implementation in my private repo. I can apply
similar changes on top of your code as the baseline.

This will allow us to take the best parts of both implementations for the final version.

Thanks for letting me know about your code. Would it be possible for you to open a Malhar PR
for this in the next 1-2 days? I will wait for your PR to be ready.


> concrete operator for writing to HDFS file
> ------------------------------------------
>
>                 Key: APEXMALHAR-2009
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2009
>             Project: Apache Apex Malhar
>          Issue Type: Task
>            Reporter: Yogi Devendra
>            Assignee: Yogi Devendra
>
> Currently, for writing to an HDFS file, we have AbstractFileOutputOperator in the Malhar library.
> It has the following abstract methods:
> 1. protected abstract String getFileName(INPUT tuple)
> 2. protected abstract byte[] getBytesForTuple(INPUT tuple)
> These methods are kept generic to give flexibility to app developers. However, someone who is new to Apex would look for a ready-made implementation instead of extending the abstract implementation.
> Thus, I am proposing to add a concrete operator, HDFSOutputOperator, to Malhar. The aim of this operator is to serve as a ready-to-use operator for the most frequent use cases.
> Here are my key observations on the most frequent use cases:
> ------------------------------------------------------------------------------
> 1. Writing tuples of type byte[] or String.
> 2. All tuples on a particular stream end up in the same output file.
> 3. The app developer may want to add a custom tuple separator (e.g. a newline character) between tuples.
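
As a rough illustration of the three observations above, here is a minimal Java sketch. It does not use the Malhar library: FileOutputSketch is a hypothetical stand-in that only mirrors the two abstract method signatures quoted in the issue, and SingleFileStringOutput, its constructor arguments, and the demo class are likewise assumptions made for this example, not the proposed HDFSOutputOperator.

```java
// Hypothetical stand-in mirroring only the two abstract methods quoted in the issue.
// It is NOT Malhar's AbstractFileOutputOperator and does no file or HDFS I/O.
abstract class FileOutputSketch<INPUT> {
    protected abstract String getFileName(INPUT tuple);
    protected abstract byte[] getBytesForTuple(INPUT tuple);
}

// Hypothetical concrete variant covering the listed use cases for String tuples:
// one fixed output file for the whole stream, plus a configurable tuple separator.
class SingleFileStringOutput extends FileOutputSketch<String> {
    private final String fileName;   // assumed to be set once by the app developer
    private final String separator;  // e.g. "\n"; use "" for no separator

    SingleFileStringOutput(String fileName, String separator) {
        this.fileName = fileName;
        this.separator = separator;
    }

    @Override
    protected String getFileName(String tuple) {
        // Every tuple maps to the same file, regardless of its content.
        return fileName;
    }

    @Override
    protected byte[] getBytesForTuple(String tuple) {
        // Append the separator so consecutive tuples stay distinguishable in the file.
        return (tuple + separator).getBytes(java.nio.charset.StandardCharsets.UTF_8);
    }
}

class SingleFileStringOutputDemo {
    public static void main(String[] args) {
        SingleFileStringOutput op = new SingleFileStringOutput("output.txt", "\n");
        java.io.ByteArrayOutputStream sink = new java.io.ByteArrayOutputStream();
        for (String tuple : new String[] {"first", "second"}) {
            byte[] bytes = op.getBytesForTuple(tuple);
            sink.write(bytes, 0, bytes.length);  // in-memory stand-in for the target file
        }
        System.out.print(op.getFileName("first") + ": "
            + new String(sink.toByteArray(), java.nio.charset.StandardCharsets.UTF_8));
    }
}
```

A byte[] variant would follow the same pattern, appending the separator bytes to each tuple instead of concatenating strings.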



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
