apex-dev mailing list archives

From "Yogi Devendra (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (APEXMALHAR-2009) concrete operator for writing to HDFS file
Date Mon, 07 Mar 2016 11:56:40 GMT

    [ https://issues.apache.org/jira/browse/APEXMALHAR-2009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182923#comment-15182923 ]

Yogi Devendra commented on APEXMALHAR-2009:
-------------------------------------------

[Yogi]

Ashwin,

Please see my replies inline:

On 5 March 2016 at 22:42, Ashwin Chandra Putta <ashwinchandrap@gmail.com> wrote:
I think the concrete implementation should contain the following to allow
for the most common use cases.

1. Take any java object as input and get the bytes of the string returned
from toString method on the object.

Yes. The operator would accept any Java object and derive the byte[] from the string returned
by its toString(). If the input is already a byte[], it would be passed through without any conversion.
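
A minimal sketch of this conversion, for illustration only. TupleBytes and toBytes are hypothetical names, not part of Malhar, and UTF-8 is an assumption since the charset has not been discussed here:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Malhar API: shows the proposed tuple-to-bytes rule.
final class TupleBytes {
    private TupleBytes() {}

    // byte[] input is passed through as-is; any other object is
    // converted via toString(). UTF-8 is an assumed encoding.
    static byte[] toBytes(Object tuple) {
        if (tuple instanceof byte[]) {
            return (byte[]) tuple;
        }
        return tuple.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

In the real operator this logic would presumably live in getBytesForTuple(INPUT tuple); null tuples are not handled in this sketch.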
 
2. The separator should be configurable. Null separator should also be
valid.

The implementation will allow any String separator. The default would be a newline.
Even an empty string will be supported.
By null separator, do you mean the no-separator case? How about using an empty string for
no separator instead of null, to avoid any special handling?
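
To show why an empty string needs no special handling, here is a rough sketch. SeparatorDemo and withSeparator are hypothetical names, and UTF-8 is again an assumption:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not Malhar API: appends a configurable separator.
final class SeparatorDemo {
    private SeparatorDemo() {}

    // Returns tupleBytes followed by the separator bytes. An empty
    // separator contributes zero bytes, so the same code path covers
    // the no-separator case without a null check.
    static byte[] withSeparator(byte[] tupleBytes, String separator) {
        byte[] sep = separator.getBytes(StandardCharsets.UTF_8);
        byte[] out = Arrays.copyOf(tupleBytes, tupleBytes.length + sep.length);
        System.arraycopy(sep, 0, out, tupleBytes.length, sep.length);
        return out;
    }
}
```

With a null separator, the getBytes call above would throw a NullPointerException unless an extra branch were added, which is the special handling I would like to avoid.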
 
3. Should have one time configurable file path and name.

Yes. The file path and name will be configurable as properties.

 
4. Should have configurable time based and size based rotation policy.

Do you mean rotating on whichever condition is met first?

The size-based rotation policy will be inherited from AbstractFileOutputOperator.

For time-based rotation, do you mean writing one file per X windows,
or rotating when there is no new data for X windows?

In either case, could we simply set an appropriate value X for APPLICATION_WINDOW_COUNT on this
operator?
Or should we expose a separate property, rotationWindowCount, for this?
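
To make the "whichever happens first" reading concrete, here is a rough sketch under the "one file per X windows" interpretation. RotationTracker is a hypothetical class, not Malhar API; rotationWindowCount is only the property name proposed above, and rotation is checked at window boundaries purely to keep the example simple:

```java
// Hypothetical sketch, not Malhar API: rotate when either the size
// limit or the window-count limit is reached, whichever comes first.
final class RotationTracker {
    private final long maxFileBytes;        // size-based limit
    private final int rotationWindowCount;  // proposed window-based limit
    private long bytesInFile;
    private int windowsInFile;

    RotationTracker(long maxFileBytes, int rotationWindowCount) {
        this.maxFileBytes = maxFileBytes;
        this.rotationWindowCount = rotationWindowCount;
    }

    // Record bytes written to the current file.
    void onBytesWritten(long count) {
        bytesInFile += count;
    }

    // Call at the end of each window. Returns true if the current file
    // should be rotated, and resets both counters for the next file.
    boolean endWindow() {
        windowsInFile++;
        boolean rotate = bytesInFile >= maxFileBytes
                || windowsInFile >= rotationWindowCount;
        if (rotate) {
            bytesInFile = 0;
            windowsInFile = 0;
        }
        return rotate;
    }
}
```

The "rotate if there is no new data for X windows" reading would need a different counter, one that resets whenever a window contains data, so the two interpretations are not interchangeable.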
 

Regards,
Ashwin.

~ Yogi 

> concrete operator for writing to HDFS file
> ------------------------------------------
>
>                 Key: APEXMALHAR-2009
>                 URL: https://issues.apache.org/jira/browse/APEXMALHAR-2009
>             Project: Apache Apex Malhar
>          Issue Type: Task
>            Reporter: Yogi Devendra
>            Assignee: Yogi Devendra
>
> Currently, for writing to an HDFS file we have AbstractFileOutputOperator in the Malhar library.
> It has the following abstract methods:
> 1. protected abstract String getFileName(INPUT tuple)
> 2. protected abstract byte[] getBytesForTuple(INPUT tuple)
> These methods are kept generic to give flexibility to app developers. But someone who is new to Apex would look for a ready-made implementation instead of extending the abstract implementation.
> Thus, I am proposing to add a concrete operator, HDFSOutputOperator, to Malhar. The aim of this operator would be to serve as a ready-to-use operator for the most frequent use cases.
> Here are my key observations on the most frequent use cases:
> ------------------------------------------------------------------------------
> 1. Writing tuples of type byte[] or String.
> 2. All tuples on a particular stream end up in the same output file.
> 3. The app developer may want to add a custom tuple separator (e.g. a newline character) between tuples.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
