apex-dev mailing list archives

From Yogi Devendra <yogideven...@apache.org>
Subject Re: Proposal for concrete operator for writing to HDFS file
Date Fri, 04 Mar 2016 08:24:15 GMT
Chandni,

I think you are referring to the FileWriter operator in
https://github.com/tweise/apex-samples/blob/master/exactly-once/src/main/java/com/example/myapexapp/AtomicFileOutputApp.java

I looked at the code. It can serve as a good starting point.
I would suggest that you put your code (as-is) into Malhar.

Your commit will be my starting point. I will then make further changes to
adapt it to the other frequent use cases discussed earlier in this thread.

I also have a variant of the concrete implementation in my private repo.
I can apply similar changes on top of your code, using it as the baseline.

This will let us combine the best parts of both implementations into the
final version.

Thanks for letting me know about your code. Would it be possible for you to
open a Malhar PR for this in the next 1-2 days? I will wait for your PR.

~ Yogi

On 4 March 2016 at 07:39, Chandni Singh <chandni@datatorrent.com> wrote:

> Hi Yogi,
>
> Here is an example I wrote.
> https://github.com/tweise/apex-samples/pulls
>
> In the above example, the file is finalized when no tuples are received
> in a window.
>
> A file is finalized when it is rotated (based on size/time).
> However, for an example or demo, we can finalize a file
> if no input tuples are received in a window. If more tuples arrive
> later, they need to be written to a different file.
> Maybe this could be controlled by a property?
>
> Let me know if you want me to put this in Malhar.
>
> Thanks,
> Chandni
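A minimal, framework-free sketch of the idle-window finalization described above, assuming a hypothetical `finalizeOnIdleWindow` property and a hypothetical `<base>.<part>` naming scheme for the files that follow a finalized one. The `beginWindow`/`endWindow` names only mimic Apex operator callbacks; the class does not use the Apex or Malhar API, and the FileWriter operator in the linked sample may be implemented differently.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of "finalize the file when a window receives no tuples".
 * Not an Apex operator: it only tracks which file each tuple would go to.
 */
class IdleWindowFinalizer {
  // Hypothetical property; when false, this sketch never finalizes
  // (size/time-based rotation is not modeled here).
  private boolean finalizeOnIdleWindow = true;
  private final String baseName;
  private int part = 0;             // suffix of the file currently being written
  private boolean fileOpen = false; // true once a tuple was written to the current part
  private int tuplesThisWindow = 0;
  private final List<String> finalizedFiles = new ArrayList<>();

  IdleWindowFinalizer(String baseName) { this.baseName = baseName; }

  void setFinalizeOnIdleWindow(boolean value) { finalizeOnIdleWindow = value; }

  void beginWindow(long windowId) { tuplesThisWindow = 0; }

  /** Returns the name of the file the tuple would be written to. */
  String process(String tuple) {
    fileOpen = true;
    tuplesThisWindow++;
    return currentFileName();
  }

  void endWindow() {
    // An idle window closes the open file; later tuples go to the next part.
    if (finalizeOnIdleWindow && fileOpen && tuplesThisWindow == 0) {
      finalizedFiles.add(currentFileName());
      fileOpen = false;
      part++;
    }
  }

  String currentFileName() { return baseName + "." + part; }

  List<String> getFinalizedFiles() { return finalizedFiles; }

  public static void main(String[] args) {
    IdleWindowFinalizer op = new IdleWindowFinalizer("out.txt");
    op.beginWindow(1); op.process("a"); op.process("b"); op.endWindow();
    op.beginWindow(2); op.endWindow(); // idle window: out.txt.0 is finalized
    op.beginWindow(3); System.out.println(op.process("c")); op.endWindow();
    System.out.println(op.getFinalizedFiles());
  }
}
```

Running `main` prints `out.txt.1` and then `[out.txt.0]`: the empty second window finalizes `out.txt.0`, so tuple `c` goes to a new file. Setting the property to `false` leaves only rotation in charge of finalization.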
>
> On Thu, Mar 3, 2016 at 5:51 PM, Yogi Devendra <yogidevendra@apache.org> wrote:
>
> > Any suggestions/comments on this?
> >
> > ~ Yogi
> >
> > On 3 March 2016 at 17:44, Yogi Devendra <yogidevendra@apache.org> wrote:
> >
> > > Hi,
> > >
> > > Currently, for writing to HDFS files we have AbstractFileOutputOperator in
> > > the Malhar library.
> > >
> > > It has the following abstract methods:
> > > 1. protected abstract String getFileName(INPUT tuple)
> > > 2. protected abstract byte[] getBytesForTuple(INPUT tuple)
> > >
> > > These methods are kept generic to give flexibility to app developers.
> > > But someone who is new to Apex would look for a ready-made implementation
> > > instead of extending the abstract implementation.
> > >
> > > Thus, I am proposing to add a concrete operator, HDFSOutputOperator, to
> > > Malhar. The aim of this operator would be to serve as a ready-to-use
> > > operator for the most frequent use cases.
> > >
> > > Here are my key observations on the most frequent use cases:
> > > ------------------------------------------------------------
> > >
> > > 1. Writing tuples of type byte[] or String.
> > > 2. All tuples on a particular stream end up in the same output file.
> > > 3. The app developer may want to add a custom tuple separator (e.g. a
> > > newline character) between tuples.
> > >
> > > Please share your comments regarding:
> > > --------------------------------------
> > >
> > > 1. Will it be useful to have such concrete operator?
> > >
> > > 2. Can you think of any data type other than byte[] and String that
> > > should be supported out of the box by this concrete operator?
> > > Currently, I am planning to accept byte[], String, and any other type
> > > with a valid toString() as input tuples.
> > >
> > > 3. Do you think the tuple separator should be configurable?
> > >
> > > 4. Any other feedback?
> > >
> > >
> > > Proposed design:
> > > ----------------------
> > >
> > > 1. This concrete implementation will extend AbstractFileOutputOperator,
> > > providing default implementations for the abstract methods mentioned above.
> > >
> > > 2. The file name and tuple separator will be exposed as operator properties.
> > >
> > > 3. All incoming tuples will be written to the same file, specified by the
> > > file name property.
> > >
> > > 4. This operator will be added to the Malhar library under the package
> > > com.datatorrent.lib.io.fs, where AbstractFileOutputOperator resides.
> > >
> > > ~ Yogi
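The default implementations proposed above could look roughly like the following sketch. It is standalone Java rather than a subclass of the real AbstractFileOutputOperator, so it runs without Malhar; only the two method signatures come from the proposal. The property names `outputFileName` and `tupleSeparator`, the `output.txt` and newline defaults, and the choice of UTF-8 encoding are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical, library-free sketch of default implementations for
 * getFileName(INPUT) and getBytesForTuple(INPUT). In Malhar the class would
 * extend AbstractFileOutputOperator<Object> instead of standing alone.
 */
class HdfsOutputOperatorSketch {
  private String outputFileName = "output.txt"; // hypothetical property
  private String tupleSeparator = "\n";         // hypothetical property

  public void setOutputFileName(String name) { outputFileName = name; }

  public void setTupleSeparator(String separator) { tupleSeparator = separator; }

  /** Every tuple lands in the same configured file. */
  protected String getFileName(Object tuple) {
    return outputFileName;
  }

  /**
   * byte[] tuples are written as-is; String and any other type are written as
   * UTF-8 bytes of toString(). The separator is appended after each tuple.
   */
  protected byte[] getBytesForTuple(Object tuple) {
    byte[] body = (tuple instanceof byte[])
        ? (byte[]) tuple
        : tuple.toString().getBytes(StandardCharsets.UTF_8);
    byte[] sep = tupleSeparator.getBytes(StandardCharsets.UTF_8);
    byte[] out = new byte[body.length + sep.length];
    System.arraycopy(body, 0, out, 0, body.length);
    System.arraycopy(sep, 0, out, body.length, sep.length);
    return out;
  }

  public static void main(String[] args) {
    HdfsOutputOperatorSketch op = new HdfsOutputOperatorSketch();
    op.setTupleSeparator(",");
    System.out.println(op.getFileName(42));
    System.out.println(new String(op.getBytesForTuple(42), StandardCharsets.UTF_8));
  }
}
```

With the separator set to `,`, `main` prints `output.txt` and then `42,`. Exposing the separator as a property (question 3) costs one setter, and an empty string turns it off.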
> > >
> >
>
