apex-users mailing list archives

From Amol Kekre <a...@datatorrent.com>
Subject Re: Is there a way to schedule an operator?
Date Wed, 14 Jun 2017 14:32:27 GMT
The only thing missing is a way to kick off a job, in case the ask is to use
resources the batch way: "use, then terminate once done". An operator that
keeps watch and is able to kick off a job suffices. A batch job can be
kicked off by either of the following:

1. Files
   -> Start after all data has arrived. Usually a .done file in a dir
      triggers processing of the entire dir
   -> Start asap and end on .done
2. Message (a start message)

I think batch use cases are mainly #1. Technically this is not a batch vs
stream question, just the scheduler (Oozie-like) part of batch.
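
The two file-based variants in #1 can be sketched roughly as below. This is
only an illustration using the plain JDK, not Apex or Malhar code: the class
name DoneFileTrigger, its method names, and the assumption that the marker
file is literally named ".done" are all hypothetical. A watching operator
could call something like this from its polling loop.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

/** Hypothetical helper (not part of Apex or Malhar) that watches one directory. */
public class DoneFileTrigger {
    // Assumption: the marker file is named exactly ".done".
    private static final String MARKER = ".done";

    private final Path dir;
    // Names of data files already handed to the caller, so none is returned twice.
    private final Set<String> processed = new HashSet<>();

    public DoneFileTrigger(Path dir) {
        this.dir = dir;
    }

    /** Variant "start after all data has arrived": empty until the marker exists. */
    public List<Path> pollAfterDone() throws IOException {
        if (!isDone()) {
            return Collections.emptyList();
        }
        return pollNewFiles();
    }

    /** Variant "start asap": returns data files that have not been returned before. */
    public List<Path> pollNewFiles() throws IOException {
        List<Path> fresh = new ArrayList<>();
        try (Stream<Path> entries = Files.list(dir)) {
            entries.filter(Files::isRegularFile)
                   .filter(p -> !p.getFileName().toString().equals(MARKER))
                   .sorted()
                   .forEach(p -> {
                       // Set.add returns false for names that were already handed out.
                       if (processed.add(p.getFileName().toString())) {
                           fresh.add(p);
                       }
                   });
        }
        return fresh;
    }

    /** True once the marker file is present in the directory. */
    public boolean isDone() {
        return Files.exists(dir.resolve(MARKER));
    }
}
```

For "start asap and end on .done", the caller would keep calling
pollNewFiles(), and once isDone() returns true, poll one last time before
terminating so that files written just before the marker are not missed.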


E:amol@datatorrent.com | M: 510-449-2606 | Twitter: @*amolhkekre*


On Tue, Jun 13, 2017 at 11:47 PM, Ganelin, Ilya <Ilya.Ganelin@capitalone.com
> wrote:

> I think it's a very relevant use case. In the Apex formulation this would
> work as follows. An operator runs continuously and maintains an internal
> state that tracks processed files or an offset (e.g. in Kafka). As more data
> becomes available, the operator performs the appropriate operation and then
> returns to waiting. In this fashion, batched data is processed as soon as
> it becomes available but the process overall is still a batch process since
> it's limited by the production of the source batches.
> There are a couple of examples of this in Malhar, for example the
> AbstractFileInputOperator.
> Your earlier comment with regard to your motivation is interesting. Can
> you elaborate on the load reduction you get with your approach? A number of
> batched small writes to a DB may prove to be more efficient from a latency
> or database utilization standpoint when compared with infrequent large
> batch writes, particularly if they involve index updates.
> ------------------------------
> *From:* dashirov@yahoo.com <dashirov@yahoo.com>
> *Sent:* Tuesday, June 13, 2017 6:36:29 PM
> *To:* guilhermehott@gmail.com; users@apex.apache.org
> *Subject:* Re: Is there a way to schedule an operator?
> I have input operators that reach out to Google, Facebook, Bing, Yahoo
> etc. once a day or an hour and download marketing spend statistics. Apex
> promises batch and streaming to be equal class citizens. How is this
> equality achieved if there's no scheduler for batch jobs to rely on? I
> want the DAG to take a data stream from the batch pipeline and affect the
> streaming pipelines running alongside. Do you not see this as a valid use case?
> On Tue, Jun 13, 2017 at 5:29 PM, Guilherme Hott
> <guilhermehott@gmail.com> wrote:
> Hi guys,
> Is there a way to schedule an operator? I need an operator to start the DAG
> once a day at midnight (00:00).
> Best
> --
> *Guilherme Hott*
> *Software Engineer*
> Skype: guilhermehott
> @guilhermehott
> https://www.linkedin.com/in/guilhermehott
