spark-dev mailing list archives

From Sachin Aggarwal <different.sac...@gmail.com>
Subject Re: submissionTime vs batchTime, DirectKafka
Date Thu, 10 Mar 2016 17:26:34 GMT
Hi,

Can this be considered a lag in the processing of events?
Should we report it as a delay?

On Thu, Mar 10, 2016 at 10:51 AM, Mario Ds Briggs <mario.briggs@in.ibm.com>
wrote:

> Look at
> org.apache.spark.streaming.scheduler.JobGenerator
>
> It has a RecurringTimer (timer) that simply posts 'GenerateJobs'
> events to an EventLoop once every batchInterval.
>
> The EventLoop's thread then picks up these events and uses
> streamingContext.graph to generate a Job (via the InputDStream's compute
> method). batchInfo.submissionTime is the time recorded after job
> generation completes. The Job is then sent to
> org.apache.spark.streaming.scheduler.JobScheduler, which has a
> thread pool executor to run the Job.
>
> GenerateJobs events are not the only events posted to
> JobGenerator.eventLoop. Other events, such as DoCheckpoint,
> ClearCheckpointData and ClearMetadata, are also posted, and all of them
> are serviced by the EventLoop's single thread. So if, for instance,
> DoCheckpoint, ClearCheckpointData and ClearMetadata events are queued
> before your nth GenerateJobs event, there will be a time difference
> between batchTime and submissionTime for that nth batch.
>
>
> thanks
> Mario
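A toy Scala model of the single-thread behaviour described above may make it concrete. It is not Spark's JobGenerator code. The event classes (Generate, Housekeeping), their handling costs, and all timings are hypothetical. submissionTime is simply taken as the moment job generation for a batch finishes. The model shows that housekeeping events queued just ahead of a batch's generate event push that batch's submissionTime later, even though the timer posted the event on time.

```scala
import scala.collection.mutable

// Toy model (hypothetical, not Spark's JobGenerator) of one thread draining
// an event queue: each event waits for everything queued ahead of it.
object EventLoopModel {
  sealed trait Event {
    def postedAt: Long // ms at which the event was posted
    def costMs: Long   // ms the single thread spends handling it
  }
  // Posted by the recurring timer at each batch boundary (batchTime).
  final case class Generate(batchTime: Long, costMs: Long) extends Event {
    def postedAt: Long = batchTime
  }
  // Stand-in for DoCheckpoint / ClearCheckpointData / ClearMetadata.
  final case class Housekeeping(name: String, postedAt: Long, costMs: Long) extends Event

  /** (batchTime, submissionTime) per batch; submissionTime is taken when
    * job generation for that batch finishes. */
  def run(events: Seq[Event]): Seq[(Long, Long)] = {
    val queue = mutable.Queue(events.sortBy(_.postedAt): _*) // FIFO by post time
    val out = mutable.ArrayBuffer.empty[(Long, Long)]
    var clock = 0L
    while (queue.nonEmpty) {
      val e = queue.dequeue()
      // Serial handling: start when posted or when the thread frees up.
      clock = math.max(clock, e.postedAt) + e.costMs
      e match {
        case Generate(t, _)  => out += ((t, clock))
        case _: Housekeeping => ()
      }
    }
    out.toList
  }

  def main(args: Array[String]): Unit = {
    val events = Seq(
      Generate(0L, 5L),
      Housekeeping("DoCheckpoint", 290L, 200L),
      Housekeeping("ClearMetadata", 295L, 150L),
      Generate(300L, 5L),
      Generate(600L, 5L))
    for ((batchTime, submitted) <- run(events))
      println(s"batchTime=$batchTime submissionTime=$submitted lag=${submitted - batchTime}")
  }
}
```

With these made-up costs, the batch at 300 ms is submitted at 645 ms, a 345 ms lag. The batch at 600 ms has only a 50 ms lag, because the backlog has been drained by then.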
>
>
>
>
>
>
> On Thu, Mar 10, 2016 at 10:29 AM, Sachin Aggarwal
> <different.sachin@gmail.com> wrote:
>
>    Hi Cody,
>
>    Let me try once again to explain with an example.
>
>    In Spark's BatchInfo class, "scheduling delay" is defined as
>
>    def schedulingDelay: Option[Long] = processingStartTime.map(_ - submissionTime)
>
>    I am dumping the BatchInfo object in my LatencyListener, which
>    extends StreamingListener:
>    batchTime = 1457424695400 ms
>    submissionTime = 1457425630780 ms
>    difference = 935380 ms
>
>    Can this be considered a lag in the processing of events? What is a
>    possible explanation for this lag?
>
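The numbers above can be checked with a small sketch. BatchTiming is a hypothetical, simplified stand-in for the BatchInfo timing fields. It uses plain epoch milliseconds, whereas Spark's BatchInfo uses a Time for batchTime. schedulingDelay mirrors the definition quoted above. submissionLag is an assumed extra metric, not something Spark reports. It measures the batchTime to submissionTime gap, which neither schedulingDelay nor processingDelay covers here.

```scala
// Hypothetical, simplified stand-in for BatchInfo's timing fields
// (epoch milliseconds; Spark's BatchInfo uses a Time for batchTime).
final case class BatchTiming(
    batchTime: Long,
    submissionTime: Long,
    processingStartTime: Option[Long] = None,
    processingEndTime: Option[Long] = None) {

  // Mirrors the quoted definition: counting starts at submissionTime.
  def schedulingDelay: Option[Long] = processingStartTime.map(_ - submissionTime)

  // Time spent actually running the batch, once both ends are known.
  def processingDelay: Option[Long] =
    for (end <- processingEndTime; start <- processingStartTime) yield end - start

  // Assumed extra metric: how long after its batchTime the batch was submitted.
  def submissionLag: Long = submissionTime - batchTime
}

object SubmissionLagExample {
  def main(args: Array[String]): Unit = {
    val b = BatchTiming(batchTime = 1457424695400L, submissionTime = 1457425630780L)
    val lag = b.submissionLag
    println(s"submissionLag = $lag ms (${lag / 60000} min ${(lag % 60000) / 1000.0} s)")
  }
}
```

For the batch above, this gives 935380 ms, i.e. 15 min 35.38 s. None of that time appears in schedulingDelay, because that metric only starts counting at submissionTime.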
>    On Thu, Mar 10, 2016 at 12:22 AM, Cody Koeninger <cody@koeninger.org>
>    wrote:
>    I'm really not sure what you're asking.
>
>    On Wed, Mar 9, 2016 at 12:43 PM, Sachin Aggarwal
>    <different.sachin@gmail.com> wrote:
>    > Where are we capturing this delay?
>    > I am aware of the scheduling delay, which is defined as processing
>    > start time minus submission time, not the batch creation time.
>    >
>    > On Wed, Mar 9, 2016 at 10:46 PM, Cody Koeninger <cody@koeninger.org>
>    > wrote:
>    >>
>    >> Spark Streaming by default will not start processing a batch until
>    >> the current batch is finished. So if your processing time is larger
>    >> than your batch interval, delays will build up.
>    >>
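Here is a toy Scala model of the point above that delays build up. It makes three assumptions, none of which comes from Spark's code: only one batch runs at a time, every batch takes the same processing time, and a batch starts as soon as its batchTime has arrived and the previous batch has finished. The interval and processing values are illustrative, not measured.

```scala
// Toy model (hypothetical values, not Spark code) of one batch running at a
// time: a batch cannot start until the previous batch has finished.
object BacklogModel {
  /** Delay (start time - batchTime) for each of the first n batches. */
  def startDelays(intervalMs: Long, processingMs: Long, n: Int): Seq[Long] = {
    var prevEnd = 0L
    (0 until n).map { i =>
      val batchTime = i * intervalMs
      val start = math.max(batchTime, prevEnd) // wait for the previous batch
      prevEnd = start + processingMs
      start - batchTime
    }
  }

  def main(args: Array[String]): Unit = {
    // 300 ms batches that each take 400 ms: each batch starts 100 ms later
    // (relative to its batchTime) than the one before it.
    println(startDelays(intervalMs = 300L, processingMs = 400L, n = 5).mkString(", "))
    // 300 ms batches that take 250 ms: no backlog builds up.
    println(startDelays(intervalMs = 300L, processingMs = 250L, n = 5).mkString(", "))
  }
}
```

When processing is 100 ms longer than the interval, the delay grows by 100 ms per batch and is never drained. With 300 ms batches, that adds about 20 seconds of delay for every minute of batches.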
>    >> On Wed, Mar 9, 2016 at 11:09 AM, Sachin Aggarwal
>    >> <different.sachin@gmail.com> wrote:
>    >> > Hi all,
>    >> >
>    >> > We have batchTime and submissionTime:
>    >> >
>    >> > @param batchTime       Time of the batch
>    >> >
>    >> > @param submissionTime  Clock time of when jobs of this batch were
>    >> > submitted to the streaming scheduler queue
>    >> >
>    >> > 1) With Direct Kafka and small batches (300 ms), we are seeing a
>    >> > difference between batchTime and submissionTime that can reach
>    >> > minutes, but only when the processing time is longer than the batch
>    >> > interval. How can we explain this delay?
>    >> >
>    >> > 2) When the batch processing time is longer than the batch interval,
>    >> > will Spark fetch the next batch's data from Kafka in parallel with
>    >> > processing the current batch, or will it wait for the current batch
>    >> > to finish first?
>    >> >
>    >> > I would be thankful for any pointers.
>    >> >
>    >> > Thanks!
>    >> > --
>    >> >
>    >> > Thanks & Regards
>    >> >
>    >> > Sachin Aggarwal
>    >> > 7760502772
>    >
>    >
>    >
>    >
>    > --
>    >
>    > Thanks & Regards
>    >
>    > Sachin Aggarwal
>    > 7760502772
>
>
>
>    --
>
>    Thanks & Regards
>
>    Sachin Aggarwal
>    7760502772
>
>
>
>


-- 

Thanks & Regards

Sachin Aggarwal
7760502772
