samza-dev mailing list archives

From Yan Fang <yanfang...@gmail.com>
Subject Re: Samza hung after bootstrapping
Date Mon, 22 Jun 2015 05:02:45 GMT
Hi Roger,

I will try to look at the issue tomorrow if my time allows.

First things first:

The build has some unexpected results. A quick fix:

1. apply https://issues.apache.org/jira/browse/SAMZA-712
2. add

sourceSets.main.scala.srcDir "src/main/java"
sourceSets.main.java.srcDirs = []

at line 126 of build.gradle.

Sorry for the inconvenience.

Thanks,

Fang, Yan
yanfang724@gmail.com

On Sun, Jun 21, 2015 at 3:55 PM, Roger Hoover <roger.hoover@gmail.com>
wrote:

> Was looking through the code a little and it looks like the
> BootstrappingChooser could use the list of SSPs passed into its register()
> method to figure out which partitions it needs to monitor.
>
> I wanted to try to build Samza to play around with it but I'm getting errors
> trying to build off of both the 0.9.0 and 0.9.1 branches.
>
> thedude:samza (0.9.1) $ ./gradlew clean build
>
> To honour the JVM settings for this build a new JVM will be forked. Please
> consider using the daemon:
> http://gradle.org/docs/2.0/userguide/gradle_daemon.html.
>
> :clean
>
> :samza-api:clean
>
> :samza-core_2.10:clean
>
> :samza-kafka_2.10:clean UP-TO-DATE
>
> :samza-kv-inmemory_2.10:clean UP-TO-DATE
>
> :samza-kv-rocksdb_2.10:clean UP-TO-DATE
>
> :samza-kv_2.10:clean UP-TO-DATE
>
> :samza-log4j:clean UP-TO-DATE
>
> :samza-shell:clean UP-TO-DATE
>
> :samza-test_2.10:clean UP-TO-DATE
>
> :samza-yarn_2.10:clean UP-TO-DATE
>
> :assemble UP-TO-DATE
>
> :rat
>
> Rat report: build/rat/rat-report.html
>
> :check
>
> :build
>
> :samza-api:compileJava
>
> :samza-api:processResources UP-TO-DATE
>
> :samza-api:classes
>
> :samza-api:jar
>
> :samza-api:javadoc
>
>
> /Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49:
> warning: no @param for ssp
>
>   void setStartingOffset(SystemStreamPartition ssp, String offset);
>
>        ^
>
>
> /Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49:
> warning: no @param for offset
>
>   void setStartingOffset(SystemStreamPartition ssp, String offset);
>
>        ^
>
> 2 warnings
>
> :samza-api:javadocJar
>
> :samza-api:sourcesJar
>
> :samza-api:signArchives SKIPPED
>
> :samza-api:assemble
>
> :samza-api:compileTestJava
>
> :samza-api:processTestResources UP-TO-DATE
>
> :samza-api:testClasses
>
> :samza-api:test
>
> :samza-api:check
>
> :samza-api:build
>
> :samza-core_2.10:compileJava
>
> :samza-core_2.10:compileScala
>
> [ant:scalac]
>
> /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:43:
> error: object SamzaObjectMapper is not a member of package
> org.apache.samza.serializers.model
>
> [ant:scalac] import org.apache.samza.serializers.model.SamzaObjectMapper
>
> [ant:scalac]        ^
>
> [ant:scalac]
>
> /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:40:
> error: object TaskModel is not a member of package
> org.apache.samza.job.model
>
> [ant:scalac] import org.apache.samza.job.model.TaskModel
>
> [ant:scalac]        ^
>
> ...
>
>
> I've got JDK 8 installed.  Wondering whether that makes a difference or not.
> I'd appreciate any help.
>
> Thanks,
>
> Roger
>
>
>
> On Sun, Jun 21, 2015 at 1:02 PM, Roger Hoover <roger.hoover@gmail.com>
> wrote:
>
> > I think I see what's happening.
> >
> > When there are 8 tasks and I set yarn.container.count=8, then each
> > container is responsible for a single task.  However, the
> > systemStreamLagCounts map
> > (https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L77)
> > and laggingSystemStreamPartitions
> > (https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L83)
> > are configured to track all partitions for the bootstrap topic rather than
> > just the one partition assigned to this task.
> >
> > Later in the log, we see that the task/container completed bootstrap for
> > its own partition.
> >
> > 2015-06-21 12:28:55 org.apache.samza.system.chooser.BootstrappingChooser
> > [DEBUG] Bootstrap stream partition is fully caught up:
> > SystemStreamPartition [kafka, deploy.svc.tlrnsZOYQA6wrwAA4FLqZA, 0]
> >
> > but the Bootstrapping Chooser still thinks that the remaining partitions
> > (assigned to other tasks in other containers) need to be completed.  JMX at
> > this point shows 7 lagging partitions of the 8 original partition count.
> >
> > I'm wondering why no one has run into this.  Doesn't LinkedIn use
> > partitioned bootstrapped topics?
> >
> > Thanks,
> >
> > Roger
> >
> > On Sun, Jun 21, 2015 at 12:22 PM, Roger Hoover <roger.hoover@gmail.com>
> > wrote:
> >
> >> Hi Yan,
> >>
> >> I've uploaded a file with TRACE level logging here:
> >> http://filebin.ca/261yhsTZcZQZ/samza-container-0.log.gz
> >>
> >> I really appreciate your help as this is a critical issue for me.
> >>
> >> Thanks,
> >>
> >> Roger
> >>
> >> On Fri, Jun 19, 2015 at 12:05 PM, Yan Fang <yanfang724@gmail.com>
> >> wrote:
> >>
> >>> Hi Roger,
> >>>
> >>> " but it only spawns one container and still hangs after bootstrap"
> >>>     -- this is probably because your local machine does not have enough
> >>> resources for the second container. I checked your log file, and each
> >>> container is about 4GB.
> >>>
> >>> "When I run it on our YARN cluster with a single container, it works
> >>> correctly.  When I tried it with 5 containers, it gets hung after
> >>> consuming the bootstrap topic."
> >>>    -- Have you figured it out? I have looked at your log and also the
> >>> code. My suspicion is that a null envelope is somehow blocking the
> >>> process. If you can paste the trace-level log, it will be more helpful
> >>> because many logs in the chooser are trace level.
> >>>
> >>> Thanks,
> >>>
> >>> Fang, Yan
> >>> yanfang724@gmail.com
> >>>
> >>> On Thu, Jun 18, 2015 at 5:20 PM, Roger Hoover <roger.hoover@gmail.com>
> >>> wrote:
> >>>
> >>> > I need some help.  I have a job which bootstraps one stream and then is
> >>> > supposed to read from two.  When I run it on our YARN cluster with a single
> >>> > container, it works correctly.  When I tried it with 5 containers, it gets
> >>> > hung after consuming the bootstrap topic.  I ran it with the grid script on
> >>> > my laptop (Mac OS X) with yarn.container.count=2 but it only spawns one
> >>> > container and still hangs after bootstrap.
> >>> >
> >>> > Debug logs are here: http://pastebin.com/af3KPvju
> >>> >
> >>> > I looked at JMX metrics and see:
> >>> > - Task Metrics - no value for kafka offset of non-bootstrapped stream
> >>> > - SystemConsumerMetrics
> >>> >   - choose null keeps incrementing
> >>> >   - ssps-needed-by-chooser 1
> >>> >   - unprocessed-messages 62k
> >>> > - Bootstrapping Chooser
> >>> >   - lagging partitions 4
> >>> >   - lagging-batch-streams - 4
> >>> >   - batch-resets - 0
> >>> >
> >>> > Has anyone seen this or can offer ideas of how to better debug it?
> >>> >
> >>> > I'm using Samza 0.9.0 and YARN 2.4.0.
> >>> >
> >>> > Thanks!
> >>> >
> >>> > Roger
> >>> >
> >>>
> >>
> >>
> >
>
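
For illustration, the behavior Roger describes in the thread above can be reproduced with a minimal, self-contained Java sketch. This is not Samza's BootstrappingChooser code: the class BootstrapLagSketch and its LagTracker helper are hypothetical names, and partitions are modeled as plain integers rather than SystemStreamPartition objects. It only shows why a lag set seeded with every partition of the bootstrap topic never empties in a container that consumes a single partition, while a set seeded with only the registered partitions does.

```java
import java.util.HashSet;
import java.util.Set;

public class BootstrapLagSketch {

    /** Hypothetical helper: tracks which partitions of one bootstrap stream are still behind. */
    static final class LagTracker {
        private final Set<Integer> lagging = new HashSet<>();

        LagTracker(Set<Integer> partitionsToTrack) {
            lagging.addAll(partitionsToTrack);
        }

        /** Called once a partition has been read up to its latest offset. */
        void markCaughtUp(int partition) {
            lagging.remove(partition);
        }

        /** Bootstrap is considered done only when nothing is lagging. */
        boolean isBootstrapped() {
            return lagging.isEmpty();
        }

        int laggingCount() {
            return lagging.size();
        }
    }

    public static void main(String[] args) {
        // The bootstrap topic has 8 partitions; this container owns only partition 0.
        Set<Integer> allPartitions = new HashSet<>();
        for (int p = 0; p < 8; p++) {
            allPartitions.add(p);
        }
        Set<Integer> registered = new HashSet<>();
        registered.add(0);

        // Seeded with every partition of the topic (the behavior reported in the thread).
        LagTracker tracksAll = new LagTracker(allPartitions);
        // Seeded only with the partitions registered with this container (the proposed idea).
        LagTracker tracksRegistered = new LagTracker(registered);

        // The container finishes reading its own partition.
        tracksAll.markCaughtUp(0);
        tracksRegistered.markCaughtUp(0);

        System.out.println("tracking all: lagging=" + tracksAll.laggingCount()
                + ", bootstrapped=" + tracksAll.isBootstrapped());
        System.out.println("tracking registered: lagging=" + tracksRegistered.laggingCount()
                + ", bootstrapped=" + tracksRegistered.isBootstrapped());
    }
}
```

In this sketch the first tracker is left with 7 lagging partitions out of 8, which matches the JMX reading quoted in the thread, and the second reports that bootstrap is complete.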
