samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Roger Hoover <roger.hoo...@gmail.com>
Subject Re: Samza hung after bootstrapping
Date Sun, 21 Jun 2015 22:55:52 GMT
Was looking through the code a little and it looks like the
BootstrappingChooser could use the list of SSPs passed into it's register()
method to figure out which partitions it need to monitor.

I wanted to try to build Samza to play around with it but I'm getting error
trying to build off of both the 0.9.0 and 0.9.1 branches.

thedude:samza (0.9.1) $ ./gradlew clean build

To honour the JVM settings for this build a new JVM will be forked. Please
consider using the daemon:
http://gradle.org/docs/2.0/userguide/gradle_daemon.html.

:clean

:samza-api:clean

:samza-core_2.10:clean

:samza-kafka_2.10:clean UP-TO-DATE

:samza-kv-inmemory_2.10:clean UP-TO-DATE

:samza-kv-rocksdb_2.10:clean UP-TO-DATE

:samza-kv_2.10:clean UP-TO-DATE

:samza-log4j:clean UP-TO-DATE

:samza-shell:clean UP-TO-DATE

:samza-test_2.10:clean UP-TO-DATE

:samza-yarn_2.10:clean UP-TO-DATE

:assemble UP-TO-DATE

:rat

Rat report: build/rat/rat-report.html

:check

:build

:samza-api:compileJava

:samza-api:processResources UP-TO-DATE

:samza-api:classes

:samza-api:jar

:samza-api:javadoc

/Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49:
warning: no @param for ssp

  void setStartingOffset(SystemStreamPartition ssp, String offset);

       ^

/Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49:
warning: no @param for offset

  void setStartingOffset(SystemStreamPartition ssp, String offset);

       ^

2 warnings

:samza-api:javadocJar

:samza-api:sourcesJar

:samza-api:signArchives SKIPPED

:samza-api:assemble

:samza-api:compileTestJava

:samza-api:processTestResources UP-TO-DATE

:samza-api:testClasses

:samza-api:test

:samza-api:check

:samza-api:build

:samza-core_2.10:compileJava

:samza-core_2.10:compileScala

[ant:scalac]
/Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:43:
error: object SamzaObjectMapper is not a member of package
org.apache.samza.serializers.model

[ant:scalac] import org.apache.samza.serializers.model.SamzaObjectMapper

[ant:scalac]        ^

[ant:scalac]
/Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:40:
error: object TaskModel is not a member of package
org.apache.samza.job.model

[ant:scalac] import org.apache.samza.job.model.TaskModel

[ant:scalac]        ^

...


I've got JDK 8 installed.  Wondering that makes a difference or not.  I'd
appreciate any help.

Thanks,

Roger



On Sun, Jun 21, 2015 at 1:02 PM, Roger Hoover <roger.hoover@gmail.com>
wrote:

> I think I see what's happening.
>
> When there are 8 tasks and I set yarn.container.count=8, then each
> container is responsible for a single task.  However, the
> systemStreamLagCounts map (
> https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L77)
> and laggingSystemStreamPartitions (
> https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L83)
> are configured to track all partitions for the bootstrap topic rather than
> just the one partition assigned to this task.
>
> Later in the log, we see that the task/container completed bootstrap for
> it's own partition.
>
> 2015-06-21 12:28:55 org.apache.samza.system.chooser.BootstrappingChooser
> [DEBUG] Bootstrap stream partition is fully caught up:
> SystemStreamPartition [kafka, deploy.svc.tlrnsZOYQA6wrwAA4FLqZA, 0]
>
> but the Bootstrapping Chooser still thinks that the remaining partitions
> (assigned to other tasks in other containers) need to be completed.  JMX at
> this point shows 7 lagging partitions of the 8 original partition count.
>
> I'm wondering why no one has run into this.  Doesn't LinkedIn use
> partitioned bootstrapped topics?
>
> Thanks,
>
> Roger
>
> On Sun, Jun 21, 2015 at 12:22 PM, Roger Hoover <roger.hoover@gmail.com>
> wrote:
>
>> Hi Yan,
>>
>> I've uploaded a file with TRACE level logging here:
>> http://filebin.ca/261yhsTZcZQZ/samza-container-0.log.gz
>>
>> I really appreciate your help as this is a critical issue for me.
>>
>> Thanks,
>>
>> Roger
>>
>> On Fri, Jun 19, 2015 at 12:05 PM, Yan Fang <yanfang724@gmail.com> wrote:
>>
>>> Hi Roger,
>>>
>>> " but it only spawns one container and still hangs after bootstrap"
>>>     -- this probably is due to your local machine does not have enough
>>> resource for the second container. Because I checked your log file, each
>>> container is about 4GB.
>>>
>>> "When I run it on our YARN cluster with a single container, it works
>>> correctly.  When I tried it with 5 containers, it gets hung after
>>> consuming
>>> the bootstrap topic."
>>>    -- Have you figure it out? I have a looked at your log and also the
>>> code. My suspect is that, there is a null enveloper somehow blocking the
>>> process. If you can paste the trace level log, it will be more helpful
>>> because many logs in chooser are trace level.
>>>
>>> Thanks,
>>>
>>> Fang, Yan
>>> yanfang724@gmail.com
>>>
>>> On Thu, Jun 18, 2015 at 5:20 PM, Roger Hoover <roger.hoover@gmail.com>
>>> wrote:
>>>
>>> > I need some help.  I have a job which bootstraps one stream and then is
>>> > supposed to read from two.  When I run it on our YARN cluster with a
>>> single
>>> > container, it works correctly.  When I tried it with 5 containers, it
>>> gets
>>> > hung after consuming the bootstrap topic.  I ran it with the grid
>>> script on
>>> > my laptop (Mac OS X) with yarn.container.count=2 but it only spawns one
>>> > container and still hangs after bootstrap.
>>> >
>>> > Debug logs are here: http://pastebin.com/af3KPvju
>>> >
>>> > I looked at JMX metrics and see:
>>> > - Task Metrics - no value for kafka offset of non-bootstrapped stream
>>> > -  SystemConsumerMetrics
>>> >     - choose null keeps incrementing
>>> >      - ssps-needed-by-chooser 1
>>> >       - unprocessed-messages 62k
>>> > - Bootstrapping Chooser
>>> >   - lagging partitions 4
>>> >   - laggin-batch-streams - 4
>>> >   - batch-resets - 0
>>> >
>>> > Has anyone seen this or can offer ideas of how to better debug it?
>>> >
>>> > I'm using Samza 0.9.0 and YARN 2.4.0.
>>> >
>>> > Thanks!
>>> >
>>> > Roger
>>> >
>>>
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message