samza-dev mailing list archives

From Roger Hoover <roger.hoo...@gmail.com>
Subject Re: Samza hung after bootstrapping
Date Mon, 22 Jun 2015 05:31:05 GMT
Thanks, Yan.  I'll give it a try.

Sent from my iPhone

> On Jun 21, 2015, at 10:02 PM, Yan Fang <yanfang724@gmail.com> wrote:
> 
> Hi Roger,
> 
> I will try to look at the issue tomorrow if my time allows.
> 
> First things first:
> 
> The build has some unexpected results. A quick fix:
> 
> 1. apply https://issues.apache.org/jira/browse/SAMZA-712
> 2. add
> 
> sourceSets.main.scala.srcDir "src/main/java"
> sourceSets.main.java.srcDirs = []
> 
> at line 126 of build.gradle.
> 
> Sorry for the inconvenience.
> 
> Thanks,
> 
> Fang, Yan
> yanfang724@gmail.com
> 
> On Sun, Jun 21, 2015 at 3:55 PM, Roger Hoover <roger.hoover@gmail.com>
> wrote:
> 
>> Was looking through the code a little and it looks like the
>> BootstrappingChooser could use the list of SSPs passed into its register()
>> method to figure out which partitions it needs to monitor.
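
A minimal sketch of the idea quoted above, in plain Java rather than Samza's own Scala code. It assumes lag tracking is seeded only from the partitions handed to register(). The names RegisteredLagTracker, Ssp, markCaughtUp and isBootstrapComplete are hypothetical and are not part of the Samza API; only register() is named in the thread.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical, simplified model; not the actual BootstrappingChooser. */
class RegisteredLagTracker {

    /** Stand-in for a SystemStreamPartition: a stream name plus a partition id. */
    static final class Ssp {
        final String stream;
        final int partition;

        Ssp(String stream, int partition) {
            this.stream = stream;
            this.partition = partition;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Ssp)) {
                return false;
            }
            Ssp other = (Ssp) o;
            return stream.equals(other.stream) && partition == other.partition;
        }

        @Override
        public int hashCode() {
            return Objects.hash(stream, partition);
        }
    }

    private final Set<String> bootstrapStreams;
    private final Set<Ssp> lagging = new HashSet<>();

    RegisteredLagTracker(Set<String> bootstrapStreams) {
        this.bootstrapStreams = bootstrapStreams;
    }

    /** Track lag only for bootstrap partitions that this container actually consumes. */
    void register(Ssp ssp) {
        if (bootstrapStreams.contains(ssp.stream)) {
            lagging.add(ssp);
        }
    }

    /** Record that a partition has been read up to its head. */
    void markCaughtUp(Ssp ssp) {
        lagging.remove(ssp);
    }

    boolean isBootstrapComplete() {
        return lagging.isEmpty();
    }

    int laggingCount() {
        return lagging.size();
    }

    public static void main(String[] args) {
        RegisteredLagTracker tracker =
                new RegisteredLagTracker(Collections.singleton("bootstrap-topic"));
        tracker.register(new Ssp("bootstrap-topic", 0)); // the one partition assigned here
        tracker.register(new Ssp("input-topic", 0));     // not a bootstrap stream, so not tracked
        tracker.markCaughtUp(new Ssp("bootstrap-topic", 0));
        System.out.println(tracker.isBootstrapComplete()); // prints true
    }
}
```

With tracking seeded this way, partitions owned by other containers never enter the lagging set, so one container's bootstrap can finish on its own.
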
>> 
>> I wanted to try to build Samza to play around with it but I'm getting errors
>> trying to build off of both the 0.9.0 and 0.9.1 branches.
>> 
>> thedude:samza (0.9.1) $ ./gradlew clean build
>> To honour the JVM settings for this build a new JVM will be forked. Please
>> consider using the daemon:
>> http://gradle.org/docs/2.0/userguide/gradle_daemon.html.
>> :clean
>> :samza-api:clean
>> :samza-core_2.10:clean
>> :samza-kafka_2.10:clean UP-TO-DATE
>> :samza-kv-inmemory_2.10:clean UP-TO-DATE
>> :samza-kv-rocksdb_2.10:clean UP-TO-DATE
>> :samza-kv_2.10:clean UP-TO-DATE
>> :samza-log4j:clean UP-TO-DATE
>> :samza-shell:clean UP-TO-DATE
>> :samza-test_2.10:clean UP-TO-DATE
>> :samza-yarn_2.10:clean UP-TO-DATE
>> :assemble UP-TO-DATE
>> :rat
>> Rat report: build/rat/rat-report.html
>> :check
>> :build
>> :samza-api:compileJava
>> :samza-api:processResources UP-TO-DATE
>> :samza-api:classes
>> :samza-api:jar
>> :samza-api:javadoc
>> /Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49:
>> warning: no @param for ssp
>>  void setStartingOffset(SystemStreamPartition ssp, String offset);
>>       ^
>> /Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49:
>> warning: no @param for offset
>>  void setStartingOffset(SystemStreamPartition ssp, String offset);
>>       ^
>> 2 warnings
>> :samza-api:javadocJar
>> :samza-api:sourcesJar
>> :samza-api:signArchives SKIPPED
>> :samza-api:assemble
>> :samza-api:compileTestJava
>> :samza-api:processTestResources UP-TO-DATE
>> :samza-api:testClasses
>> :samza-api:test
>> :samza-api:check
>> :samza-api:build
>> :samza-core_2.10:compileJava
>> :samza-core_2.10:compileScala
>> [ant:scalac]
>> /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:43:
>> error: object SamzaObjectMapper is not a member of package
>> org.apache.samza.serializers.model
>> [ant:scalac] import org.apache.samza.serializers.model.SamzaObjectMapper
>> [ant:scalac]        ^
>> [ant:scalac]
>> /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:40:
>> error: object TaskModel is not a member of package
>> org.apache.samza.job.model
>> [ant:scalac] import org.apache.samza.job.model.TaskModel
>> [ant:scalac]        ^
>> ...
>> 
>> 
>> I've got JDK 8 installed.  I'm wondering whether that makes a difference.
>> I'd appreciate any help.
>> 
>> Thanks,
>> 
>> Roger
>> 
>> 
>> 
>> On Sun, Jun 21, 2015 at 1:02 PM, Roger Hoover <roger.hoover@gmail.com>
>> wrote:
>> 
>>> I think I see what's happening.
>>> 
>>> When there are 8 tasks and I set yarn.container.count=8, then each
>>> container is responsible for a single task.  However, the
>>> systemStreamLagCounts map
>>> (https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L77)
>>> and laggingSystemStreamPartitions
>>> (https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L83)
>>> are configured to track all partitions for the bootstrap topic rather than
>>> just the one partition assigned to this task.
>>> 
>>> Later in the log, we see that the task/container completed bootstrap for
>>> its own partition.
>>> 
>>> 2015-06-21 12:28:55 org.apache.samza.system.chooser.BootstrappingChooser
>>> [DEBUG] Bootstrap stream partition is fully caught up:
>>> SystemStreamPartition [kafka, deploy.svc.tlrnsZOYQA6wrwAA4FLqZA, 0]
>>> 
>>> but the BootstrappingChooser still thinks that the remaining partitions
>>> (assigned to other tasks in other containers) need to be completed.  JMX at
>>> this point shows 7 lagging partitions out of the original 8.
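
The arithmetic in the quoted paragraph above can be reproduced with a small stand-alone model. This is a sketch under stated assumptions, not Samza code: LagTrackingModel and stillLagging are hypothetical names, and the model simply assumes that every partition of the bootstrap topic starts out as lagging while only locally assigned partitions can ever be marked caught up.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the behaviour described in the thread; not Samza code. */
class LagTrackingModel {

    /**
     * Returns the partitions still counted as lagging after this container has
     * caught up on every partition it actually consumes.
     */
    static Set<Integer> stillLagging(int partitionCount, Set<Integer> assigned) {
        Set<Integer> lagging = new HashSet<>();
        for (int p = 0; p < partitionCount; p++) {
            lagging.add(p); // every partition of the topic is tracked
        }
        lagging.removeAll(assigned); // only local partitions can be caught up here
        return lagging;
    }

    public static void main(String[] args) {
        Set<Integer> assigned = new HashSet<>();
        assigned.add(0); // one task per container, so one partition
        System.out.println(stillLagging(8, assigned).size()); // prints 7
    }
}
```

Under this model the lagging set can never become empty in a container that owns only part of the topic, which matches the hang reported here.
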
>>> 
>>> I'm wondering why no one has run into this.  Doesn't LinkedIn use
>>> partitioned bootstrapped topics?
>>> 
>>> Thanks,
>>> 
>>> Roger
>>> 
>>> On Sun, Jun 21, 2015 at 12:22 PM, Roger Hoover <roger.hoover@gmail.com>
>>> wrote:
>>> 
>>>> Hi Yan,
>>>> 
>>>> I've uploaded a file with TRACE level logging here:
>>>> http://filebin.ca/261yhsTZcZQZ/samza-container-0.log.gz
>>>> 
>>>> I really appreciate your help as this is a critical issue for me.
>>>> 
>>>> Thanks,
>>>> 
>>>> Roger
>>>> 
>>>> On Fri, Jun 19, 2015 at 12:05 PM, Yan Fang <yanfang724@gmail.com>
>>>> wrote:
>>>> 
>>>>> Hi Roger,
>>>>> 
>>>>> "but it only spawns one container and still hangs after bootstrap"
>>>>>    -- this is probably because your local machine does not have enough
>>>>> resources for the second container. I checked your log file, and each
>>>>> container is about 4GB.
>>>>> 
>>>>> "When I run it on our YARN cluster with a single container, it works
>>>>> correctly.  When I tried it with 5 containers, it gets hung after
>>>>> consuming the bootstrap topic."
>>>>>   -- Have you figured it out? I have looked at your log and also the
>>>>> code. My suspicion is that a null envelope is somehow blocking the
>>>>> process. If you can paste the trace-level log, it will be more helpful
>>>>> because many of the logs in the chooser are at trace level.
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Fang, Yan
>>>>> yanfang724@gmail.com
>>>>> 
>>>>> On Thu, Jun 18, 2015 at 5:20 PM, Roger Hoover <roger.hoover@gmail.com>
>>>>> wrote:
>>>>> 
>>>>>> I need some help.  I have a job which bootstraps one stream and then is
>>>>>> supposed to read from two.  When I run it on our YARN cluster with a
>>>>>> single container, it works correctly.  When I tried it with 5 containers,
>>>>>> it gets hung after consuming the bootstrap topic.  I ran it with the grid
>>>>>> script on my laptop (Mac OS X) with yarn.container.count=2 but it only
>>>>>> spawns one container and still hangs after bootstrap.
>>>>>> 
>>>>>> Debug logs are here: http://pastebin.com/af3KPvju
>>>>>> 
>>>>>> I looked at JMX metrics and see:
>>>>>> - Task Metrics - no value for kafka offset of non-bootstrapped stream
>>>>>> - SystemConsumerMetrics
>>>>>>   - choose null keeps incrementing
>>>>>>   - ssps-needed-by-chooser 1
>>>>>>   - unprocessed-messages 62k
>>>>>> - Bootstrapping Chooser
>>>>>>   - lagging partitions 4
>>>>>>   - lagging-batch-streams 4
>>>>>>   - batch-resets 0
>>>>>> 
>>>>>> Has anyone seen this or can offer ideas of how to better debug it?
>>>>>> 
>>>>>> I'm using Samza 0.9.0 and YARN 2.4.0.
>>>>>> 
>>>>>> Thanks!
>>>>>> 
>>>>>> Roger
>> 
