griffin-dev mailing list archives

From "Nevena Veljkovic (Jira)" <j...@apache.org>
Subject [jira] [Created] (GRIFFIN-293) [Service] livy.need.queue=true
Date Mon, 30 Sep 2019 11:18:00 GMT
Nevena Veljkovic created GRIFFIN-293:
----------------------------------------

             Summary: [Service] livy.need.queue=true
                 Key: GRIFFIN-293
                 URL: https://issues.apache.org/jira/browse/GRIFFIN-293
             Project: Griffin
          Issue Type: Bug
    Affects Versions: 0.6.0
            Reporter: Nevena Veljkovic
             Fix For: 0.6.0


While running Griffin in several production environments, with roughly 10 jobs scheduled to
start at the same hour, minute, and second, we found that when 2 (or more) Griffin jobs start
concurrently, they are not all submitted and executed to completion: the last job is submitted
multiple times and the others are never submitted.

Example:
 2 jobs, "beta_node_metrics_fact" and "beta_node_master_dimension_device", were started
1 millisecond apart:
{code:java}
2019-09-28 14:00:37.090 INFO 2732 --- [ryBean_Worker-4] o.a.g.c.j.SparkSubmitJob [203] : {
 "measure.type" : "griffin",
 "id" : 60560,
 "name" : "beta_node_metrics_fact",

2019-09-28 14:00:37.091 INFO 2732 --- [ryBean_Worker-5] o.a.g.c.j.SparkSubmitJob [203] : {
 "measure.type" : "griffin",
 "id" : 63751,
 "name" : "beta_node_master_dimension_device",
{code}
Livy submitted 2 jobs/tasks, and both contained "beta_node_master_dimension_device".
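
The report does not state the root cause of the duplicated submission. One mechanism that would produce exactly this symptom is a shared object that keeps the "current job" in a mutable field while two workers use it at nearly the same time. The sketch below is only an illustration of that assumption; the class `SharedStateSubmitter` and its methods are hypothetical and are not taken from the Griffin code base.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration (not Griffin code): a single shared object holds
 * the job being prepared in a mutable field, so a second worker can overwrite
 * it before the first worker has submitted.
 */
final class SharedStateSubmitter {
    // Shared mutable state: this is the hazard being illustrated.
    private String currentJobName;

    // Records what was actually handed to the (imaginary) remote service.
    final List<String> submitted = new ArrayList<>();

    /** Step 1 of a worker: remember which job is about to be submitted. */
    void prepare(String jobName) {
        currentJobName = jobName;
    }

    /** Step 2 of a worker: submit whatever the shared field holds right now. */
    void submit() {
        submitted.add(currentJobName);
    }
}
```

If one worker calls `prepare("beta_node_metrics_fact")`, a second worker calls `prepare("beta_node_master_dimension_device")` a millisecond later, and both then call `submit()`, the `submitted` list holds "beta_node_master_dimension_device" twice and the first job never appears. Whether this is what happens inside Griffin is an assumption that would need to be checked against the actual code.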

That is why we decided to use the setting "livy.need.queue=true".
 During testing we found that queueing does not work at all, because LivyTaskSubmitHelper's
member sparkSubmitJob was never instantiated:
 [https://github.com/apache/griffin/blob/master/service/src/main/java/org/apache/griffin/core/job/LivyTaskSubmitHelper.java#L64]
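
As a rough sketch of the idea behind queued submission, and of why an uninstantiated collaborator breaks it, the following stand-alone class queues job payloads and hands them to a submitter one at a time. Everything here is hypothetical: `QueuedSubmitHelper`, `enqueue`, `drain`, and the `Consumer<String>` submitter are stand-ins chosen for illustration and do not reflect the real LivyTaskSubmitHelper API or the actual fix.

```java
import java.util.Objects;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not Griffin code): job payloads are queued and then
 * submitted one at a time in arrival order.
 */
final class QueuedSubmitHelper {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // The collaborator that performs the real submission. Requiring it in the
    // constructor means a missing instance fails at construction time instead
    // of surfacing later as a NullPointerException when the queue is drained.
    private final Consumer<String> submitter;

    QueuedSubmitHelper(Consumer<String> submitter) {
        this.submitter = Objects.requireNonNull(submitter, "submitter must be instantiated");
    }

    /** Adds a payload to the tail of the queue. */
    void enqueue(String jobPayload) {
        queue.add(jobPayload);
    }

    /** Submits every queued payload in FIFO order and returns how many were submitted. */
    int drain() {
        int count = 0;
        String payload;
        while ((payload = queue.poll()) != null) {
            submitter.accept(payload);
            count++;
        }
        return count;
    }
}
```

Because each payload travels through the queue as its own element, two jobs enqueued a millisecond apart are still submitted as two distinct payloads, provided that `drain` is run by a single consumer.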

We fixed this and continued testing.

We then found that curConcurrentTaskNum is not decremented when tasks finish (state
SUCCESS or DEAD):
 [https://github.com/apache/griffin/blob/master/service/src/main/java/org/apache/griffin/core/job/JobServiceImpl.java#L632-L633]
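
A minimal sketch of the intended bookkeeping, assuming a simple in-memory counter: a slot is taken when a task is submitted and released when the task reaches a terminal state. Only the name curConcurrentTaskNum and the states SUCCESS and DEAD come from the report; the class `ConcurrentTaskCounter`, its methods, the `maxConcurrent` limit, and the non-terminal states STARTING and RUNNING are hypothetical and do not describe the actual fix.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch (not Griffin code): counts in-flight tasks and frees a
 * slot once a task reaches a terminal state.
 */
final class ConcurrentTaskCounter {
    // SUCCESS and DEAD are the terminal states named in the report;
    // STARTING and RUNNING are assumed non-terminal states for illustration.
    enum State { STARTING, RUNNING, SUCCESS, DEAD }

    private final int maxConcurrent;
    private final AtomicInteger curConcurrentTaskNum = new AtomicInteger();

    ConcurrentTaskCounter(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
    }

    /** Takes a slot if one is free; returns false when the limit is reached. */
    boolean tryAcquire() {
        while (true) {
            int cur = curConcurrentTaskNum.get();
            if (cur >= maxConcurrent) {
                return false;
            }
            if (curConcurrentTaskNum.compareAndSet(cur, cur + 1)) {
                return true;
            }
        }
    }

    /** Releases a slot when a task finishes; non-terminal states leave the counter alone. */
    void onStateChange(State state) {
        if (state == State.SUCCESS || state == State.DEAD) {
            // Never drop below zero, even if a release arrives without a matching acquire.
            curConcurrentTaskNum.updateAndGet(n -> Math.max(0, n - 1));
        }
    }

    int current() {
        return curConcurrentTaskNum.get();
    }
}
```

Without the decrement in `onStateChange`, the counter only ever grows, so once `maxConcurrent` tasks have been submitted no further task can acquire a slot. The sketch assumes each task reports its terminal state exactly once; if the same finished task could be observed twice, the caller would need to track which tasks have already been released.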

We fixed this as well.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
