kylin-user mailing list archives

From Alberto Ramón <a.ramonporto...@gmail.com>
Subject Re: kylin job stop accidentally and can resume success!
Date Mon, 13 Feb 2017 16:34:54 GMT
Check this
<https://www.mapr.com/blog/best-practices-yarn-resource-management>:
"Basically, it means RM can only allocate memory to containers in
increments of .  . . "

TIP: is your RM on a worker node? If so, that can be the problem.
(It's a good idea to put the YARN master, the RM, on a dedicated node.)
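To see what heap the RM is actually running with, one approach (a sketch; paths and setting names vary by distribution, and on CDH the heap is set through the Cloudera Manager UI rather than yarn-env.sh):

```shell
# Run on the ResourceManager host; jps and jmap ship with the JDK.
jps | grep ResourceManager        # note the RM's pid
jmap -heap <rm_pid>               # prints the MaxHeapSize currently in use

# To raise it on a plain Apache install, export the size (in MB)
# in yarn-env.sh before restarting the RM:
export YARN_RESOURCEMANAGER_HEAPSIZE=4096   # example value, tune to your cluster
```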


2017-02-13 17:19 GMT+01:00 不清 <452652018@qq.com>:

> How can I get this heap size?
>
>
> ------------------ Original Message ------------------
> *From:* "Alberto Ramón";<a.ramonportoles@gmail.com>;
> *Sent:* Tuesday, February 14, 2017, 12:17 AM
> *To:* "user"<user@kylin.apache.org>;
> *Subject:* Re: kylin job stop accidentally and can resume success!
>
> Sounds like a problem with the Resource Manager (RM) of YARN; check the
> heap size of the RM.
> Kylin is losing connectivity with the RM.
>
> 2017-02-13 17:00 GMT+01:00 不清 <452652018@qq.com>:
>
>> Hello, kylin community!
>>
>> Sometimes my jobs stop accidentally. They can stop at any step.
>>
>> kylin log is like :
>> 2017-02-13 23:27:01,549 DEBUG [pool-8-thread-8]
>> hbase.HBaseResourceStore:262 : Update row /execute_output/48dee96e-10fd-472b-b466-39505b6e57c0-02
>> from oldTs: 1486999611524, to newTs: 1486999621545, operation result: true
>> 2017-02-13 23:27:13,384 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
>> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 0
>> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
>> sleepTime=1000 MILLISECONDS)
>> 2017-02-13 23:27:14,387 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
>> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 1
>> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
>> sleepTime=1000 MILLISECONDS)
>> 2017-02-13 23:27:15,388 INFO  [pool-8-thread-8] ipc.Client:842 : Retrying
>> connect to server: jxhdp1datanode29/10.180.212.61:50504. Already tried 2
>> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3,
>> sleepTime=1000 MILLISECONDS)
>> 2017-02-13 23:27:15,495 INFO  [pool-8-thread-8]
>> mapred.ClientServiceDelegate:273 : Application state is completed.
>> FinalApplicationStatus=KILLED. Redirecting to job history server
>> 2017-02-13 23:27:15,539 DEBUG [pool-8-thread-8] dao.ExecutableDao:210 :
>> updating job output, id: 48dee96e-10fd-472b-b466-39505b6e57c0-02
>>
>> CM log is like:
>> Job Name: Kylin_Cube_Builder_user_all_cube_2_only_msisdn
>> User Name: tmn
>> Queue: root.tmn
>> State: KILLED
>> Uberized: false
>> Submitted: Sun Feb 12 19:19:24 CST 2017
>> Started: Sun Feb 12 19:19:38 CST 2017
>> Finished: Sun Feb 12 20:30:13 CST 2017
>> Elapsed: 1hrs, 10mins, 35sec
>> Diagnostics:
>> Kill job job_1486825738076_4205 received from tmn (auth:SIMPLE) at
>> 10.180.212.38
>> Job received Kill while in RUNNING state.
>> Average Map Time 24mins, 48sec
>>
>> mapreduce job log
>> Task KILL is received. Killing attempt!
>>
>> And when this happens, resuming the job succeeds! I mean,
>> it did not stop because of an error!
>>
>> What's the problem?
>>
>> My Hadoop cluster is very busy, so this situation happens very often.
>>
>> Can I set the retry count and the retry interval?
>>
>
>

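On the retry question quoted above: the RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS) in the log looks like the MR client retrying its connection to the (already killed) application master. Assuming stock Hadoop 2.x property names, and assuming this is the knob behind that log line, the retry count can be raised in mapred-site.xml on the client side (i.e. where Kylin runs):

```xml
<!-- mapred-site.xml: retries from the MR client to the AM.
     Default is 3; property name taken from Hadoop 2.x mapred-default.xml. -->
<property>
  <name>yarn.app.mapreduce.client-am.ipc.max-retries</name>
  <value>10</value>
</property>
```

Note this only makes the client more patient; it does not explain why the job was killed in the first place.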