hadoop-hdfs-user mailing list archives

From Naganarasimha Garla <naganarasimha...@gmail.com>
Subject Re: YARN timelineserver process taking 600% CPU
Date Mon, 05 Oct 2015 18:45:52 GMT
Hi Krzysiek,
Oops, my mistake: 3 GB does seem to be a little on the high side.
From the jstack output there was no major activity other than puts. Around
16 concurrent puts were in progress, each trying to get the timeline entity
and hence hitting the native call.
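
As a quick sanity check on the size, the du figure quoted further down for
leveldb-timeline-store.ldb (3307772) can be converted as below. This is only
a sketch, and it assumes du was reporting 1 KiB blocks, which is the GNU du
default; the variable names are illustrative.

```python
# Size reported by `du` for leveldb-timeline-store.ldb, as quoted in this thread.
du_blocks = 3307772
BLOCK_BYTES = 1024  # assumption: GNU du default block size (1 KiB)

size_bytes = du_blocks * BLOCK_BYTES
size_mib = size_bytes / 1024**2
size_gib = size_bytes / 1024**3

print(f"{size_mib:.0f} MiB = {size_gib:.2f} GiB")
# prints "3230 MiB = 3.15 GiB"
```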

From the logs it seems that a lot of ACL validations are happening, and from
the URL it seems they are for putting entities.
Between 09:30:16 and 09:44:26 about 9213 checks happened, and if all of
these are for puts, then roughly 10 put calls/s are coming from the *spark*
side. This, I feel, is not the right usage of ATS. Can you check what is
being published from Spark to ATS at this high rate?
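
The rate estimate above can be reproduced with a few lines of arithmetic.
This is a minimal sketch using only the timestamps and the check count
quoted from the logs; it assumes, as stated above, that every ACL check
corresponds to one put call.

```python
from datetime import datetime

# Figures quoted from the log analysis above.
start = datetime.strptime("09:30:16", "%H:%M:%S")
end = datetime.strptime("09:44:26", "%H:%M:%S")
acl_checks = 9213

elapsed_s = (end - start).total_seconds()
rate = acl_checks / elapsed_s  # assumption: one ACL check per put call

print(f"{elapsed_s:.0f} s elapsed, {rate:.1f} checks/s")
# prints "850 s elapsed, 10.8 checks/s"
```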

Besides, some improvements to the timeline metrics are available in trunk
as part of YARN-3360, which could have been useful in analyzing your
issue.

+ Naga


On Mon, Oct 5, 2015 at 1:19 PM, Krzysztof Zarzycki <k.zarzycki@gmail.com>
wrote:

> Hi Naga,
> Sorry, it's not 3 MB but 3 GB in leveldb-timeline-store (du shows
> numbers in kB). Does that seem reasonable as well?
> There are now 26850 files in the leveldb-timeline-store directory. New
> .sst files are generated each minute, and some are also being deleted.
>
> I started the timeline server today to gather logs and jstack output; it
> ran for ~20 minutes. I attach a tar.bz2 archive with those logs.
>
> Thank you for helping me debug this.
> Krzysiek
>
>
>
>
>
> 2015-09-30 21:00 GMT+02:00 Naganarasimha Garla <naganarasimha.gr@gmail.com>:
>
>> Hi Krzysiek,
>> It seems the size is around 3 MB, which should be fine.
>> Could you try enabling debug logging and share the logs of ATS/AHS and
>> also, if possible, the jstack output for the AHS process?
>>
>> + Naga
>>
>> On Wed, Sep 30, 2015 at 10:27 PM, Krzysztof Zarzycki <
>> k.zarzycki@gmail.com> wrote:
>>
>>> Hi Naga,
>>> I see the following size:
>>> $ sudo du --max=1 /var/lib/hadoop/yarn/timeline
>>> 36      /var/lib/hadoop/yarn/timeline/timeline-state-store.ldb
>>> 3307772 /var/lib/hadoop/yarn/timeline/leveldb-timeline-store.ldb
>>> 3307812 /var/lib/hadoop/yarn/timeline
>>>
>>> The timeline service has been restarted multiple times while I was
>>> looking for the issue with it, but it was installed about 2 months ago.
>>> Just a few applications (1? 2?) have been started since its last start.
>>> The ResourceManager interface has 261 entries.
>>>
>>> As in yarn-site.xml that I attached, the variable you're asking for has
>>> the following value:
>>>
>>> <property>
>>>   <name>yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms</name>
>>>   <value>300000</value>
>>> </property>
>>>
>>>
>>> Ah, one more thing: when I looked with jstack to see what the process is
>>> doing, I saw threads spending time in NATIVE in the leveldbjni library.
>>> So I *think* it is related to the leveldb store.
>>>
>>> Please ask if any more information is needed.
>>> Any help is appreciated! Thanks
>>> Krzysiek
>>>
>>> 2015-09-30 16:23 GMT+02:00 Naganarasimha G R (Naga) <
>>> garlanaganarasimha@huawei.com>:
>>>
>>>> Hi ,
>>>>
>>>> What's the size of the store files?
>>>> Since when has it been running? How many applications have been run
>>>> since it was started?
>>>> What's the value of
>>>> "yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms"?
>>>>
>>>> + Naga
>>>> ------------------------------
>>>> *From:* Krzysztof Zarzycki [k.zarzycki@gmail.com]
>>>> *Sent:* Wednesday, September 30, 2015 19:20
>>>> *To:* user@hadoop.apache.org
>>>> *Subject:* YARN timelineserver process taking 600% CPU
>>>>
>>>> Hi there Hadoopers,
>>>> I have a serious issue with my installation of Hadoop & YARN, version
>>>> 2.7.1 (HDP 2.3).
>>>> The timelineserver process (more precisely, the
>>>> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
>>>> class) takes over 600% of CPU, generating enormous load on my master node.
>>>> I can't work out why this happens.
>>>>
>>>> At first I ran the timelineserver on Java 8 and thought that was the
>>>> issue. But no: I have now started the timelineserver on Java 7 and the
>>>> problem is still the same.
>>>>
>>>> My cluster is tiny- it consists of:
>>>> - 2 HDFS nodes
>>>> - 2 HBase RegionServers
>>>> - 2 Kafkas
>>>> - 2 Spark nodes
>>>> - 8 Spark Streaming jobs, processing around 100 messages/second TOTAL.
>>>>
>>>> I'll be very grateful for your help here. If you need any more info,
>>>> please write.
>>>> I also attach yarn-site.xml grepped to options related to timeline
>>>> server.
>>>>
>>>> And here is the timelineserver command line as I see it in ps:
>>>> /usr/java/jdk1.7.0_79/bin/java -Dproc_timelineserver -Xmx1024m
>>>> -Dhdp.version=2.3.0.0-2557 -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn
>>>> -Dyarn.log.dir=/var/log/hadoop-yarn/yarn
>>>> -Dhadoop.log.file=yarn-yarn-timelineserver-hd-master-a01.log
>>>> -Dyarn.log.file=yarn-yarn-timelineserver-hd-master-a01.log -Dyarn.home.dir=
>>>> -Dyarn.id.str=yarn -Dhadoop.root.logger=INFO,EWMA,RFA
>>>> -Dyarn.root.logger=INFO,EWMA,RFA
>>>> -Djava.library.path=:/usr/hdp/2.3.0.0-2557/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.0.0-2557/hadoop/lib/native:/usr/hdp/2.3.0.0-2557/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.0.0-2557/hadoop/lib/native
>>>> -Dyarn.policy.file=hadoop-policy.xml
>>>> -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn
>>>> -Dyarn.log.dir=/var/log/hadoop-yarn/yarn
>>>> -Dhadoop.log.file=yarn-yarn-timelineserver-hd-master-a01.log
>>>> -Dyarn.log.file=yarn-yarn-timelineserver-hd-master-a01.log
>>>> -Dyarn.home.dir=/usr/hdp/current/hadoop-yarn-timelineserver
>>>> -Dhadoop.home.dir=/usr/hdp/2.3.0.0-2557/hadoop
>>>> -Dhadoop.root.logger=INFO,EWMA,RFA -Dyarn.root.logger=INFO,EWMA,RFA
>>>> -Djava.library.path=:/usr/hdp/2.3.0.0-2557/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.0.0-2557/hadoop/lib/native:/usr/hdp/2.3.0.0-2557/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.0.0-2557/hadoop/lib/native
>>>> -classpath
>>>> /usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/conf:/usr/hdp/2.3.0.0-2557/hadoop/lib/*:/usr/hdp/2.3.0.0-2557/hadoop/.//*:/usr/hdp/2.3.0.0-2557/hadoop-hdfs/./:/usr/hdp/2.3.0.0-2557/hadoop-hdfs/lib/*:/usr/hdp/2.3.0.0-2557/hadoop-hdfs/.//*:/usr/hdp/2.3.0.0-2557/hadoop-yarn/lib/*:/usr/hdp/2.3.0.0-2557/hadoop-yarn/.//*:/usr/hdp/2.3.0.0-2557/hadoop-mapreduce/lib/*:/usr/hdp/2.3.0.0-2557/hadoop-mapreduce/.//*:::/usr/share/java/mysql-connector-java.jar::/usr/share/java/mysql-connector-java.jar:/usr/hdp/current/hadoop-yarn-timelineserver/.//*:/usr/hdp/current/hadoop-yarn-timelineserver/lib/*:/usr/hdp/current/hadoop-client/conf/timelineserver-config/log4j.properties
>>>> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
>>>>
>>>>
>>>> Thanks!
>>>> Krzysztof
>>>>
>>>>
>>>
>>
>
