hadoop-mapreduce-user mailing list archives

From Krzysztof Zarzycki <k.zarzy...@gmail.com>
Subject Re: YARN timelineserver process taking 600% CPU
Date Thu, 05 Nov 2015 14:21:01 GMT
Thanks for your input, Naga (sorry for the late response, I was away for
some time).

So you believe that Spark is actually doing the PUTs? There are currently 8
Spark Streaming jobs running constantly: 3 with a 1-second batch interval
and 5 with a 10-second one. I believe these are the jobs that publish to
ATS. I don't know how I could check precisely which component is doing
what, or how to get logs about it...
I thought the Spark History Server might be doing the PUTs, but it seems it
is not: I disabled it and the load hasn't gone down. So it does seem to be
the jobs themselves.
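For reference, Naga's "roughly 10 put calls/s" estimate (quoted below) can be re-derived from the figures in his reply: 9213 ACL checks logged between 09:30:16 and 09:44:26. This is a minimal arithmetic sketch, assuming, as Naga does, that every ACL check corresponds to one PUT:

```python
from datetime import datetime

# Window and count taken from Naga's reading of the timeline server log.
# Assumption: each ACL check corresponds to exactly one PUT call.
start = datetime.strptime("09:30:16", "%H:%M:%S")
end = datetime.strptime("09:44:26", "%H:%M:%S")
acl_checks = 9213

window_s = (end - start).total_seconds()  # 14 min 10 s = 850 s
rate = acl_checks / window_s              # checks (PUTs) per second

print(f"window: {window_s:.0f} s, rate: {rate:.1f} checks/s")
```

This prints `window: 850 s, rate: 10.8 checks/s`, consistent with the "roughly 10/s" figure.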

Now I have the following problems:
1. The most important: how can I at least *work around* this issue? Could I
somehow disable Spark's use of the YARN timeline server? What would the
consequences be? Is it only that the history of finished Spark jobs would
not be saved? If so, that doesn't hurt that much. This is probably a
question for the Spark group...
2. Are 8 concurrent Spark Streaming jobs really that high a load for the
timeline server? I have just a small cluster; how do larger companies
handle a much larger load?
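To put question 2 in perspective, the aggregate batch rate of the 8 streaming jobs (3 with a 1 s batch interval, 5 with 10 s) can be compared with the roughly 10 PUTs/s estimated from the timeline server log. This is a back-of-the-envelope sketch only; it assumes, without verification, that the PUTs are driven by batch completions and that each job finishes one batch per interval:

```python
from fractions import Fraction

# Batch intervals in seconds: 3 jobs at 1 s, 5 jobs at 10 s.
batch_intervals_s = [1] * 3 + [10] * 5

# Each job completes one batch per interval, so its batch rate is 1/interval.
batches_per_s = sum(Fraction(1, t) for t in batch_intervals_s)

# Assumption: Naga's rough estimate of PUT calls per second.
observed_puts_per_s = 10
puts_per_batch = observed_puts_per_s / batches_per_s

print(f"{float(batches_per_s)} batches/s, ~{float(puts_per_batch):.1f} PUTs per batch")
```

This prints `3.5 batches/s, ~2.9 PUTs per batch`: if the assumptions hold, a handful of PUTs per streaming batch is enough to produce the observed rate, even on a small cluster.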

Thanks for helping me with this!
Krzysiek

2015-10-05 20:45 GMT+02:00 Naganarasimha Garla <naganarasimha.gr@gmail.com>:

> Hi Krzysiek,
> Oops, my mistake: 3 GB does seem to be on the high side.
> From the jstack, it seems there was no major activity other than
> PUTs: around 16 concurrent PUTs were happening, each trying to get
> the timeline entity and hence hitting the native call.
>
> From the logs, it seems a lot of ACL validations are happening, and from
> the URL they appear to be for PUT entities.
> From approximately 09:30:16 to 09:44:26, about 9213 checks happened,
> and if all of these are for PUTs, then roughly 10 PUT calls/s are
> coming from the *Spark* side. I feel this is not the right usage of ATS. Can
> you check what is being published from Spark to ATS at this high rate?
>
> Besides, some improvements to the timeline metrics are available in
> trunk as part of YARN-3360, which would have been useful in analyzing your
> issue.
>
> + Naga
>
>
> On Mon, Oct 5, 2015 at 1:19 PM, Krzysztof Zarzycki <k.zarzycki@gmail.com>
> wrote:
>
>> Hi Naga,
>> Sorry, but it's not 3 MB but 3 GB in leveldb-timeline-store (du shows
>> numbers in kB). Does that seem reasonable as well?
>> There are now 26850 files in the leveldb-timeline-store directory. New
>> .sst files are generated each minute, and some are also being deleted.
>>
>> I started the timeline server today to gather logs and jstack output; it
>> was running for ~20 minutes. I attach a tar.bz2 archive with those logs.
>>
>> Thank you for helping me debug this.
>> Krzysiek
>>
>>
>>
>>
>>
>> 2015-09-30 21:00 GMT+02:00 Naganarasimha Garla <
>> naganarasimha.gr@gmail.com>:
>>
>>> Hi Krzysiek,
>>> it seems the size is around 3 MB, which seems to be fine.
>>> Could you try enabling debug logging and share the logs of ATS/AHS, and
>>> also, if possible, the jstack output for the AHS process?
>>>
>>> + Naga
>>>
>>> On Wed, Sep 30, 2015 at 10:27 PM, Krzysztof Zarzycki <
>>> k.zarzycki@gmail.com> wrote:
>>>
>>>> Hi Naga,
>>>> I see the following size:
>>>> $ sudo du --max=1 /var/lib/hadoop/yarn/timeline
>>>> 36      /var/lib/hadoop/yarn/timeline/timeline-state-store.ldb
>>>> 3307772 /var/lib/hadoop/yarn/timeline/leveldb-timeline-store.ldb
>>>> 3307812 /var/lib/hadoop/yarn/timeline
>>>>
>>>> The timeline service has been restarted multiple times while I was
>>>> looking into the issue, but it was installed about 2 months ago. Just a
>>>> few applications (1? 2?) have been started since its last start. The
>>>> ResourceManager interface has 261 entries.
>>>>
>>>> As in the yarn-site.xml I attached, the property you're asking about
>>>> has the following value:
>>>>
>>>> <property>
>>>>   <name>yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms</name>
>>>>   <value>300000</value>
>>>> </property>
>>>>
>>>>
>>>> Ah, one more thing: when I looked with jstack to see what the process
>>>> is doing, I saw threads spending time in native code in the leveldbjni
>>>> library. So I *think* it is related to the leveldb store.
>>>>
>>>> Please ask if any more information is needed.
>>>> Any help is appreciated! Thanks
>>>> Krzysiek
>>>>
>>>> 2015-09-30 16:23 GMT+02:00 Naganarasimha G R (Naga) <
>>>> garlanaganarasimha@huawei.com>:
>>>>
>>>>> Hi ,
>>>>>
>>>>> What's the size of the store files?
>>>>> Since when has it been running? How many applications have been run
>>>>> since it was started?
>>>>> What's the value of
>>>>> "yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms"?
>>>>>
>>>>> + Naga
>>>>> ------------------------------
>>>>> *From:* Krzysztof Zarzycki [k.zarzycki@gmail.com]
>>>>> *Sent:* Wednesday, September 30, 2015 19:20
>>>>> *To:* user@hadoop.apache.org
>>>>> *Subject:* YARN timelineserver process taking 600% CPU
>>>>>
>>>>> Hi there Hadoopers,
>>>>> I have a serious issue with my installation of Hadoop & YARN,
>>>>> version 2.7.1 (HDP 2.3).
>>>>> The timelineserver process (more
>>>>> precisely, the org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
>>>>> class) takes over 600% CPU, generating an enormous load on my master
>>>>> node.
>>>>> I can't figure out why it happens.
>>>>>
>>>>> At first I ran the timelineserver with Java 8 and thought that was the
>>>>> issue. But no: I have now started the timelineserver with Java 7 and
>>>>> the problem is still the same.
>>>>>
>>>>> My cluster is tiny; it consists of:
>>>>> - 2 HDFS nodes
>>>>> - 2 HBase RegionServers
>>>>> - 2 Kafkas
>>>>> - 2 Spark nodes
>>>>> - 8 Spark Streaming jobs, processing around 100 messages/second TOTAL.
>>>>>
>>>>> I'll be very grateful for your help here. If you need any more info,
>>>>> please write.
>>>>> I also attach yarn-site.xml, grepped for the options related to the
>>>>> timeline server.
>>>>>
>>>>> And here is the timeline server command line as shown by ps:
>>>>> /usr/java/jdk1.7.0_79/bin/java -Dproc_timelineserver -Xmx1024m
>>>>> -Dhdp.version=2.3.0.0-2557 -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn
>>>>> -Dyarn.log.dir=/var/log/hadoop-yarn/yarn
>>>>> -Dhadoop.log.file=yarn-yarn-timelineserver-hd-master-a01.log
>>>>> -Dyarn.log.file=yarn-yarn-timelineserver-hd-master-a01.log -Dyarn.home.dir=
>>>>> -Dyarn.id.str=yarn -Dhadoop.root.logger=INFO,EWMA,RFA
>>>>> -Dyarn.root.logger=INFO,EWMA,RFA
>>>>> -Djava.library.path=:/usr/hdp/2.3.0.0-2557/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.0.0-2557/hadoop/lib/native:/usr/hdp/2.3.0.0-2557/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.0.0-2557/hadoop/lib/native
>>>>> -Dyarn.policy.file=hadoop-policy.xml
>>>>> -Dhadoop.log.dir=/var/log/hadoop-yarn/yarn
>>>>> -Dyarn.log.dir=/var/log/hadoop-yarn/yarn
>>>>> -Dhadoop.log.file=yarn-yarn-timelineserver-hd-master-a01.log
>>>>> -Dyarn.log.file=yarn-yarn-timelineserver-hd-master-a01.log
>>>>> -Dyarn.home.dir=/usr/hdp/current/hadoop-yarn-timelineserver
>>>>> -Dhadoop.home.dir=/usr/hdp/2.3.0.0-2557/hadoop
>>>>> -Dhadoop.root.logger=INFO,EWMA,RFA -Dyarn.root.logger=INFO,EWMA,RFA
>>>>> -Djava.library.path=:/usr/hdp/2.3.0.0-2557/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.0.0-2557/hadoop/lib/native:/usr/hdp/2.3.0.0-2557/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.3.0.0-2557/hadoop/lib/native
>>>>> -classpath
>>>>> /usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/conf:/usr/hdp/current/hadoop-client/conf:/usr/hdp/2.3.0.0-2557/hadoop/lib/*:/usr/hdp/2.3.0.0-2557/hadoop/.//*:/usr/hdp/2.3.0.0-2557/hadoop-hdfs/./:/usr/hdp/2.3.0.0-2557/hadoop-hdfs/lib/*:/usr/hdp/2.3.0.0-2557/hadoop-hdfs/.//*:/usr/hdp/2.3.0.0-2557/hadoop-yarn/lib/*:/usr/hdp/2.3.0.0-2557/hadoop-yarn/.//*:/usr/hdp/2.3.0.0-2557/hadoop-mapreduce/lib/*:/usr/hdp/2.3.0.0-2557/hadoop-mapreduce/.//*:::/usr/share/java/mysql-connector-java.jar::/usr/share/java/mysql-connector-java.jar:/usr/hdp/current/hadoop-yarn-timelineserver/.//*:/usr/hdp/current/hadoop-yarn-timelineserver/lib/*:/usr/hdp/current/hadoop-client/conf/timelineserver-config/log4j.properties
>>>>> org.apache.hadoop.yarn.server.applicationhistoryservice.ApplicationHistoryServer
>>>>>
>>>>>
>>>>> Thanks!
>>>>> Krzysztof
>>>>>
>>>>>
>>>>
>>>
>>
>
