spark-issues mailing list archives

From "Dmitry Buzolin (JIRA)" <>
Subject [jira] [Commented] (SPARK-18085) Better History Server scalability for many / large applications
Date Mon, 05 Dec 2016 14:39:58 GMT


Dmitry Buzolin commented on SPARK-18085:

I would like to add my observations after working with SHS:

1. The JSON format for log storage is inefficient and redundant: about 70% of the information
in the logs is repeated key names. This reliance on JSON is a dead end for a distributed
architecture such as Spark (perhaps compression may alleviate this to some extent), and it would be
great if this changed to normal OS-style logging or to storing logs in a database.

2. The amount of logging in Spark is directly proportional to the number of tasks. I've seen
50+ GB log files sitting in HDFS. The design has to be more intelligent so that it does not produce
such logs, as they slow down the UI, degrade REST API performance and can occupy a lot of space
in HDFS.

3. The Spark REST API should be consistent with regard to log availability. Many times, when
a Spark application finishes, both YARN and Spark report the application as completed via calls
to the top-level endpoint, yet the log file is not available via the Spark REST API, which returns
a "no such app" message when one queries executor or job details. This leaves one guessing
and waiting before querying the status of the application.
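
As a rough way to check the key-name overhead mentioned in observation 1, the sketch below
walks each JSON line of an event log and compares the bytes spent on key names with the total
line size. The function name key_name_fraction and the sample events are hypothetical and only
illustrative; the figure it reports depends entirely on the log it is given.

```python
import json


def key_name_fraction(lines):
    """Return the fraction of serialized bytes spent on JSON key names.

    `lines` is any iterable of strings, one JSON object per line.
    """
    total_bytes = 0
    key_bytes = 0

    def count_keys(value):
        # Walk nested dicts and lists, summing the encoded size of every
        # key including its surrounding quotes.
        n = 0
        if isinstance(value, dict):
            for k, v in value.items():
                n += len(json.dumps(k).encode("utf-8"))
                n += count_keys(v)
        elif isinstance(value, list):
            for item in value:
                n += count_keys(item)
        return n

    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        total_bytes += len(line.encode("utf-8"))
        key_bytes += count_keys(json.loads(line))

    return key_bytes / total_bytes if total_bytes else 0.0


if __name__ == "__main__":
    # Made-up sample events, shaped loosely like listener events.
    sample = [
        '{"Event": "SparkListenerTaskEnd", "Stage ID": 0, "Task Info": {"Task ID": 1}}',
        '{"Event": "SparkListenerTaskEnd", "Stage ID": 0, "Task Info": {"Task ID": 2}}',
    ]
    print(f"key names: {key_name_fraction(sample):.0%} of bytes")
```

Keys are re-encoded with json.dumps, so the count is an approximation when the log uses
different escaping or spacing than Python's defaults.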

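Until the behaviour in observation 3 changes, a client has to poll. The sketch below retries a
per-application endpoint of the REST API until it stops answering "not found". It assumes the
"no such app" reply arrives as HTTP 404 and that the History Server listens on localhost:18080;
the helper names (fetch_json, wait_for_app_details) and the retry settings are hypothetical.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def wait_for_app_details(base_url, app_id, resource="jobs", attempts=10,
                         delay=5.0, fetch=fetch_json, sleep=time.sleep):
    """Poll /api/v1/applications/<app_id>/<resource> until it is served.

    Assumption: an application whose log is not yet loaded yields HTTP 404.
    Any other HTTP error is re-raised immediately.
    """
    url = f"{base_url}/api/v1/applications/{app_id}/{resource}"
    for attempt in range(attempts):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            if err.code != 404:
                raise
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next try
    raise TimeoutError(f"{url} still not available after {attempts} attempts")


if __name__ == "__main__":
    # Placeholder application id; replace with a real one.
    jobs = wait_for_app_details("http://localhost:18080", "app-id-placeholder")
    print(len(jobs), "jobs")
```

Passing fetch and sleep as parameters keeps the retry loop testable without a running server.
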
> Better History Server scalability for many / large applications
> ---------------------------------------------------------------
>                 Key: SPARK-18085
>                 URL:
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Spark Core, Web UI
>    Affects Versions: 2.0.0
>            Reporter: Marcelo Vanzin
>         Attachments: spark_hs_next_gen.pdf
> It's a known fact that the History Server currently has some annoying issues when serving
> lots of applications, and when serving large applications.
> I'm filing this umbrella to track work related to addressing those issues. I'll be attaching
> a document shortly describing the issues and suggesting a path to how to solve them.
