Mailing-List: contact issues-help@spark.apache.org; run by ezmlm
Precedence: bulk
Date: Wed, 26 Oct 2016 16:43:58 +0000 (UTC)
From: "Marcelo Vanzin (JIRA)" <jira@apache.org>
To: issues@spark.apache.org
Message-ID: <JIRA.13014927.1477351921000.91075.1477500238602@Atlassian.JIRA>
In-Reply-To: <JIRA.13014927.1477351921000@Atlassian.JIRA>
References: <JIRA.13014927.1477351921000@Atlassian.JIRA> <JIRA.13014927.1477351921876@arcas>
Subject: [jira] [Commented] (SPARK-18085) Scalability enhancements for the
 History Server
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Wed, 26 Oct 2016 16:44:00 -0000


    [ https://issues.apache.org/jira/browse/SPARK-18085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15608968#comment-15608968 ] 

Marcelo Vanzin commented on SPARK-18085:
----------------------------------------

bq. this really sounds very much like the Hadoop ATS V1.5. Have you looked at that?

I haven't looked at 1.5 exactly, but I assume it's not much different from v1. The problem with the ATS is twofold: it's an external YARN dependency, and it's based on "immutable events"; once you write a piece of data to the ATS, you can't update it. My spec requires the ability to update objects stored in the underlying store.

I haven't looked closely at what will be done with v2, but the blurbs I looked at a while ago didn't really impress me much; the idea of running a separate JVM side-by-side with the AM sounded weird to me. Also, it still has the YARN dependency, which a bunch of Spark users may not want.

bq. I assume this a separate levelDB that stores just the metadata to do the simple listing on startup?

Yes, there are n+1 dbs kept by the SHS (listing + 1 per UI).
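
To make the "n+1" count concrete, here is a minimal sketch of one possible on-disk layout. The directory names, the ".ldb" suffix, and the store_paths helper are assumptions made for illustration only; they are not taken from the M1 code.

```python
from pathlib import PurePosixPath


def store_paths(root, app_ids):
    """Return (listing_db, {app_id: ui_db}) under a hypothetical layout."""
    root = PurePosixPath(root)
    # The single listing database (the "+1").
    listing_db = root / "listing.ldb"
    # One UI database per application (the "n").
    ui_dbs = {app_id: root / "apps" / f"{app_id}.ldb" for app_id in app_ids}
    return listing_db, ui_dbs


listing_db, ui_dbs = store_paths("/var/shs", ["app-1", "app-2", "app-3"])
total_dbs = 1 + len(ui_dbs)  # n + 1
```

With three applications this yields four databases: the listing plus one per UI.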

bq. Just a quick overall picture of what I think is being proposed without all the incremental steps and leaving out UI parts

That's pretty accurate. As for the cleanup policy, the code currently just follows the existing cleanup policy; I've been thinking about adding a second cleaning thread that removes local data for event logs that were deleted from HDFS outside of the SHS, but haven't gotten to that yet.
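
One possible shape for that second cleaning pass is sketched below. It is not implemented in the code discussed here; the function names and the use of a plain dict to stand in for the local stores are assumptions.

```python
def find_orphaned_stores(local_store_ids, event_log_ids):
    """Return IDs of local stores whose event log no longer exists."""
    return sorted(set(local_store_ids) - set(event_log_ids))


def cleanup_once(local_stores, event_log_ids):
    """Drop orphaned entries from local_stores and return the removed IDs.

    local_stores is a dict standing in for the per-application local data;
    a real implementation would delete files on disk instead.
    """
    removed = find_orphaned_stores(local_stores.keys(), event_log_ids)
    for app_id in removed:
        del local_stores[app_id]
    return removed


local_stores = {"app-1": "ui data", "app-2": "ui data", "app-3": "ui data"}
# app-2's event log was deleted from HDFS outside of the SHS.
remaining_logs = ["app-1", "app-3", "app-4"]
removed = cleanup_once(local_stores, remaining_logs)
```

A background thread could call cleanup_once on a timer, independently of the existing age-based cleaner.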


bq. How is this solving quickly listing new apps issue?

It's not, at least not explicitly. Incremental parsing would be one way to handle that, as would hooking into HDFS's "inotify" API to detect new or renamed files, or writing a separate "summary" file somewhere as you suggest. But those enhancements can really be done separately from this work (which is why I kept SPARK-6951 as a separate issue).
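
As a rough illustration of incremental parsing, the sketch below remembers a byte offset per log and only parses lines appended since the last pass. It assumes the event log is newline-delimited JSON; parse_new_events and the demo file are hypothetical and do not reflect how the SHS reads logs.

```python
import json
import os
import tempfile


def parse_new_events(path, offset):
    """Parse complete JSON lines found after `offset`; return (events, new_offset)."""
    events = []
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                # End of file, or a line the writer has not finished yet.
                # Leave the offset untouched so the line is retried later.
                break
            offset += len(line)
            if line.strip():
                events.append(json.loads(line))
    return events, offset


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "app-1.log")
    with open(log_path, "wb") as f:
        # The third event is only partially written.
        f.write(b'{"Event": "A"}\n{"Event": "B"}\n{"Event": "C"')
    first, offset = parse_new_events(log_path, 0)
    with open(log_path, "ab") as f:
        f.write(b"}\n")  # the writer finishes the third event
    second, offset = parse_new_events(log_path, offset)
```

The second call reads only the third event, because the saved offset skips everything parsed in the first call.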

bq. streaming data, I'm not sure if streaming stores history at this point? 

No, streaming doesn't write to the event log, so there's no streaming UI in the SHS. This work touches streaming because I'm changing the backing store for UI data, but I don't want to tackle streaming history at this point in time. That can be done separately (and this work might help make it a more viable idea).

I also don't want to stray too far from the UI / SHS enhancements here. For example, breaking up large event log files (to make logging streaming events more palatable) would be nice, but in a sense it's orthogonal to what's being proposed here.

Finally, note that even though I don't explicitly call this out in the document, the proposal here is less about where the data will be stored and more about changing the underlying architecture to allow the data to be stored in a different place. If you take a look at the M1 code, there's an abstraction for an external store, and I just happen to have a LevelDB implementation. You could potentially implement an in-memory version, or an ATS version, or something else crazy. But the main thing I want to change is the idea that UI data is stored in memory, which is the source of most of the SHS issues we see.
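
The idea of a pluggable store can be sketched as follows. This is not the M1 abstraction: UIStore, InMemoryUIStore, and on_job_end are hypothetical names, and the real interface may look quite different. The point is only that UI code talks to an interface that supports updates, so the backend can be swapped.

```python
from abc import ABC, abstractmethod
from typing import Optional


class UIStore(ABC):
    """Hypothetical store interface; not the actual M1 API."""

    @abstractmethod
    def write(self, key: str, value: dict) -> None:
        """Insert the object, or overwrite it if the key already exists."""

    @abstractmethod
    def read(self, key: str) -> Optional[dict]:
        """Return a copy of the stored object, or None if the key is unknown."""


class InMemoryUIStore(UIStore):
    """Dict-backed implementation; a disk-backed one would expose the same methods."""

    def __init__(self) -> None:
        self._data = {}

    def write(self, key, value):
        self._data[key] = dict(value)

    def read(self, key):
        value = self._data.get(key)
        return dict(value) if value is not None else None


def on_job_end(store: UIStore, job_id: str, status: str) -> None:
    """Listener-style update that depends only on the interface."""
    job = store.read(job_id) or {"id": job_id}
    job["status"] = status
    store.write(job_id, job)  # updates the existing object in place


store = InMemoryUIStore()
store.write("job-0", {"id": "job-0", "status": "RUNNING"})
on_job_end(store, "job-0", "SUCCEEDED")
```

Because on_job_end never touches the backing dict directly, the same call would work unchanged against any other UIStore implementation.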

> Scalability enhancements for the History Server
> -----------------------------------------------
>
>                 Key: SPARK-18085
>                 URL: https://issues.apache.org/jira/browse/SPARK-18085
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Spark Core, Web UI
>    Affects Versions: 2.0.0
>            Reporter: Marcelo Vanzin
>         Attachments: spark_hs_next_gen.pdf
>
>
> It's a known fact that the History Server currently has some annoying issues when serving lots of applications, and when serving large applications.
> I'm filing this umbrella to track work related to addressing those issues. I'll be attaching a document shortly describing the issues and suggesting a path to how to solve them.

