spark-issues mailing list archives

From "Ryan Williams (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SPARK-4539) History Server counts "incomplete" applications against the "retainedApplications" total, fails to show eligible "completed" applications
Date Fri, 21 Nov 2014 16:33:34 GMT
Ryan Williams created SPARK-4539:
------------------------------------

             Summary: History Server counts "incomplete" applications against the "retainedApplications"
total, fails to show eligible "completed" applications
                 Key: SPARK-4539
                 URL: https://issues.apache.org/jira/browse/SPARK-4539
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.2.0
            Reporter: Ryan Williams


I have observed the history server to return 0 or 1 applications from a directory that contains
many complete and incomplete applications (the latter being application directories that are
missing the {{APPLICATION_COMPLETE}} file).
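
To check how many directories fall into each group, something like the following rough sketch could be run against a local copy of the log directory. It assumes the layout described above (one subdirectory per application, with an {{APPLICATION_COMPLETE}} marker file in completed ones); the function name {{classify_app_dirs}} is made up for illustration and is not part of Spark.

```python
import os

# Marker file named in this report; its absence is what makes a directory "incomplete".
MARKER = "APPLICATION_COMPLETE"


def classify_app_dirs(log_dir):
    """Split the immediate subdirectories of log_dir into (complete, incomplete) name lists."""
    complete, incomplete = [], []
    for entry in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, entry)
        if not os.path.isdir(path):
            continue  # ignore stray files at the top level
        if os.path.isfile(os.path.join(path, MARKER)):
            complete.append(entry)
        else:
            incomplete.append(entry)
    return complete, incomplete
```

Comparing {{len(complete)}} with what the web UI lists shows whether eligible applications are going missing.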

Without having dug too much, my theory is that HistoryServer is seeing the "incomplete" directories
and counting them against the {{retainedApplications}} maximum but not displaying them.

One supporting anecdote for this is that I loaded HS against a directory that had one complete
application and nothing else, and HS worked as expected (I saw the one application in the
web UI).

I then copied ~100 other application directories in, the majority of which were "incomplete"
(in particular, most of the ones that had the earliest timestamps), and still only saw the
one original completed application via the web UI.

Finally, I restarted the same server with {{retainedApplications}} set to 1000 (instead
of 50; the directory at this point had ~10 completed applications and ~90 incomplete ones),
and saw exactly the completed applications, all of them, leading me to believe that they had
been "boxed out" of the history server's 50-retained-applications limit.

Silently failing on "incomplete" directories while still docking the count, if that is indeed
what is happening, is a pretty confusing failure mode.
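
To make the suspected failure mode concrete, here is a toy model of the theory. It is not HistoryServer code. It assumes the server caps the list at {{retainedApplications}} before dropping incomplete entries, and that the incomplete directories happen to be scanned first (the actual scan order is not known from this report). Both function names are hypothetical.

```python
def visible_limit_first(apps, retained):
    """Suspected behavior: apply the cap first, then hide incomplete applications."""
    considered = apps[:retained]
    return [name for name, is_complete in considered if is_complete]


def visible_filter_first(apps, retained):
    """Expected behavior: drop incomplete applications first, then apply the cap."""
    completed = [name for name, is_complete in apps if is_complete]
    return completed[:retained]


# Rough numbers from the report: ~90 incomplete directories and ~10 completed ones.
# Assumption: the incomplete ones are encountered first.
apps = [(f"app-{i:03d}", False) for i in range(90)]
apps += [(f"app-{i:03d}", True) for i in range(90, 100)]

print(len(visible_limit_first(apps, 50)))    # cap of 50 is used up by incomplete entries
print(len(visible_limit_first(apps, 1000)))  # cap is large enough to reach the completed ones
print(len(visible_filter_first(apps, 50)))   # filtering first is unaffected by incomplete entries
```

Under these assumptions the limit-first variant shows nothing at a cap of 50 and all ten completed applications at 1000, which is consistent with what I observed; the filter-first variant shows all ten in both cases.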




