hadoop-mapreduce-issues mailing list archives

From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3936) Clients should not enforce counter limits
Date Tue, 09 Oct 2012 19:26:03 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472659#comment-13472659 ]

Robert Joseph Evans commented on MAPREDUCE-3936:
------------------------------------------------

MAPREDUCE-3061 is only a concept right now.  The JIRA was created over a year ago, and the
only update since then was someone asking for clarification about the requirements, to which
no one responded.  I don't really want to wait for a JIRA that is likely to be far off in the
future to fix a very real problem that we have right now.

Additionally, I don't see splitting the history server into two independent parts as
something that will solve this problem.  It could help, and any changes we make should
ideally keep that split in mind, but it will not solve the issue by itself.  The issue is how
much data the history server can cache in memory vs. leave in HDFS and reconstruct on demand,
and what the granularity of that caching is.  Right now the caching happens on a per-job
basis, which is far too coarse.
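
To make the granularity point concrete, the current shape is roughly the sketch below
(hypothetical names, not the actual history server classes):

    // Illustrative only -- hypothetical names, not the real history server code.
    import java.util.HashMap;
    import java.util.Map;

    class PerJobCacheSketch {
      /** Stand-in for a fully parsed job: every task and attempt held in memory. */
      static class LoadedJob { }

      // Per-job granularity: the cached value is the whole job, so a single
      // 50,000-task job can dominate the heap no matter how few jobs are cached.
      final Map<String, LoadedJob> loadedJobs = new HashMap<String, LoadedJob>();
    }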

We could fix this by not caching at all: every time a page is loaded, a web service call is
made, or an RPC call comes in, we parse the job history log and reconstruct just the data for
that request and nothing else.  On some very large jobs (50,000+ tasks) I have seen parsing
the log take 10 seconds, so this would have a negative impact on page load times.  It is also
unclear how much extra load this would place on HDFS; that really depends on how heavily the
history server ends up being used.
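
For reference, a no-cache request path would look roughly like the sketch below.  It uses
the JobHistoryParser API as I remember it, and the method name and history file path are
made up:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser;
    import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser.JobInfo;

    public class ParseOnDemandSketch {
      // Re-parse the .jhist file for every request and keep only what the
      // request needs; nothing stays cached between requests.
      public static long finishTimeFor(Path historyFile) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = historyFile.getFileSystem(conf);
        // Full pass over the log -- this is the ~10 second cost on 50,000+ task jobs.
        JobInfo info = new JobHistoryParser(fs, historyFile).parse();
        return info.getFinishTime();
      }
    }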

The final solution really has to be some middle ground where we cache a known quantity of
data and reconstruct everything else on demand.  That is a lot of work, so in the short term
I would prefer to see something that keeps the history server from crashing with an OOM while
still providing most of the needed functionality until something better can be written.
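
As a strawman for the "known quantity" part, even an access-ordered LinkedHashMap would do
for a first cut.  The names are illustrative, and bounding by job count is still coarse
(charging each entry by its task count would be tighter), but the shape is the same:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Keep at most maxLoadedJobs parsed jobs in memory; evict the least
    // recently used one and re-parse it from its .jhist file in HDFS if it
    // is ever asked for again.
    public class LoadedJobCache<K, V> extends LinkedHashMap<K, V> {
      private final int maxLoadedJobs;

      public LoadedJobCache(int maxLoadedJobs) {
        super(16, 0.75f, true);           // true == access order, i.e. LRU
        this.maxLoadedJobs = maxLoadedJobs;
      }

      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxLoadedJobs;    // evicted jobs get rebuilt on demand
      }
    }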

I know that the History Server can easily get OOMs when loading large jobs with lots of
tasks, which is a far bigger concern to me than the counters are right now, simply because
the AM still tries to enforce the counter limits.
                
> Clients should not enforce counter limits 
> ------------------------------------------
>
>                 Key: MAPREDUCE-3936
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3936
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>            Reporter: Tom White
>            Assignee: Tom White
>         Attachments: MAPREDUCE-3936.patch, MAPREDUCE-3936.patch
>
>
> The code for enforcing counter limits (from MAPREDUCE-1943) creates a static JobConf
> instance to load the limits, which may throw an exception if the client limit is set to be
> lower than the limit on the cluster (perhaps because the cluster limit was raised from the
> default).
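
For anyone skimming, the failure mode described above is roughly the following (hypothetical
sketch, not the actual MAPREDUCE-1943 code; the property name and default are from memory of
the 1.x config):

    import org.apache.hadoop.mapred.JobConf;

    // Hypothetical sketch of the failure mode, not the actual MAPREDUCE-1943
    // code. A limit snapshotted statically on the client bakes in whatever
    // config the client has; a job that is legal on the cluster (where the
    // limit was raised) can then fail on the client side.
    public class ClientSideLimitsSketch {
      // Loaded once from whatever *.xml happens to be on the client classpath.
      // Property name and default are the 1.x ones, from memory.
      private static final int MAX_COUNTERS =
          new JobConf().getInt("mapreduce.job.counters.limit", 120);

      public static void checkCounters(int count) {
        if (count > MAX_COUNTERS) {       // throws even if the cluster allows more
          throw new IllegalStateException(
              "Too many counters: " + count + " max=" + MAX_COUNTERS);
        }
      }
    }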

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
