hadoop-mapreduce-issues mailing list archives

From "Ruslan Dautkhanov (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-7219) Random mappers start delay to have a slow processing ramp-up
Date Sun, 16 Jun 2019 18:48:00 GMT
Ruslan Dautkhanov created MAPREDUCE-7219:
--------------------------------------------

             Summary: Random mappers start delay to have a slow processing ramp-up
                 Key: MAPREDUCE-7219
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7219
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
            Reporter: Ruslan Dautkhanov


It would be great to have a way to configure a random start delay for mappers, so that processing
ramps up slowly and gracefully and does not overwhelm an external system with an initialization
storm when mappers have to talk to an external, less scalable system at startup: a backend
database, ZooKeeper (ZK), DNS, etc.
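To see why a random start delay smooths the ramp-up, assume (for illustration only) that all N mappers are launched at the same moment and each one independently draws its start delay uniformly from [0, D]. The expected number of mappers starting in any window of length w that lies inside [0, D] is then:

```latex
\mathbb{E}[\text{starts in a window of length } w] = N \cdot \frac{w}{D}, \qquad 0 < w \le D
% Illustrative numbers (assumed, not from the issue): N = 400, D = 60\,\text{s}, w = 1\,\text{s}
400 \cdot \frac{1\,\text{s}}{60\,\text{s}} \approx 6.7 \ \text{mapper starts per second}
```

So instead of 400 connection attempts arriving at the backend in the same instant, the expected load is spread to a handful per second, and a larger D flattens it further at the cost of a slower job start.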

From an answer to a Stack Overflow question:

[https://stackoverflow.com/a/56621673/470583]

// quote

You could manually limit the number of concurrent initializations using, for example, Apache Curator's
org.apache.curator.framework.recipes.locks.InterProcessSemaphoreV2 mechanism.

See, for example, how Cloudera uses this in batch-load jobs that load data into Solr:

[https://github.com/cloudera/search/blob/cdh6.2.0/search-crunch/src/main/java/org/apache/solr/crunch/MorphlineInitRateLimiter.java#L115]

In that particular example they use it to limit the number of ZooKeeper initializations that can
run at the same time, to avoid flooding ZooKeeper with a storm of requests from hundreds of
mappers.

In one job I use 400 mappers, but limit the number of concurrent initializations to 30
(once the initializations are done, the mappers run fully independently).

In your case you want to limit the number of requests from mappers to an Oracle backend; in this
example they want to limit the number of requests to ZK. So it's the same problem.

Ideally, Hadoop would have a way to add a random start delay so that mappers ramp up gradually,
for exactly the same reason.

// quote
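The throttling pattern described in the quote (400 mappers, at most 30 initializing at once, then running independently) can be sketched in a single JVM. This is an illustration under stated assumptions, not the Cloudera code: it swaps Curator's ZooKeeper-backed InterProcessSemaphoreV2 for the local java.util.concurrent.Semaphore, and simulates mappers as thread-pool tasks. The class name BoundedInitDemo, the 64-thread pool, and the 2 ms "initialization" are all hypothetical. Across real mapper JVMs you would instead take a lease with InterProcessSemaphoreV2.acquire() and give it back with returnLease(lease).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class: local stand-in for a distributed semaphore across mappers.
final class BoundedInitDemo {
    private BoundedInitDemo() {}

    /**
     * Simulates {@code mappers} tasks whose start-up phase is throttled to at most
     * {@code maxConcurrentInits} at a time. Returns {peakConcurrentInits, completedTasks}.
     */
    static int[] run(int mappers, int maxConcurrentInits) {
        Semaphore permits = new Semaphore(maxConcurrentInits, true); // fair: waiters served FIFO
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(64); // stand-in for cluster containers
        for (int i = 0; i < mappers; i++) {
            pool.execute(() -> {
                try {
                    permits.acquire(); // block until one of the init slots is free
                    try {
                        peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                        Thread.sleep(2); // stand-in for the expensive init (e.g. a ZK/DB session)
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release(); // hand the slot to the next waiting task
                    }
                    completed.incrementAndGet(); // real map work would run here, unthrottled
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {peak.get(), completed.get()};
    }
}
```

Calling BoundedInitDemo.run(400, 30) mirrors the numbers in the quote: all 400 tasks finish, but no more than 30 are ever inside the initialization section at the same time. Only the start-up phase holds a permit, which matches the point that mappers run fully independently once initialized.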

Instead of using org.apache.curator.framework.recipes.locks.InterProcessSemaphoreV2, a much
more generic solution would be a way to enforce a random start delay for mappers (with a
configurable upper limit; if it's not specified, there will be no limit).
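A minimal sketch of the proposed behavior, assuming the delay is drawn uniformly between zero and the configured upper limit. Hadoop has no such setting; the property name mapreduce.map.random-start-delay.max-ms and the class RandomStartDelay are hypothetical. Because a uniform draw needs a finite bound, this sketch assumes that an unset or non-positive limit simply disables the delay. A plain Map stands in for the job configuration so the example runs without Hadoop on the classpath.

```java
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper illustrating the proposal; not part of Hadoop.
final class RandomStartDelay {
    // Hypothetical property name for illustration; Hadoop does not define it.
    static final String MAX_DELAY_KEY = "mapreduce.map.random-start-delay.max-ms";

    private RandomStartDelay() {}

    /** Draws a delay uniformly from [0, maxDelayMs]; 0 if no positive bound is configured. */
    static long pickDelayMs(long maxDelayMs) {
        if (maxDelayMs <= 0) {
            return 0L; // assumption: an unset/non-positive bound disables the delay
        }
        // nextLong(origin, bound) excludes bound, so use maxDelayMs + 1 (guarding overflow)
        long bound = maxDelayMs == Long.MAX_VALUE ? Long.MAX_VALUE : maxDelayMs + 1;
        return ThreadLocalRandom.current().nextLong(0L, bound);
    }

    /** Reads the bound from a key/value config; a missing or blank value means "no delay". */
    static long maxDelayFrom(Map<String, String> conf) {
        String value = conf.get(MAX_DELAY_KEY);
        return (value == null || value.trim().isEmpty()) ? 0L : Long.parseLong(value.trim());
    }

    /** Sleeps for a random delay before the task's start-up work; returns the delay used. */
    static long sleepBeforeStart(Map<String, String> conf) {
        long delayMs = pickDelayMs(maxDelayFrom(conf));
        if (delayMs > 0) {
            try {
                Thread.sleep(delayMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status for the caller
            }
        }
        return delayMs;
    }
}
```

In a real job, a mapper could get the same effect today by doing this at the top of its Mapper.setup(Context) override, reading the bound with context.getConfiguration().getLong(...) instead of from a Map. Built into the framework, the delay would apply before any user code runs, without each job reimplementing it.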

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)