hive-issues mailing list archives

From "Kiran Kumar Kolli (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15947) Enhance Templeton service job operations reliability
Date Fri, 17 Feb 2017 08:50:41 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871453#comment-15871453 ]

Kiran Kumar Kolli commented on HIVE-15947:
------------------------------------------

BusyException.java: The default constructor should call the new message-taking constructor.
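
A minimal sketch of this suggestion, assuming BusyException extends Exception (the actual base
class and message text come from the patch, not from here):

    // Sketch only: the no-arg constructor delegates to the message-taking constructor
    // instead of leaving the exception without a message.
    public class BusyException extends Exception {
        public BusyException() {
            this("Templeton is busy; please retry after some time."); // illustrative message
        }

        public BusyException(String message) {
            super(message);
        }
    }
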
AppConfig.java: init() now changes the order in which configurations are loaded. This might have
an impact on current scenarios. Is this change a must?
Code repetition: The same pattern is repeated across Submit, List and Status; let's reuse the
code (see the shared runner sketched below).
Semaphore release in finally block: A finally block is not guaranteed to run when the thread is
killed or interrupted (http://docs.oracle.com/javase/tutorial/essential/exceptions/finally.html),
and this might lead to starvation. Why not catch all exceptions and then release (being careful
with the memory-pressure scenario)?
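
For context, the general acquire/release pattern under discussion might look like the sketch
below (class and method names are illustrative, not the patch's actual code):

    import java.util.concurrent.Callable;
    import java.util.concurrent.Semaphore;

    // Illustrative only: bound concurrent operations with a semaphore and release
    // the permit in finally.
    public class ThrottledRunner {
        private final Semaphore permits;

        public ThrottledRunner(int maxConcurrent) {
            this.permits = new Semaphore(maxConcurrent);
        }

        public <T> T run(Callable<T> operation) throws Exception {
            if (!permits.tryAcquire()) {
                // The caller would translate this into an HTTP 503 "busy" response.
                throw new IllegalStateException("Too many concurrent requests");
            }
            try {
                return operation.call();
            } finally {
                // Runs on normal return and on exceptions, but not if the thread is
                // killed outright; that is the starvation concern raised above.
                permits.release();
            }
        }
    }

The same runner could back Submit, Status and List, which would also address the code-repetition
point above.
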
Troubleshooting: I guess Log4j supports logging the calling thread; otherwise, explicit tracing
will help in troubleshooting.

Will review the unit tests later.


> Enhance Templeton service job operations reliability
> ----------------------------------------------------
>
>                 Key: HIVE-15947
>                 URL: https://issues.apache.org/jira/browse/HIVE-15947
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Subramanyam Pattipaka
>            Assignee: Subramanyam Pattipaka
>         Attachments: HIVE-15947.patch
>
>
> Currently the Templeton service doesn't restrict the number of job operation requests; it
> simply accepts and tries to run all of them. If a large number of concurrent job submit
> requests arrive, the time to submit job operations can increase significantly. Templeton uses
> HDFS to store staging files for jobs. If HDFS can't keep up with the volume of requests and
> throttles, job submission can take a very long time, on the order of minutes.
> This behavior may not be suitable for all applications. Client applications may be looking
> for predictable, low response times for successful requests, or for a throttle response that
> tells the client to wait for some time before re-requesting the job operation.
> In this JIRA, I am trying to address the following job operations:
> 1) Submit new Job
> 2) Get Job Status
> 3) List jobs
> These three operations have different complexity due to variance in their use of cluster
> resources like YARN and HDFS.
> The idea is to introduce a new config, templeton.job.submit.exec.max-procs, which controls
> the maximum number of concurrent active job submissions within Templeton, and to use this
> config to achieve better response times. If a new job submission request sees that there are
> already templeton.job.submit.exec.max-procs jobs being submitted concurrently, the request
> will fail with HTTP error 503 and the reason:
>    “Too many concurrent job submission requests received. Please wait for some time
> before retrying.”
>  
> The client is expected to catch this response and retry after waiting for some time.
> The default value of templeton.job.submit.exec.max-procs is ‘0’, which means job submission
> requests are always accepted by default; the behavior needs to be enabled based on
> requirements.
> We can have similar behavior for Status and List operations with the configs
> templeton.job.status.exec.max-procs and templeton.list.job.exec.max-procs respectively.
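
A minimal sketch of the throttling mechanism described above, assuming a plain
java.util.concurrent.Semaphore gate (class and method names here are hypothetical, not taken
from the attached patch):

    import java.util.concurrent.Semaphore;

    // Illustrative only: max-procs <= 0 disables throttling, so every request is
    // admitted; otherwise a failed acquire means the caller should respond with
    // HTTP 503 and the "Too many concurrent job submission requests" reason.
    public class JobOperationGate {
        private final Semaphore slots; // null when throttling is disabled

        public JobOperationGate(int maxProcs) {
            this.slots = maxProcs > 0 ? new Semaphore(maxProcs) : null;
        }

        /** Returns true if the operation may proceed, false if the caller should send 503. */
        public boolean tryAdmit() {
            return slots == null || slots.tryAcquire();
        }

        public void release() {
            if (slots != null) {
                slots.release();
            }
        }
    }

Separate gates would then be constructed from templeton.job.submit.exec.max-procs,
templeton.job.status.exec.max-procs and templeton.list.job.exec.max-procs so that Submit,
Status and List each get their own limit.
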



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
