Mailing-List: contact issues-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Fri, 17 Feb 2017 00:43:41 +0000 (UTC)
From: "Subramanyam Pattipaka (JIRA)" <jira@apache.org>
To: issues@hive.apache.org
Message-ID: <JIRA.13043729.1487273433000.105114.1487292221619@Atlassian.JIRA>
In-Reply-To: <JIRA.13043729.1487273433000@Atlassian.JIRA>
References: <JIRA.13043729.1487273433000@Atlassian.JIRA> <JIRA.13043729.1487273433913@jira-lw-us.apache.org>
Subject: [jira] [Commented] (HIVE-15947) Enhance Templeton service job
 operations reliability
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
archived-at: Fri, 17 Feb 2017 00:43:46 -0000


    [ https://issues.apache.org/jira/browse/HIVE-15947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870954#comment-15870954 ]

Subramanyam Pattipaka commented on HIVE-15947:
----------------------------------------------

[~thejas], please review these changes and let me know if you have any comments.

cc: [~ashitg], [~kiran.kolli]

> Enhance Templeton service job operations reliability
> ----------------------------------------------------
>
>                 Key: HIVE-15947
>                 URL: https://issues.apache.org/jira/browse/HIVE-15947
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Subramanyam Pattipaka
>            Assignee: Subramanyam Pattipaka
>         Attachments: HIVE-15947.patch
>
>
> Currently, the Templeton service doesn't restrict the number of job operation requests. It simply accepts and tries to run all operations. If a large number of concurrent job submit requests arrive, the time to submit job operations can increase significantly. Templeton uses HDFS to store the staging file for a job. If HDFS storage can't respond to a large number of requests and throttles, then job submission can take a very long time, on the order of minutes.
> This behavior may not be suitable for all applications. Client applications may want either a predictable, low response time for successful requests or a throttle response telling the client to wait for some time before re-requesting the job operation.
> In this JIRA, I am trying to address the following job operations:
> 1) Submit new Job
> 2) Get Job Status
> 3) List jobs
> These three operations have different complexity due to variance in their use of cluster resources like YARN/HDFS.
> The idea is to introduce a new config, templeton.job.submit.exec.max-procs, which controls the maximum number of concurrent active job submissions within Templeton, and to use this config to keep response times under control. If a new job submission request sees that there are already templeton.job.submit.exec.max-procs jobs being submitted concurrently, then the request will fail with HTTP error 503 with the reason
>    “Too many concurrent job submission requests received. Please wait for some time before retrying.”
>
> The client is expected to catch this response and retry after waiting for some time. The default value for the config templeton.job.submit.exec.max-procs is ‘0’, which means that by default job submission requests are always accepted. The throttling behavior must be enabled explicitly based on requirements.
> We can have similar behavior for the Status and List operations with the configs templeton.job.status.exec.max-procs and templeton.list.job.exec.max-procs, respectively.
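
The throttling described in the issue can be illustrated with a minimal sketch. This is not the code in HIVE-15947.patch, and it is written in Python only for brevity. It assumes a non-blocking semaphore sized by a max-procs value, where 0 disables the limit as the description states. The names OperationThrottle, run, and submit_with_retry are hypothetical, and the retry count and wait time are arbitrary illustration values.

```python
import threading
import time

# Reason text quoted from the issue description.
TOO_MANY_REQUESTS_MSG = (
    "Too many concurrent job submission requests received. "
    "Please wait for some time before retrying."
)


class OperationThrottle:
    """Caps concurrently running operations (hypothetical helper, not Templeton code)."""

    def __init__(self, max_procs):
        # Mirrors the described default: 0 means no limit is enforced.
        self._slots = threading.BoundedSemaphore(max_procs) if max_procs > 0 else None

    def run(self, operation):
        """Return (http_status, body); 503 when every slot is already in use."""
        if self._slots is None:
            return 200, operation()
        # Non-blocking acquire: reject immediately instead of queueing the request.
        if not self._slots.acquire(blocking=False):
            return 503, TOO_MANY_REQUESTS_MSG
        try:
            return 200, operation()
        finally:
            self._slots.release()


def submit_with_retry(throttle, operation, attempts=3, wait_seconds=0.01):
    """Client side: wait and retry whenever the service answers 503."""
    status, body = 503, TOO_MANY_REQUESTS_MSG
    for _ in range(attempts):
        status, body = throttle.run(operation)
        if status != 503:
            break
        time.sleep(wait_seconds)
    return status, body
```

Under the same assumptions, one throttle instance per operation type (submit, status, list) would correspond to the three configs named above, so that a burst of one kind of request does not consume the slots of another.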


--
This message was sent by Atlassian JIRA
(v6.3.15#6346)