pig-dev mailing list archives

From "Niels Basjes (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-4796) Authenticate with Kerberos using a keytab file
Date Sat, 20 Feb 2016 22:59:18 GMT

    [ https://issues.apache.org/jira/browse/PIG-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155797#comment-15155797 ]

Niels Basjes commented on PIG-4796:
-----------------------------------

I tried several places for the login and found that the connection to the cluster is created
very early in the process, even before the script is parsed.
I'll check whether putting it in HExecutionEngine.init is still early enough to make it work.

Note that the last patch already avoids (most) needless relogins by checking the status of
 {{UserGroupInformation.getLoginUser().hasKerberosCredentials()}}
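
The check mentioned above could look roughly like the following minimal sketch. This is not the code from the patch: {{ReloginGuard}}, {{KerberosSession}}, {{FakeSession}} and {{ensureLoggedIn}} are hypothetical names, and the interface merely stands in for the two Hadoop calls so that the guard logic can be read and run without a secured cluster.

```java
/**
 * Sketch of a "login only when needed" guard. All names here are hypothetical;
 * the interface stands in for Hadoop's UserGroupInformation so the logic runs
 * without a Kerberos cluster.
 */
class ReloginGuard {

    /** Assumed abstraction over the two Hadoop calls mentioned in the text. */
    interface KerberosSession {
        // Stand-in for UserGroupInformation.getLoginUser().hasKerberosCredentials()
        boolean hasKerberosCredentials();

        // Stand-in for UserGroupInformation.loginUserFromKeytab(principal, keytabPath)
        void loginFromKeytab(String principal, String keytabPath);
    }

    /** In-memory fake used only to exercise the guard. */
    static final class FakeSession implements KerberosSession {
        private boolean loggedIn;
        int loginCalls;

        FakeSession(boolean loggedIn) {
            this.loggedIn = loggedIn;
        }

        @Override
        public boolean hasKerberosCredentials() {
            return loggedIn;
        }

        @Override
        public void loginFromKeytab(String principal, String keytabPath) {
            loginCalls++;
            loggedIn = true;
        }
    }

    /**
     * Logs in from the keytab only if no Kerberos credentials are present.
     * Returns true if a login was performed, false if it was skipped.
     */
    static boolean ensureLoggedIn(KerberosSession session, String principal, String keytabPath) {
        if (session.hasKerberosCredentials()) {
            return false; // credentials already present, skip the relogin
        }
        session.loginFromKeytab(principal, keytabPath);
        return true;
    }

    public static void main(String[] args) {
        FakeSession session = new FakeSession(false);
        // Placeholder principal and keytab path, as used in the issue description.
        ensureLoggedIn(session, "nbasjes@XXXXXX.NET", "/home/nbasjes/.krb/nbasjes.keytab");
        ensureLoggedIn(session, "nbasjes@XXXXXX.NET", "/home/nbasjes/.krb/nbasjes.keytab");
        System.out.println("login calls: " + session.loginCalls); // prints "login calls: 1"
    }
}
```

In a real implementation the interface would be replaced by direct calls to {{UserGroupInformation}}, and the {{IOException}} thrown by {{loginUserFromKeytab}} would need to be handled.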

To be able to test this I asked our Ops team to reduce the ticket lifetime on my account to
5 minutes (renewable for 10 minutes). Yesterday I ran {{kdestroy}} and then ran wordcount on a 3GB
gzipped text file. The job took more than 30 minutes and succeeded on the Kerberos-secured cluster
we have at work. So this first test shows that at least the basics of this solution seem to be
right.

I need advice on how to approach the test for this patch. 
How should I create a test for this?

> Authenticate with Kerberos using a keytab file
> ----------------------------------------------
>
>                 Key: PIG-4796
>                 URL: https://issues.apache.org/jira/browse/PIG-4796
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Niels Basjes
>            Assignee: Niels Basjes
>         Attachments: 2016-02-18-1510-PIG-4796.patch, 2016-02-18-PIG-4796-rough-proof-of-concept.patch
>
>
> When running in a Kerberos-secured environment, users are faced with the limitation that
their jobs cannot run longer than the (remaining) ticket lifetime of their Kerberos tickets.
In the environment I work in, these tickets expire after 10 hours, thus limiting the maximum job
duration to at most 10 hours (which is a problem).
> In the Hadoop tooling there is a feature where you can authenticate using a Kerberos
keytab file (essentially a file that contains the Kerberos principal and the encrypted keys
derived from its password). Using this, the running application can request new tickets from
the Kerberos server when the initial tickets expire.
> In my Java/Hadoop applications I commonly include these two lines:
> {code}
> System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
> UserGroupInformation.loginUserFromKeytab("nbasjes@XXXXXX.NET", "/home/nbasjes/.krb/nbasjes.keytab");
> {code}
> This way I have run an Apache Flink based application for more than 170 hours (about
a week) on the Kerberos-secured YARN cluster.
> What I propose is a feature that lets me set the relevant Kerberos values in my
Pig script and from there run a Pig job for many days on the secured cluster.
> Proposal for how this could look in a Pig script:
> {code}
> SET java.security.krb5.conf '/etc/krb5.conf'
> SET job.security.krb5.principal 'nbasjes@XXXXXX.NET'
> SET job.security.krb5.keytab '/home/nbasjes/.krb/nbasjes.keytab'
> {code}
> So iff all of these are set (or at least the last two), the aforementioned UserGroupInformation.loginUserFromKeytab
method is called before submitting the job to the cluster.
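
The "iff" rule above could be sketched as follows. This is only an illustration of the proposed behaviour, not Pig's implementation: the class {{KeytabLoginConfig}}, the {{KeytabLogin}} callback and the method {{loginIfConfigured}} are hypothetical, the property names are the ones proposed in this issue, and the callback stands in for {{UserGroupInformation.loginUserFromKeytab}} so that the sketch runs without Hadoop on the classpath.

```java
import java.util.Properties;

/**
 * Sketch of the proposed rule: perform the keytab login only when both the
 * principal and the keytab are set. All class and method names are hypothetical.
 */
class KeytabLoginConfig {

    // Property names as proposed in the issue text (not an existing Pig API).
    static final String KRB5_CONF_KEY = "java.security.krb5.conf";
    static final String PRINCIPAL_KEY = "job.security.krb5.principal";
    static final String KEYTAB_KEY = "job.security.krb5.keytab";

    /** Stand-in for UserGroupInformation.loginUserFromKeytab(principal, keytabPath). */
    interface KeytabLogin {
        void login(String principal, String keytabPath);
    }

    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }

    /**
     * Calls the login callback only if principal and keytab are both set.
     * The krb5.conf location is optional and is applied as a JVM system
     * property when present. Returns true if a login was attempted.
     */
    static boolean loginIfConfigured(Properties props, KeytabLogin login) {
        String principal = props.getProperty(PRINCIPAL_KEY);
        String keytab = props.getProperty(KEYTAB_KEY);
        if (!isSet(principal) || !isSet(keytab)) {
            return false; // incomplete settings: keep the ticket-cache behaviour
        }
        String krb5Conf = props.getProperty(KRB5_CONF_KEY);
        if (isSet(krb5Conf)) {
            System.setProperty(KRB5_CONF_KEY, krb5Conf);
        }
        login.login(principal, keytab);
        return true;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(PRINCIPAL_KEY, "nbasjes@XXXXXX.NET");
        props.setProperty(KEYTAB_KEY, "/home/nbasjes/.krb/nbasjes.keytab");
        // The callback only prints here; a real one would call into Hadoop.
        boolean attempted = loginIfConfigured(props,
                (principal, keytab) -> System.out.println("login as " + principal));
        System.out.println("login attempted: " + attempted);
    }
}
```

Keeping the decision separate from the actual login call makes the rule testable without a KDC, which may also be one way to approach a unit test for this patch.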



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
