pig-dev mailing list archives

From "Niels Basjes (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-4796) Authenticate with Kerberos using a keytab file
Date Tue, 23 Feb 2016 09:41:18 GMT

     [ https://issues.apache.org/jira/browse/PIG-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niels Basjes updated PIG-4796:
    Attachment: PIG-4796-2016-02-23.patch

Cleaned-up version of the patch, updated to make the login call in HExecutionEngine.init.
It also includes a safeguard so that the login happens only once.
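
A "login only once" safeguard of this kind could be sketched as follows. This is a minimal illustration and not the code from the attached patch: the class name LoginOnceGuard and the Runnable-based design are assumptions made so that the example compiles without Hadoop on the classpath. In a real setup the action passed in would wrap the UserGroupInformation.loginUserFromKeytab call.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper (not part of Pig or the patch): runs a login action at most once. */
final class LoginOnceGuard {
    // Flipped to true by the first caller; later callers see true and skip the login.
    private final AtomicBoolean attempted = new AtomicBoolean(false);

    /**
     * Runs loginAction on the first call and skips it on every later call.
     * Note: if the action throws, the guard stays set and the login is not retried.
     *
     * @return true if the action ran during this call, false if it was skipped
     */
    boolean loginOnce(Runnable loginAction) {
        // compareAndSet succeeds for exactly one caller, even under concurrency.
        if (!attempted.compareAndSet(false, true)) {
            return false;
        }
        loginAction.run();
        return true;
    }
}
```

Calling loginOnce from an init method that may be invoked several times would then perform the keytab login on the first invocation only.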

This patch also includes javadoc and a section for the website on how to use this. As discussed
with [~daijy], no unit tests are included; the documentation I added should be sufficient to
understand how to test this manually.

If done correctly, you should be able to run {{kdestroy}} (i.e. remove all Kerberos tickets)
and then run the script as indicated. When I do this, I see messages like these near the top
of the console output:
{code}
2016-02-23 09:26:40,229 [main] INFO  org.apache.pig.backend.hadoop.HKerberos - Trying login using Kerberos Keytab
2016-02-23 09:26:40,229 [main] INFO  org.apache.pig.backend.hadoop.HKerberos - krb5: Conf      = /etc/krb5.conf
2016-02-23 09:26:40,229 [main] INFO  org.apache.pig.backend.hadoop.HKerberos - krb5: Principal = nbasjes@XXXXXX.NET
2016-02-23 09:26:40,229 [main] INFO  org.apache.pig.backend.hadoop.HKerberos - krb5: Keytab    = /home/nbasjes/.krb/nbasjes.keytab
2016-02-23 09:26:40,423 [main] INFO  org.apache.hadoop.security.UserGroupInformation - Login successful for user nbasjes@XXXXXX.NET using keytab file /home/nbasjes/.krb/nbasjes.keytab
{code}
After this the job runs successfully on a secured cluster.
If you run the script for longer than the ticket lifetime, the Hadoop cluster will use the
provided information to acquire new tickets from the Kerberos system.

> Authenticate with Kerberos using a keytab file
> ----------------------------------------------
>                 Key: PIG-4796
>                 URL: https://issues.apache.org/jira/browse/PIG-4796
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Niels Basjes
>            Assignee: Niels Basjes
>         Attachments: 2016-02-18-1510-PIG-4796.patch, 2016-02-18-PIG-4796-rough-proof-of-concept.patch,
> When running in a Kerberos secured environment, users are faced with the limitation that
> their jobs cannot run longer than the (remaining) lifetime of their Kerberos tickets.
> In the environment I work in, these tickets expire after 10 hours, thus limiting the maximum
> job duration to at most 10 hours (which is a problem).
> In the Hadoop tooling there is a feature that lets you authenticate using a Kerberos
> keytab file (essentially a file that contains the encrypted form of the Kerberos principal
> and password). Using this, the running application can request new tickets from the Kerberos
> server when the initial tickets expire.
> In my Java/Hadoop applications I commonly include these two lines:
> {code}
> System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
> UserGroupInformation.loginUserFromKeytab("nbasjes@XXXXXX.NET", "/home/nbasjes/.krb/nbasjes.keytab");
> {code}
> This way I have run an Apache Flink based application for more than 170 hours (about
> a week) on the Kerberos secured YARN cluster.
> What I propose is a feature that lets me set the relevant Kerberos values in my
> Pig script and from there run a Pig job for many days on the secured cluster.
> Proposal for how this could look in a Pig script:
> {code}
> SET java.security.krb5.conf '/etc/krb5.conf'
> SET job.security.krb5.principal 'nbasjes@XXXXXX.NET'
> SET job.security.krb5.keytab '/home/nbasjes/.krb/nbasjes.keytab'
> {code}
> So iff all of these are set (or at least the last two), the aforementioned
> UserGroupInformation.loginUserFromKeytab method is called before submitting the job to the cluster.
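
The rule proposed above (log in from the keytab only when the principal and keytab are both set, with the krb5 configuration path being optional) could be sketched as follows. This is an illustration under stated assumptions and not the implementation from the patch: the class KeytabSettings and the method loginIfConfigured are hypothetical names, the property keys are the ones from the proposal and may differ in the final patch, and the login call is passed in as a callback so the example compiles without Hadoop on the classpath.

```java
import java.util.Properties;
import java.util.function.BiConsumer;

/** Hypothetical sketch of the proposed check; the names are illustrative, not Pig's API. */
final class KeytabSettings {
    // Property keys as written in the proposal above (assumed, may differ in the final patch).
    static final String KRB5_CONF = "java.security.krb5.conf";
    static final String PRINCIPAL = "job.security.krb5.principal";
    static final String KEYTAB = "job.security.krb5.keytab";

    /**
     * Invokes login with (principal, keytab) only when both values are set and non-empty.
     *
     * @return true if login was invoked, false if the settings were incomplete
     */
    static boolean loginIfConfigured(Properties props, BiConsumer<String, String> login) {
        String principal = props.getProperty(PRINCIPAL);
        String keytab = props.getProperty(KEYTAB);
        if (principal == null || principal.isEmpty() || keytab == null || keytab.isEmpty()) {
            // Incomplete settings: do nothing, so the existing ticket-based login still applies.
            return false;
        }
        String krb5Conf = props.getProperty(KRB5_CONF);
        if (krb5Conf != null && !krb5Conf.isEmpty()) {
            // Optional: point the JVM at an explicit Kerberos configuration file.
            System.setProperty(KRB5_CONF, krb5Conf);
        }
        login.accept(principal, keytab);
        return true;
    }
}
```

With Hadoop available, the callback would wrap UserGroupInformation.loginUserFromKeytab(principal, keytab). That method throws IOException, so the wrapper has to catch or rethrow it.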
