pig-dev mailing list archives

From "Niels Basjes (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-4796) Authenticate with Kerberos using a keytab file
Date Mon, 29 Feb 2016 14:57:18 GMT

    [ https://issues.apache.org/jira/browse/PIG-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15171950#comment-15171950 ]

Niels Basjes commented on PIG-4796:

I just did a full-size test run using the following script on 10 days' worth of click data.

Summary: Test passed on the Kerberos secured cluster I have here.

My input was 90 distinct log files totaling a few hundred GiB of gzipped Apache access logs.
My Kerberos account has been configured so that tickets expire after 5 minutes and have
a max renew of 10 minutes (for me this is the easiest way to test this feature).

I ran this Pig script with the following command line:
{code}
./bin/pig -P nbasjes.kerberos.properties -param_file LogFormats.properties ./useragent.pig
{code}

So I made sure I was logged out of Kerberos and then I ran the script against a Kerberos secured cluster.
Even though the script ran for over 27 minutes (well past the 10 minute max renew), the whole thing ran successfully.
I verified the output of this script and it was correct.

The script I ran (from the pig source directory):
{code}
REGISTER ./contrib/piggybank/java/piggybank.jar ;
REGISTER ./lib/*.jar ;

UserAgents =
  USING org.apache.pig.piggybank.storage.apachelog.LogFormatLoader( '$LOGFORMAT',
        ) AS (

UserAgentsCount =
    FOREACH  UserAgents
    GENERATE useragent AS useragent:chararray,
             1L        AS clicks:long;

CountsPerUseragents =
    GROUP UserAgentsCount
    BY    (useragent);

SumsPerBrowser =
    FOREACH  CountsPerUseragents
    GENERATE SUM(UserAgentsCount.clicks) AS clicks,
             group                       AS useragent;

STORE SumsPerBrowser
    INTO  'TopUseragents'
    USING org.apache.pig.piggybank.storage.CSVExcelStorage('\t','NO_MULTILINE', 'UNIX');
{code}

[~daijy]: Is this the type of manual test you think is correct?

> Authenticate with Kerberos using a keytab file
> ----------------------------------------------
>                 Key: PIG-4796
>                 URL: https://issues.apache.org/jira/browse/PIG-4796
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.15.0
>            Reporter: Niels Basjes
>            Assignee: Niels Basjes
>              Labels: feature, kerberos, security
>         Attachments: 2016-02-18-1510-PIG-4796.patch, 2016-02-18-PIG-4796-rough-proof-of-concept.patch,
> When running in a Kerberos secured environment users are faced with the limitation that
> their jobs cannot run longer than the (remaining) ticket lifetime of their Kerberos tickets.
> In the environment I work in, these tickets expire after 10 hours, thus limiting the maximum job
> duration to at most 10 hours (which is a problem).
> In the Hadoop tooling there is a feature where you can authenticate using a Kerberos
> keytab file (essentially a file that contains the encrypted form of the Kerberos principal
> and password). Using this the running application can request new tickets from the Kerberos
> server when the initial tickets expire.
> In my Java/Hadoop applications I commonly include these two lines:
> {code}
> System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
> UserGroupInformation.loginUserFromKeytab("nbasjes@XXXXXX.NET", "/home/nbasjes/.krb/nbasjes.keytab");
> {code}
> This way I have run an Apache Flink based application for more than 170 hours (about
> a week) on the Kerberos secured YARN cluster.
> What I propose is a feature where I can set the relevant Kerberos values in my
> Pig script and from there be able to run a Pig job for many days on the secured cluster.
> Proposal for how this can look in a Pig script:
> {code}
> SET java.security.krb5.conf '/etc/krb5.conf'
> SET job.security.krb5.principal 'nbasjes@XXXXXX.NET'
> SET job.security.krb5.keytab '/home/nbasjes/.krb/nbasjes.keytab'
> {code}
> So iff all of these are set (or at least the last two) then the aforementioned UserGroupInformation.loginUserFromKeytab
> method is called before submitting the job to the cluster.
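
The rule in the proposal (log in from the keytab only when the principal and keytab are both set, with the krb5.conf location optional) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patches: the class name KeytabLoginSketch, the KeytabLogin hook, and the method names are hypothetical. The property names are the ones from the proposal. The hook stands in for Hadoop's UserGroupInformation.loginUserFromKeytab so that the sketch compiles without Hadoop on the classpath.

```java
import java.util.Properties;

/** Sketch of the proposed decision logic only; all names here are hypothetical. */
public class KeytabLoginSketch {

    // Property names taken from the proposal in the issue description.
    public static final String KRB5_CONF = "java.security.krb5.conf";
    public static final String PRINCIPAL = "job.security.krb5.principal";
    public static final String KEYTAB = "job.security.krb5.keytab";

    /** Hypothetical hook standing in for UserGroupInformation.loginUserFromKeytab. */
    public interface KeytabLogin {
        void login(String principal, String keytabPath) throws Exception;
    }

    /** True when the property exists and is not blank. */
    static boolean isSet(Properties props, String key) {
        String value = props.getProperty(key);
        return value != null && !value.trim().isEmpty();
    }

    /** The "at least the last two" rule: principal and keytab must both be set. */
    public static boolean shouldLogin(Properties props) {
        return isSet(props, PRINCIPAL) && isSet(props, KEYTAB);
    }

    /** Calls the login hook when configured; returns whether a login was attempted. */
    public static boolean loginIfConfigured(Properties props, KeytabLogin login) throws Exception {
        if (!shouldLogin(props)) {
            return false; // fall back to whatever tickets the user already has
        }
        if (isSet(props, KRB5_CONF)) {
            // Optional: point the JVM at a specific krb5.conf before logging in.
            System.setProperty(KRB5_CONF, props.getProperty(KRB5_CONF).trim());
        }
        login.login(props.getProperty(PRINCIPAL).trim(), props.getProperty(KEYTAB).trim());
        return true;
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty(PRINCIPAL, "nbasjes@XXXXXX.NET");
        props.setProperty(KEYTAB, "/home/nbasjes/.krb/nbasjes.keytab");
        // A stand-in hook that only prints; it performs no real Kerberos login.
        boolean attempted = loginIfConfigured(props,
                (principal, keytab) -> System.out.println("login " + principal + " using " + keytab));
        System.out.println("attempted=" + attempted);
    }
}
```

In a real Hadoop application the hook could be the method reference UserGroupInformation::loginUserFromKeytab, which takes the principal and the keytab path in the same order.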
