flink-user mailing list archives

From Niels Basjes <Ni...@basjes.nl>
Subject Re: Running continuously on yarn with kerberos
Date Mon, 09 Nov 2015 15:48:19 GMT
Apparently I just had to wait a bit longer for the first run.
Now I'm able to package the project in about 7 minutes.

Current status: I am now able to access HBase from within Flink on a
Kerberos secured cluster.
Cleaning up the patch so I can submit it in a few days.
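For readers following along: the fix discussed further down in this thread ports Spark's approach of obtaining an HBase delegation token on the Kerberos-authenticated client and shipping it to the YARN containers. A rough, hypothetical sketch of that idea (HBase 0.98-era API; the actual patch for FLINK-2977 may well differ):

```java
// Hypothetical sketch of the kind of change discussed below for
// org.apache.flink.yarn.Utils#setTokensFor(): obtain an HBase
// delegation token on the client and add it to the credentials that
// are shipped to the YARN containers. Not the actual patch.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.security.token.TokenUtil;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

public final class HBaseTokenSketch {

    public static void addHBaseToken(Configuration hadoopConf,
                                     Credentials credentials) throws Exception {
        Configuration hbaseConf = HBaseConfiguration.create(hadoopConf);
        // Only relevant when HBase itself is Kerberos-secured.
        if ("kerberos".equals(hbaseConf.get("hbase.security.authentication"))) {
            // Uses the client's Kerberos TGT to request a delegation token.
            Token<?> token = TokenUtil.obtainToken(hbaseConf);
            credentials.addToken(token.getService(), token);
        }
    }
}
```

The token request is authenticated via the client's Kerberos TGT, so it has to happen on the submitting machine before the containers are launched.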

On Sat, Nov 7, 2015 at 10:01 PM, Stephan Ewen <sewen@apache.org> wrote:

> The single shading step on my machine (SSD, 10 GB RAM) takes about 45
> seconds. On HDD it may take significantly longer, but should really not
> be more than 10 minutes.
>
> Is your maven build always stuck in that stage (flink-dist) showing a
> long list of dependencies (saying including org.x.y, including
> com.foo.bar, ...)?
>
>
> On Sat, Nov 7, 2015 at 9:57 PM, Sachin Goel <sachingoel0101@gmail.com>
> wrote:
>
>> Usually, if all the dependencies are being downloaded, i.e., on the first
>> build, it'll likely take 30-40 minutes. Subsequent builds might take 10
>> minutes approx. [I have the same PC configuration.]
>>
>> -- Sachin Goel
>> Computer Science, IIT Delhi
>> m. +91-9871457685
>>
>> On Sun, Nov 8, 2015 at 2:05 AM, Niels Basjes <Niels@basjes.nl> wrote:
>>
>>> How long should this take if you have HDD and about 8GB of RAM?
>>> Is that 10 minutes? 20?
>>>
>>> Niels
>>>
>>> On Sat, Nov 7, 2015 at 2:51 PM, Stephan Ewen <sewen@apache.org> wrote:
>>>
>>>> Hi Niels!
>>>>
>>>> Usually, you simply build the binaries by invoking "mvn -DskipTests
>>>> clean package" in the root flink directory. The resulting program
>>>> should be in the "build-target" directory.
>>>>
>>>> If the program gets stuck, let us know where and what the last message
>>>> on the command line is.
>>>>
>>>> Please be aware that the final step of building the "flink-dist"
>>>> project may take a while, especially on systems with hard disks (as opposed
>>>> to SSDs) and a comparatively low amount of memory. The reason is that the
>>>> building of the final JAR file is quite expensive, because the system
>>>> re-packages certain libraries in order to avoid conflicts between different
>>>> versions.
>>>>
>>>> Stephan
>>>>
>>>>
>>>> On Sat, Nov 7, 2015 at 2:40 PM, Niels Basjes <niels@basj.es> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Excellent.
>>>>> What you can help me with are the commands to build the binary
>>>>> distribution from source.
>>>>> I tried it last Thursday and the build seemed to get stuck at some
>>>>> point (at the end of/just after building the dist module).
>>>>> I haven't been able to figure out why yet.
>>>>>
>>>>> Niels
>>>>> On 5 Nov 2015 14:57, "Maximilian Michels" <mxm@apache.org> wrote:
>>>>>
>>>>>> Thank you for looking into the problem, Niels. Let us know if you
>>>>>> need anything. We would be happy to merge a pull request once you
>>>>>> have verified the fix.
>>>>>>
>>>>>> On Thu, Nov 5, 2015 at 1:38 PM, Niels Basjes <Niels@basjes.nl> wrote:
>>>>>>
>>>>>>> I created https://issues.apache.org/jira/browse/FLINK-2977
>>>>>>>
>>>>>>> On Thu, Nov 5, 2015 at 12:25 PM, Robert Metzger <rmetzger@apache.org> wrote:
>>>>>>>
>>>>>>>> Hi Niels,
>>>>>>>> thank you for analyzing the issue so properly. I agree with you. It
>>>>>>>> seems that HDFS and HBase are using their own tokens, which we need
>>>>>>>> to transfer from the client to the YARN containers. We should be
>>>>>>>> able to port the fix from Spark (which they got from Storm) into
>>>>>>>> our YARN client. I think we would add this in
>>>>>>>> org.apache.flink.yarn.Utils#setTokensFor().
>>>>>>>>
>>>>>>>> Do you want to implement and verify the fix yourself? If you are
>>>>>>>> too busy at the moment, we can also discuss how we share the work
>>>>>>>> (I'm implementing it, you test the fix).
>>>>>>>>
>>>>>>>>
>>>>>>>> Robert
>>>>>>>>
>>>>>>>> On Tue, Nov 3, 2015 at 5:26 PM, Niels Basjes <Niels@basjes.nl>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Update on the status so far... I suspect I found a problem in a
>>>>>>>>> secure setup.
>>>>>>>>>
>>>>>>>>> I have created a very simple Flink topology consisting of a
>>>>>>>>> streaming Source (that outputs the timestamp a few times per
>>>>>>>>> second) and a Sink (that puts that timestamp into a single record
>>>>>>>>> in HBase). Running this on a non-secure Yarn cluster works fine.
>>>>>>>>>
>>>>>>>>> To run it on a secured Yarn cluster my main routine now looks
>>>>>>>>> like this:
>>>>>>>>>
>>>>>>>>> public static void main(String[] args) throws Exception {
>>>>>>>>>     System.setProperty("java.security.krb5.conf", "/etc/krb5.conf");
>>>>>>>>>     UserGroupInformation.loginUserFromKeytab("nbasjes@xxxxxx.NET",
>>>>>>>>>         "/home/nbasjes/.krb/nbasjes.keytab");
>>>>>>>>>
>>>>>>>>>     final StreamExecutionEnvironment env =
>>>>>>>>>         StreamExecutionEnvironment.getExecutionEnvironment();
>>>>>>>>>     env.setParallelism(1);
>>>>>>>>>
>>>>>>>>>     DataStream<String> stream = env.addSource(new TimerTicksSource());
>>>>>>>>>     stream.addSink(new SetHBaseRowSink());
>>>>>>>>>     env.execute("Long running Flink application");
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> When I run this:
>>>>>>>>>     flink run -m yarn-cluster -yn 1 -yjm 1024 -ytm 4096 ./kerberos-1.0-SNAPSHOT.jar
>>>>>>>>>
>>>>>>>>> I see after the startup messages:
>>>>>>>>>
>>>>>>>>> 17:13:24,466 INFO  org.apache.hadoop.security.UserGroupInformation
>>>>>>>>>     - Login successful for user nbasjes@xxxxxx.NET using keytab
>>>>>>>>>     file /home/nbasjes/.krb/nbasjes.keytab
>>>>>>>>> 11/03/2015 17:13:25 Job execution switched to status RUNNING.
>>>>>>>>> 11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched to SCHEDULED
>>>>>>>>> 11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched to DEPLOYING
>>>>>>>>> 11/03/2015 17:13:25 Custom Source -> Stream Sink(1/1) switched to RUNNING
>>>>>>>>>
>>>>>>>>> Which looks good.
>>>>>>>>>
>>>>>>>>> However ... no data goes into HBase.
>>>>>>>>> After some digging I found this error in the task manager's log:
>>>>>>>>>
>>>>>>>>> 17:13:42,677 WARN  org.apache.hadoop.hbase.ipc.RpcClient
>>>>>>>>>     - Exception encountered while connecting to the server :
>>>>>>>>>     javax.security.sasl.SaslException: GSS initiate failed [Caused
>>>>>>>>>     by GSSException: No valid credentials provided (Mechanism
>>>>>>>>>     level: Failed to find any Kerberos tgt)]
>>>>>>>>> 17:13:42,677 FATAL org.apache.hadoop.hbase.ipc.RpcClient
>>>>>>>>>     - SASL authentication failed. The most likely cause is missing
>>>>>>>>>     or invalid credentials. Consider 'kinit'.
>>>>>>>>> javax.security.sasl.SaslException: GSS initiate failed [Caused by
>>>>>>>>>     GSSException: No valid credentials provided (Mechanism level:
>>>>>>>>>     Failed to find any Kerberos tgt)]
>>>>>>>>> 	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
>>>>>>>>> 	at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:177)
>>>>>>>>> 	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:815)
>>>>>>>>> 	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$800(RpcClient.java:349)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> First starting a yarn-session and then loading my job gives the
>>>>>>>>> same error.
>>>>>>>>>
>>>>>>>>> My best guess at this point is that Flink needs the same fix as
>>>>>>>>> described here:
>>>>>>>>>
>>>>>>>>> https://issues.apache.org/jira/browse/SPARK-6918
>>>>>>>>> ( https://github.com/apache/spark/pull/5586 )
>>>>>>>>>
>>>>>>>>> What do you guys think?
>>>>>>>>>
>>>>>>>>> Niels Basjes
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 27, 2015 at 6:12 PM, Maximilian Michels <mxm@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Niels,
>>>>>>>>>>
>>>>>>>>>> You're welcome. Some more information on how this would be
>>>>>>>>>> configured:
>>>>>>>>>>
>>>>>>>>>> In the kdc.conf, there are two variables:
>>>>>>>>>>
>>>>>>>>>>         max_life = 2h 0m 0s
>>>>>>>>>>         max_renewable_life = 7d 0h 0m 0s
>>>>>>>>>>
>>>>>>>>>> max_life is the maximum life of the current ticket. However, it
>>>>>>>>>> may be renewed up to a time span of max_renewable_life from the
>>>>>>>>>> first ticket issue on. This means that from the first ticket
>>>>>>>>>> issue, new tickets may be requested for one week. Each renewed
>>>>>>>>>> ticket has a life time of max_life (2 hours in this case).
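>>>>>>>>>> To make this concrete, here is a small sketch computing the two
>>>>>>>>>> resulting deadlines from those example values (the start date is
>>>>>>>>>> arbitrary):

```java
import java.time.Duration;
import java.time.Instant;

// Illustrates how max_life and max_renewable_life combine for the
// example kdc.conf values above (max_life = 2h, max_renewable_life = 7d).
public final class TicketWindow {
    static final Duration MAX_LIFE = Duration.ofHours(2);
    static final Duration MAX_RENEWABLE_LIFE = Duration.ofDays(7);

    // Last moment at which the original ticket can still be renewed.
    static Instant lastRenewal(Instant firstIssue) {
        return firstIssue.plus(MAX_RENEWABLE_LIFE);
    }

    // After this, no valid ticket can exist without a fresh kinit:
    // the last possible renewal plus one more max_life.
    static Instant hardExpiry(Instant firstIssue) {
        return lastRenewal(firstIssue).plus(MAX_LIFE);
    }

    public static void main(String[] args) {
        Instant firstIssue = Instant.parse("2015-10-27T12:00:00Z");
        System.out.println("renewable until: " + lastRenewal(firstIssue));
        System.out.println("hard expiry:     " + hardExpiry(firstIssue));
    }
}
```

>>>>>>>>>> So a streaming job started at ticket issue time runs at most
>>>>>>>>>> max_renewable_life + max_life before authentication fails.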
>>>>>>>>>>
>>>>>>>>>> Please let us know about any difficulties with long-running
>>>>>>>>>> streaming applications and Kerberos.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>> Max
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 27, 2015 at 2:46 PM, Niels Basjes <Niels@basjes.nl>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your feedback.
>>>>>>>>>>> So I guess I'll have to talk to the security guys about having
>>>>>>>>>>> special kerberos ticket expiry times for these types of jobs.
>>>>>>>>>>>
>>>>>>>>>>> Niels Basjes
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Oct 23, 2015 at 11:45 AM, Maximilian Michels <mxm@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Niels,
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you for your question. Flink relies entirely on the
>>>>>>>>>>>> Kerberos support of Hadoop. So your question could also be
>>>>>>>>>>>> rephrased to "Does Hadoop support long-term authentication
>>>>>>>>>>>> using Kerberos?". And the answer is: Yes!
>>>>>>>>>>>>
>>>>>>>>>>>> While Hadoop uses Kerberos tickets to authenticate users with
>>>>>>>>>>>> services initially, the authentication process continues
>>>>>>>>>>>> differently afterwards. Instead of saving the ticket to
>>>>>>>>>>>> authenticate on a later access, Hadoop creates its own security
>>>>>>>>>>>> tokens (DelegationToken) that it passes around. These are
>>>>>>>>>>>> authenticated to Kerberos periodically. To my knowledge, the
>>>>>>>>>>>> tokens have a life span identical to the Kerberos ticket
>>>>>>>>>>>> maximum life span. So be sure to set the maximum life span very
>>>>>>>>>>>> high for long streaming jobs. The renewal time, on the other
>>>>>>>>>>>> hand, is not important because Hadoop abstracts this away using
>>>>>>>>>>>> its own security tokens.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm afraid there is no Kerberos how-to yet. If you are on Yarn,
>>>>>>>>>>>> then it is sufficient to authenticate the client with Kerberos.
>>>>>>>>>>>> On a Flink standalone cluster you need to ensure that,
>>>>>>>>>>>> initially, all nodes are authenticated with Kerberos using the
>>>>>>>>>>>> kinit tool.
>>>>>>>>>>>>
>>>>>>>>>>>> Feel free to ask if you have more questions and let us know
>>>>>>>>>>>> about any difficulties.
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>> Max
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Oct 22, 2015 at 2:06 PM, Niels Basjes <Niels@basjes.nl> wrote:
>>>>>>>>>>>> > Hi,
>>>>>>>>>>>> >
>>>>>>>>>>>> > I want to write a long running (i.e. never stop it) streaming
>>>>>>>>>>>> > flink application on a kerberos secured Hadoop/Yarn cluster.
>>>>>>>>>>>> > My application needs to do things with files on HDFS and
>>>>>>>>>>>> > HBase tables on that cluster so having the correct kerberos
>>>>>>>>>>>> > tickets is very important. The stream is to be ingested from
>>>>>>>>>>>> > Kafka.
>>>>>>>>>>>> >
>>>>>>>>>>>> > One of the things with Kerberos is that the tickets expire
>>>>>>>>>>>> > after a predetermined time. My knowledge about kerberos is
>>>>>>>>>>>> > very limited so I hope you guys can help me.
>>>>>>>>>>>> >
>>>>>>>>>>>> > My question is actually quite simple: Is there a howto
>>>>>>>>>>>> > somewhere on how to correctly run a long running flink
>>>>>>>>>>>> > application with kerberos that includes a solution for the
>>>>>>>>>>>> > kerberos ticket timeout?
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thanks
>>>>>>>>>>>> >
>>>>>>>>>>>> > Niels Basjes
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Best regards / Met vriendelijke groeten,
>>>>>>>>>>>
>>>>>>>>>>> Niels Basjes
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Best regards / Met vriendelijke groeten,
>>>>>>>>>
>>>>>>>>> Niels Basjes
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Best regards / Met vriendelijke groeten,
>>>>>>>
>>>>>>> Niels Basjes
>>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>>
>>> --
>>> Best regards / Met vriendelijke groeten,
>>>
>>> Niels Basjes
>>>
>>
>>
>


-- 
Best regards / Met vriendelijke groeten,

Niels Basjes
