flink-user mailing list archives

From Niels Basjes <Ni...@basjes.nl>
Subject Re: Running yarn-session on a Kerberos secured Yarn/HBase cluster.
Date Mon, 01 Aug 2016 09:54:50 GMT
Thanks for the pointers towards the work you are doing here.
I'll put up a patch for the jars and such in the next few days.

Niels Basjes

On Mon, Aug 1, 2016 at 11:46 AM, Stephan Ewen <sewen@apache.org> wrote:

> Thank you for the breakdown of the problem.
> Option (1) or (2) would be the way to go, currently.
> The problem that (3) does not support HBase is simply solvable by adding
> the HBase jars to the lib directory. In the future, this should be solved
> by the YARN re-architecting:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077
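> (For the lib directory fix, concretely: drop the HBase client jars, e.g.
> hbase-common, hbase-client and hbase-protocol plus their dependencies, into
> Flink's lib/ folder; the exact set depends on the HBase version.)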
> For the renewal of Kerberos tokens for streaming jobs: there is WIP and a
> pull request to attach keytabs to a Flink job:
> https://github.com/apache/flink/pull/2275
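> As a rough sketch of the direction (the PR is still WIP, so the exact
> config keys may well change), the idea is to point the Flink configuration
> at a keytab, along the lines of:
>
>   security.keytab: /path/to/user.keytab
>   security.principal: user@YOUR.REALM
>
> so that long-running jobs can re-login from the keytab instead of relying
> on delegation tokens that expire.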
> The problem that the YARN session is accessible by everyone is a bit more
> tricky. In the future, this should be solved by these two parts:
>   - With the YARN re-architecting, sessions are bound to individual
> users. It should be possible to launch the session out of a single
> YarnExecutionEnvironment and then submit multiple jobs against it (see the
> sketch after this list).
>   - The over-the-wire encryption and authentication should make sure that
> no other user can send jobs to that session.
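> As a very rough sketch of how that could look (all of the API below is
> hypothetical, none of this exists yet):
>
>   // hypothetical API: a session bound to the submitting user
>   YarnExecutionEnvironment env = YarnExecutionEnvironment.create(config);
>   for (String table : tables) {
>       defineExportJob(env, table);    // hypothetical per-table job setup
>       env.execute("Export " + table); // submitted against the same session
>   }
>
> The point being that only the user who started the session could submit
> jobs against it.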
> Greetings,
> Stephan
> On Mon, Aug 1, 2016 at 9:47 AM, Niels Basjes <Niels@basjes.nl> wrote:
>> Hi,
>> I have the situation that I have a Kerberos secured Yarn/HBase
>> installation and I want to export data from a lot (~200) HBase tables to
>> files on HDFS.
>> I wrote a Flink job that does this exactly the way I want it for a single
>> table.
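>> For reference, the job is roughly shaped like this (simplified sketch
>> against the flink-hbase addon's TableInputFormat; the column family,
>> qualifier and paths are placeholders):
>>
>>   import org.apache.flink.addons.hbase.TableInputFormat;
>>   import org.apache.flink.api.java.DataSet;
>>   import org.apache.flink.api.java.ExecutionEnvironment;
>>   import org.apache.flink.api.java.tuple.Tuple2;
>>   import org.apache.hadoop.hbase.client.Result;
>>   import org.apache.hadoop.hbase.client.Scan;
>>   import org.apache.hadoop.hbase.util.Bytes;
>>
>>   public class ExportTable {
>>     public static void main(String[] args) throws Exception {
>>       final String table = args[0];   // HBase table to export
>>       final String target = args[1];  // HDFS output path
>>       ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>>       DataSet<Tuple2<String, String>> rows = env.createInput(
>>           new TableInputFormat<Tuple2<String, String>>() {
>>             @Override protected String getTableName() { return table; }
>>             @Override protected Scan getScanner() { return new Scan(); }
>>             @Override protected Tuple2<String, String> mapResultToTuple(Result r) {
>>               // placeholder column family/qualifier
>>               return new Tuple2<>(Bytes.toString(r.getRow()),
>>                   Bytes.toString(r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"))));
>>             }
>>           });
>>       rows.writeAsCsv(target);
>>       env.execute("Export " + table);
>>     }
>>   }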
>> Now in general I have a few possible approaches to do this for the 200
>> tables I am facing:
>> 1) Create a single job that reads the data from all of those tables and
>> writes them to the correct files.
>>     I expect that to be a monster that will hog the entire cluster
>> because of the large number of HBase regions.
>> 2) Run a job that does this for a single table and simply run that in a
>> loop.
>>     Essentially I would have a shell script or 'main' that loops over all
>> table names and runs a Flink job for each of those.
>>     The downside of this is that it will start a new Flink topology on
>> Yarn for each table.
>>     This has a startup overhead of something like 30 seconds for each
>> table that I would like to avoid.
>> 3) I start a single yarn-session and submit my job in there 200 times.
>>     That would avoid most of the startup overhead, yet this doesn't work.
>> If I start yarn-session then I see these two relevant lines in the output:
>> 2016-07-29 14:58:30,575 INFO  org.apache.flink.yarn.Utils - Attempting to obtain Kerberos security token for HBase
>> 2016-07-29 14:58:30,576 INFO  org.apache.flink.yarn.Utils - HBase is not available (not packaged with this application): ClassNotFoundException : "org.apache.hadoop.hbase.HBaseConfiguration".
>> As a consequence any Flink job I submit cannot access HBase at all.
>> As an experiment I changed my yarn-session.sh script to include HBase on
>> the classpath (if you want I can submit a Jira issue and a pull request).
>> Now the yarn-session does have HBase available and the job runs as
>> expected.
>> There are however two problems that remain:
>> 1) This yarn-session is accessible by everyone on the cluster, and as a
>> consequence they can run jobs in there that can access all the data I have
>> access to.
>> 2) The Kerberos token will expire after a while and (just like with all
>> long-running jobs) I would really like this to be a 'long lived' thing.
>> As far as I know this is just the tip of the security iceberg, and I
>> would like to know what the correct approach is to solve this.
>> Thanks.
>> --
>> Best regards / Met vriendelijke groeten,
>> Niels Basjes

Best regards / Met vriendelijke groeten,

Niels Basjes
