From: Stephan Ewen
To: dev@flink.apache.org
Date: Sun, 18 Jan 2015 19:07:05 +0100
Subject: Re: Future directions for Flink's YARN support?

Concerning (3), we could go for something like Tez does. You can configure
an HDFS directory with the Tez jars, from where they are referenced. The
"install" would then consist of uploading the jars and setting the config
value. We could even add a script for that.

Stephan

On Sun, Jan 18, 2015 at 7:00 PM, Robert Metzger wrote:

> Hi Daniel,
>
> let me answer your questions:
>
> 1. Basically all the features you are requesting are implemented in this
> pull request: https://github.com/apache/flink/pull/292 (per-job YARN
> cluster and programmatic control of the cluster). Feel free to review the
> pull request. It has been pending for more than a week now and hasn't
> gotten much feedback. I would also recommend basing the security work on
> that branch.
>
> 2. I agree that the whole configuration-loading process is not nicely
> implemented. When I was working on this, I didn't understand all the
> features offered by Hadoop's Configuration object. I implemented it that
> way to make it as easy as possible for users to run Flink on YARN. As you
> can see in the code, it tries several commonly used environment variables
> to detect the location of the configuration files.
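The environment-variable probing Robert describes amounts to something like the following small shell function. This is a rough illustration only; the actual lookup order and logic in the Flink YARN client (which is Java code) may differ. The variable names are common Hadoop conventions.

```shell
# Rough sketch of probing common environment variables to locate the
# Hadoop configuration directory. Illustrative only; the real Flink
# YARN client implements this in Java and may check different locations.
find_hadoop_conf() {
  for dir in "$HADOOP_CONF_DIR" "$YARN_CONF_DIR" \
             "$HADOOP_HOME/etc/hadoop" "$HADOOP_HOME/conf"; do
    # Accept the first candidate that is set and contains core-site.xml.
    if [ -n "$dir" ] && [ -e "$dir/core-site.xml" ]; then
      echo "$dir"
      return 0
    fi
  done
  return 1
}
```

The fragility Daniel points out in his original mail follows directly from this style of lookup: if none of the variables is set, or the relevant key lives in a file the client never reads (such as hdfs-site.xml), the probe silently produces an incomplete configuration.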
> These config files are then used and respected by the YARN client (for
> example the default file system name).
> I'll have a look at the "yarn jar" command. One concern I have with it is
> that it adds a new requirement: we expect the user to have the "yarn"
> binary in the PATH. I know quite a few environments (for example some
> users of the Hortonworks Sandbox) which don't have "hadoop" and "yarn" in
> the PATH. The "yarn jar" command also accesses the environment variables
> required to locate the Hadoop configuration. But I will carefully check
> whether using the "yarn jar" command brings us an advantage.
>
> 3. I'm also not completely convinced that this is the right approach.
> When I was implementing the first version of Flink on YARN, I thought
> that deploying many small files to HDFS would cause some load on the
> NameNode and take some time. Right now, we have 146 jars in the lib/
> directory. I haven't done a performance comparison, but I guess it's
> slower to upload 146 files to HDFS instead of one (it is not only
> uploading the files to HDFS; YARN also needs to download and "localize"
> them prior to allocating new containers).
> Also, when deploying Flink on YARN on the Google Compute cloud, the
> Google compute storage is configured by default ... and it's quite slow.
> So this would probably lead to a bad user experience.
> I completely agree that we need an option for users to use a
> pre-installed Flink sitting in HDFS or somewhere else in the cluster.
> There is another issue in this area in our project: I don't like that the
> "hadoop2" build of Flink produces two binary directories with almost the
> same content and layout. We could actually merge the whole YARN stuff
> into the regular hadoop2 build. Therefore, I would suggest putting one
> Flink fat jar into the lib/ directory. This would also make shading of
> our dependencies much easier.
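Stephan's Tez-style suggestion, combined with the pre-installed-Flink option Robert agrees with, could be scripted roughly as below. This is a hypothetical sketch: the HDFS path and the configuration key name are made up for illustration, since Flink had no such option at the time.

```shell
# Hypothetical install script, modeled on how Tez stages its jars in HDFS:
# upload the Flink jars once, then point the YARN client at them via a
# config value, instead of re-uploading ~146 files per session.
# The path and config key below are illustrative, not real Flink settings.
FLINK_HDFS_DIR=hdfs:///apps/flink/lib

hdfs dfs -mkdir -p "$FLINK_HDFS_DIR"
hdfs dfs -put -f lib/*.jar "$FLINK_HDFS_DIR/"

# Record the staging location so the YARN client can reference the jars
# as remote resources rather than shipping them on every deployment:
echo "yarn.flink.lib.dir: $FLINK_HDFS_DIR" >> conf/flink-conf.yaml
```

This addresses both of Robert's concerns at once: the NameNode sees the upload cost only once at install time, and YARN can localize from a single well-known location.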
> I will start a separate discussion on that when I have more time again.
> Right now, I have more pressing issues to solve.
>
> Regarding your changes in the "security" branch: I'm super happy that
> others are starting to work on the YARN client as well. The whole
> codebase has grown over time, and it's certainly good to have more eyes
> looking at it. The security features of YARN and Hadoop in general are
> something that I've avoided in the past, because they are so difficult to
> properly test. But it's something we certainly need to address.
>
> Best,
> Robert
>
>
> On Sun, Jan 18, 2015 at 6:28 PM, Daniel Warneke
> wrote:
>
> > Hi,
> >
> > I just pushed my first version of Flink supporting YARN environments
> > with security/Kerberos enabled [1]. While working with the current
> > Flink version, I was really impressed by how easy it is to deploy the
> > software on a YARN cluster. However, there are a few things I stumbled
> > upon, and I would be interested in your opinion:
> >
> > 1. Separation between YARN session and Flink job
> > Currently, we separate the Flink YARN session from the Flink jobs,
> > i.e. a user first has to bring up the Flink cluster on YARN through a
> > separate command and can then submit an arbitrary number of jobs to
> > this cluster. Through this separation it is possible to submit
> > individual jobs with really low latency, but it introduces two major
> > problems: First, it is currently impossible to programmatically launch
> > a Flink YARN cluster, submit a job, wait for its completion, and then
> > tear the cluster down again (correct me if I'm wrong here), although
> > this is actually a very important use case. Second, with security
> > enabled, all jobs are executed with the security credentials of the
> > user who launched the Flink cluster. This causes massive authorization
> > problems.
> > Therefore, I would propose to move to a model where we launch one
> > Flink cluster per job (or at least to make this a very prominent
> > option).
> >
> > 2. Loading Hadoop configuration settings for Flink
> > In the current release, we use custom code to identify and load the
> > relevant Hadoop XML configuration files (e.g. core-site.xml,
> > yarn-site.xml) for the Flink YARN client. I found this mechanism to be
> > quite fragile, as it depends on certain environment variables being
> > set and assumes certain configuration keys to be specified in certain
> > files. For example, with Hadoop security enabled, the Flink YARN
> > client needs to know what kind of authentication mechanisms HDFS
> > expects for the data transfer. This setting is usually specified in
> > hdfs-site.xml. In the current Flink version, the YARN client ignores
> > this file and hence cannot talk to HDFS when security is enabled.
> > As an alternative, I propose to launch the Flink cluster on YARN
> > through the "yarn jar" command. With this command, you get the entire
> > configuration setup for free and no longer have to worry about names
> > of configuration files, configuration paths, and environment
> > variables.
> >
> > 3. The uberjar deployment model
> > In my opinion, the current Flink deployment model for YARN, with the
> > one fat uberjar, is unnecessarily bulky. With the last release, the
> > Flink uberjar has grown to over 100 MB in size, amounting to almost
> > 400 MB of class files when uncompressed. Many of the includes are not
> > even necessary. For example, when using the "yarn jar" hook to deploy
> > Flink, all relevant Hadoop libraries are added to the classpath
> > anyway, so there is no need to include them in the uberjar (unless you
> > assume the client does not have a Hadoop environment installed).
> > Personally, I would favor a more fine-grained deployment model.
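The "yarn jar" hook mentioned above would make launching Flink look roughly like the following. The jar name, client class, and flags are illustrative placeholders, not Flink's actual entry point; the point is only that the "yarn" wrapper itself assembles the cluster's Hadoop configuration and classpath before invoking the client.

```shell
# Hypothetical: launching the Flink YARN client through "yarn jar", so the
# "yarn" wrapper puts the cluster's own Hadoop configuration (core-site.xml,
# hdfs-site.xml, yarn-site.xml) on the classpath automatically. The jar
# name, main class, and options below are illustrative only.
yarn jar flink-yarn-client.jar org.apache.flink.yarn.Client -n 4 -jm 1024 -tm 4096
```

The trade-off Robert raises still applies: this assumes the "yarn" binary is on the user's PATH, which is not the case in every environment.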
> > Especially when we move to a one-job-per-session model, I think we
> > should allow having Flink preinstalled on the cluster nodes and not
> > always require redistributing the 100 MB uberjar to each and every
> > node.
> >
> > Any thoughts on that?
> >
> > Best regards,
> >
> > Daniel
> >
> > [1] https://github.com/warneke/flink/tree/security