mesos-issues mailing list archives

From "Jan-Philip Gehrcke (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MESOS-6951) Docker containerizer: mangled environment when env value contains LF byte
Date Thu, 19 Jan 2017 02:19:26 GMT

     [ https://issues.apache.org/jira/browse/MESOS-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan-Philip Gehrcke updated MESOS-6951:
--------------------------------------
    Description: 
Consider this Marathon app definition:

{code}
{
  "id": "/testapp",
  "cmd": "env && tail -f /dev/null",
  "env":{
    "TESTVAR":"line1\nline2"
  },
  "cpus": 0.1,
  "mem": 10,
  "instances": 1,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "alpine"
    }
  }
}
{code}

The JSON-encoded newline in the value of the {{TESTVAR}} environment variable leads to a corrupted
task environment. What follows is a subset of the resulting task environment (as printed via
{{env}}, i.e. in key=value notation):

{code}
line2=
TESTVAR=line1
{code}

That is, the trailing part of the intended value ended up being interpreted as a variable name,
and only the leading part of the intended value was used as the actual value for {{TESTVAR}}.

Common application scenarios that break badly because of this include pretty-printed JSON
documents or YAML documents passed along via the environment.

Following the code and information flow led to the conclusion that Docker's {{--env-file}}
command line interface is the weak point in the flow. It is currently used in Mesos' Docker
containerizer for passing the environment to the container:

{code}
  argv.push_back("--env-file");
  argv.push_back(environmentFile);
{code}

(Ref: [code|https://github.com/apache/mesos/blob/c0aee8cc10b1d1f4b2db5ff12b771372fdd5b1f3/src/docker/docker.cpp#L584])


Docker's {{--env-file}} argument behavior is documented as follows:

{quote}
The --env-file flag takes a filename as an argument
and expects each line to be in the VAR=VAL format,
{quote}
(Ref: https://docs.docker.com/engine/reference/commandline/run/)

That is, Docker identifies individual environment variable key/value pair definitions based
on newline bytes in that file, which explains the observed environment variable value fragmentation.
Notably, Docker does not provide a mechanism for escaping newline bytes in the values specified
in this environment file.
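
The fragmentation can be reproduced with a small model of a line-oriented reader. The sketch below is not Docker's actual parser. The function names {{serializeEnvEntry}} and {{parseEnvFileLines}} are made up for illustration, and the handling of a line without {{=}} (name with empty value) is an assumption chosen to match the output observed above.

```cpp
#include <map>
#include <sstream>
#include <string>

// Serializes one variable the naive way: "NAME=VALUE\n", with no escaping.
std::string serializeEnvEntry(const std::string& name, const std::string& value)
{
  return name + "=" + value + "\n";
}

// Simplified model of a line-oriented VAR=VAL reader. This is NOT Docker's
// actual parser; it only assumes that records are separated by LF bytes.
std::map<std::string, std::string> parseEnvFileLines(const std::string& contents)
{
  std::map<std::string, std::string> env;
  std::istringstream in(contents);
  std::string line;

  while (std::getline(in, line)) {
    if (line.empty()) {
      continue;
    }

    const std::string::size_type pos = line.find('=');
    if (pos == std::string::npos) {
      // Assumption: a line without '=' becomes a name with an empty value,
      // which matches the `line2=` entry in the observed task environment.
      env[line] = "";
    } else {
      env[line.substr(0, pos)] = line.substr(pos + 1);
    }
  }

  return env;
}
```

Feeding {{serializeEnvEntry("TESTVAR", "line1\nline2")}} into {{parseEnvFileLines}} yields two entries instead of one, because the LF byte inside the value cannot be told apart from a record separator.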

I think it is important to understand that Docker's {{--env-file}} mechanism is ill-posed
in the sense that it is not capable of transmitting the whole range of environment variable
values allowed by POSIX. This is what the Single UNIX Specification, Version 3 says about
environment variable values:

{quote}
the value shall be composed of characters from the
portable character set (except NUL and as indicated below). 
{quote}
(Ref: http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap08.html)

About "The portable character set": http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap06.html#tagtcjh_3

The portable character set includes (among others) the LF byte. Understandably, the current Docker
{{--env-file}} behavior will not change, so this issue cannot be deferred to Docker: https://github.com/docker/docker/issues/12997

Notably, the {{--env-file}} method for communicating environment variables to Docker containers
was only recently introduced to Mesos via https://issues.apache.org/jira/browse/MESOS-6566,
to avoid leaking secrets through the process listing. Previously, we specified env key/value
pairs on the command line, which leaked secrets to the process list and probably also did not
support the full range of valid environment variable values.

We need a solution that:
1) does not leak sensitive values (i.e. is compliant with MESOS-6566).
2) allows for passing arbitrary environment variable values.

It seems that Docker's {{--env}} option can be used for that. It can be given _just
the names of the environment variables_ to pass along, in which case the docker binary
reads the corresponding values from its own environment, which we can prepare
appropriately when we invoke the corresponding child process. This method would still leak
environment variable _names_ to the process listing, but (especially if documented) this should
be fine.
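
A minimal sketch of this idea follows. It is not the Mesos implementation: the {{DockerInvocation}} struct and the {{buildRunInvocation}} function are hypothetical names, and the code only assembles the argument vector and the child environment without spawning a process. A real implementation would presumably also have to preserve whatever variables the docker client itself needs.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical container for what a launcher would hand to the child
// `docker` process: an argument vector plus the child's own environment.
struct DockerInvocation
{
  std::vector<std::string> argv;
  std::map<std::string, std::string> childEnvironment;
};

// Builds `docker run` arguments that mention only variable NAMES. The values
// travel in the environment of the docker client process instead, so they
// never appear on the command line and need no line-based encoding.
DockerInvocation buildRunInvocation(
    const std::string& image,
    const std::map<std::string, std::string>& taskEnvironment)
{
  DockerInvocation invocation;
  invocation.argv = {"docker", "run"};

  for (const auto& entry : taskEnvironment) {
    invocation.argv.push_back("--env");
    invocation.argv.push_back(entry.first); // Name only, no "=value".
    invocation.childEnvironment[entry.first] = entry.second;
  }

  invocation.argv.push_back(image);
  return invocation;
}
```

For the app definition above, the argument vector would contain {{--env TESTVAR}} but not the value, while {{childEnvironment}} carries {{line1\nline2}} unmodified, LF byte included.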


> Docker containerizer: mangled environment when env value contains LF byte
> -------------------------------------------------------------------------
>
>                 Key: MESOS-6951
>                 URL: https://issues.apache.org/jira/browse/MESOS-6951
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: Jan-Philip Gehrcke



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
