spark-issues mailing list archives

From "Marcelo Vanzin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-20435) More thorough redaction of sensitive information from logs/UI, more unit tests
Date Sat, 22 Apr 2017 00:40:04 GMT

    [ https://issues.apache.org/jira/browse/SPARK-20435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15979651#comment-15979651 ]

Marcelo Vanzin commented on SPARK-20435:
----------------------------------------

{noformat}
"sun.java.command":"org.apache.spark.deploy.SparkSubmit ... --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=
{noformat}

If someone is typing passwords in the process's command line, they have bigger problems than
the password showing up in the logs... (a.k.a. "ps ax")

> More thorough redaction of sensitive information from logs/UI, more unit tests
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-20435
>                 URL: https://issues.apache.org/jira/browse/SPARK-20435
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Mark Grover
>
> SPARK-18535 and SPARK-19720 were efforts to redact sensitive information (e.g. Hadoop credential
provider password, AWS access/secret keys) from event logs + YARN logs + UI and from the console
output, respectively.
> While some unit tests were added along with these changes, they only asserted that when a sensitive
key was found, redaction took place for that key. They didn't assert globally that, when
running a full-fledged Spark app (whether on YARN or locally), sensitive information
was not present in any of the logs or UI. Such a test would also prevent regressions
in the future if someone unknowingly adds extra logging that writes sensitive
information to disk or UI.
> Consequently, it was found that in some Java configurations, sensitive information was
still being leaked in the event logs under the {{SparkListenerEnvironmentUpdate}} event, like
so:
> {code}
> "sun.java.command":"org.apache.spark.deploy.SparkSubmit ... --conf spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password
...
> {code}
> "secret_password" should have been redacted.
> Moreover, the redaction logic previously checked only whether the key matched the secret
regex pattern and, if so, redacted its value. That worked for most cases. However, in the above case,
the key (sun.java.command) doesn't tell much, so the value needs to be searched. So the check
needs to be expanded to match against values as well.
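
The key-or-value check described in the issue can be illustrated with a minimal Python sketch. This is not Spark's implementation: the function name `redact`, the pattern, the placeholder text, and the sample pairs are all assumptions made for illustration. It only shows why matching on the key alone misses the `sun.java.command` case, and how also matching on the value catches it.

```python
import re

# Assumed pattern for illustration only; a real deployment would use
# whatever redaction regex is configured.
SECRET_PATTERN = re.compile(r"(?i)secret|password")

# Assumed placeholder text that replaces a sensitive value.
REPLACEMENT = "*********(redacted)"


def redact(pairs, pattern=SECRET_PATTERN, check_values=True):
    """Return (key, value) pairs with sensitive values replaced.

    A value is replaced when the key matches the pattern or, if
    check_values is True, when the value itself matches the pattern.
    """
    result = []
    for key, value in pairs:
        key_hit = pattern.search(key) is not None
        value_hit = check_values and pattern.search(value) is not None
        result.append((key, REPLACEMENT if key_hit or value_hit else value))
    return result


# Hypothetical sample modelled on the event-log entry quoted in the issue.
sample = [
    ("spark.executorEnv.HADOOP_CREDSTORE_PASSWORD", "secret_password"),
    ("sun.java.command",
     "org.apache.spark.deploy.SparkSubmit ... --conf "
     "spark.executorEnv.HADOOP_CREDSTORE_PASSWORD=secret_password"),
    ("spark.app.name", "demo"),
]

key_only = dict(redact(sample, check_values=False))
key_and_value = dict(redact(sample))
```

With `check_values=False`, the `sun.java.command` entry passes through untouched because its key does not match the pattern; with the default, the whole value is replaced. Replacing the whole value is the simplest choice for a sketch; masking only the matching portion of the command line is an alternative not shown here.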



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

