hadoop-hdfs-issues mailing list archives

From "Remus Rusanu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-6699) Secure Windows DFS read when client co-located on nodes with data (short-circuit reads)
Date Thu, 17 Jul 2014 10:58:05 GMT

     [ https://issues.apache.org/jira/browse/HDFS-6699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Remus Rusanu updated HDFS-6699:
-------------------------------

    Description: 
HDFS-347 introduced secure short-circuit HDFS reads based on Linux domain sockets. A similar
capability can be introduced in a secure Windows environment using the [DuplicateHandle](http://msdn.microsoft.com/en-us/library/windows/desktop/ms724251(v=vs.85).aspx)
Win32 API. When short-circuit is allowed, the datanode would open the block file, duplicate
the handle into the hdfs client process, and return the handle value to that process. The hdfs
client can then open a Java stream on this handle and read the file. This is a secure mechanism:
the HDFS ACLs are validated by the namenode, and the client process does not get direct access to
the file, only access in a controlled manner (e.g. read-only). The hdfs client process does not need to
have OS-level access privileges to the block file.

A complication arises from the requirement to duplicate the handle into the hdfs client process.
Ordinary processes (which is how we want the datanode to run) do not have the required privilege (SeDebugPrivilege).
But with the introduction of an elevated service helper for the nodemanager Windows Secure Container
Executor (YARN-2198), we have at our disposal an elevated executor that can do the job of duplicating
the handle. The datanode would communicate with this process using the same mechanism as the
nodemanager, i.e. LRPC.

With my proposed implementation the sequence of actions is as follows:

 - The hdfs client requests a Windows secure short-circuit read of a block in the data transfer protocol.
It passes the block, the token and its own process ID.
 - The datanode approves the short-circuit read. It opens the block file and obtains the handle.
 - The datanode invokes the elevated privilege service to duplicate the handle into the hdfs client
process. It calls the service's LRPC interface over JNI (LRPC being the Windows de facto
standard for interoperating with a service), passing the handle value, its own process ID
and the hdfs client process ID.
 - The elevated service duplicates the handle from the datanode process into the hdfs client
process. It returns the duplicate handle value to the datanode as the output value of the LRPC
call.
 - The same open-and-duplicate steps are repeated for the CRC file.
 - The datanode responds to the short-circuit data transfer protocol request with a message
that contains the duplicate handle values (one for the block file, one for the CRC file).
 - The hdfs client creates a Java stream that wraps each handle and reads the block from this
stream (and likewise the CRC).
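
The sequence above can be modelled with a toy, in-memory sketch. It makes no Win32 or LRPC calls and omits token validation. The class names, the process IDs (100, 200), the handle numbering and the file names `blk_1` and `blk_1.meta` are all hypothetical, chosen only to show who passes which value to whom:

```java
import java.util.HashMap;
import java.util.Map;

/** Toy in-memory model of the proposed handshake; no real Win32 calls are made. */
class ShortCircuitHandshakeSketch {

    /** Stand-in for one OS process: a PID plus a private handle table. */
    static final class ProcessModel {
        final int pid;
        private final Map<Long, String> handleTable = new HashMap<>();
        private long nextHandle = 4; // arbitrary starting value for the model

        ProcessModel(int pid) { this.pid = pid; }

        /** Registers an open file in this process and returns its handle value. */
        long open(String path) {
            long h = nextHandle;
            nextHandle += 4;
            handleTable.put(h, path);
            return h;
        }

        /** Returns the file a handle refers to, or null if the handle is not valid here. */
        String resolve(long handle) { return handleTable.get(handle); }
    }

    /** Stand-in for the elevated helper; a real one would call DuplicateHandle. */
    static final class ElevatedService {
        private final Map<Integer, ProcessModel> processes = new HashMap<>();

        void register(ProcessModel p) { processes.put(p.pid, p); }

        /** Mirrors the LRPC call: source handle value, source PID, target PID. */
        long duplicate(long sourceHandle, int sourcePid, int targetPid) {
            ProcessModel source = processes.get(sourcePid);
            ProcessModel target = processes.get(targetPid);
            if (source == null || target == null) {
                throw new IllegalArgumentException("unknown process id");
            }
            String path = source.resolve(sourceHandle);
            if (path == null) {
                throw new IllegalArgumentException("invalid handle");
            }
            return target.open(path); // new handle value, valid only in the target
        }
    }

    /** Datanode side: open block and CRC files, then have both handles duplicated. */
    static long[] serveShortCircuit(ProcessModel datanode, ElevatedService service,
                                    int clientPid, String blockPath, String crcPath) {
        long block = datanode.open(blockPath);
        long crc = datanode.open(crcPath);
        return new long[] {
            service.duplicate(block, datanode.pid, clientPid),
            service.duplicate(crc, datanode.pid, clientPid)
        };
    }

    /** Runs one request and returns what the client's two handles resolve to. */
    static String[] demo() {
        ProcessModel datanode = new ProcessModel(100);
        ProcessModel client = new ProcessModel(200);
        ElevatedService service = new ElevatedService();
        service.register(datanode);
        service.register(client);
        long[] handles = serveShortCircuit(datanode, service, client.pid, "blk_1", "blk_1.meta");
        return new String[] { client.resolve(handles[0]), client.resolve(handles[1]) };
    }

    public static void main(String[] args) {
        String[] seen = demo();
        System.out.println(seen[0] + " " + seen[1]); // prints "blk_1 blk_1.meta"
    }
}
```

The point of the model is that the handle values returned to the client are meaningful only inside the client's own handle table, which is why the datanode must relay the value produced by the elevated service rather than its own handle value.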

The datanode needs to take care not to duplicate the same handle to different clients (including
the CRC handles), because a handle also carries the file position, and clients sharing one would
inadvertently move each other's file pointer, with chaotic results.
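
The shared file position can be illustrated in plain Java, without any Windows API. In this minimal sketch (the class name and the temporary file contents are assumptions made for the demonstration), stream `b` wraps the same `FileDescriptor` as stream `a`, while stream `c` opens the file independently:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class SharedFilePositionSketch {

    /**
     * Returns the first byte read by stream a, by stream b (which shares a's
     * descriptor), and by stream c (which opened the file on its own).
     */
    static char[] demo() {
        try {
            Path file = Files.createTempFile("block", ".bin");
            try {
                Files.write(file, "ABCDEF".getBytes(StandardCharsets.US_ASCII));
                try (FileInputStream a = new FileInputStream(file.toFile());
                     FileInputStream c = new FileInputStream(file.toFile())) {
                    // b wraps a's descriptor, so both use one file position;
                    // closing a also closes the descriptor b relies on.
                    FileInputStream b = new FileInputStream(a.getFD());
                    char fromA = (char) a.read(); // consumes 'A', position moves to 1
                    char fromB = (char) b.read(); // continues from position 1
                    char fromC = (char) c.read(); // independent position, starts at 0
                    return new char[] { fromA, fromB, fromC };
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        char[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "A B A"
    }
}
```

Stream `b` never sees the first byte because `a` has already advanced the shared position, whereas `c` reads from the start. Two clients handed duplicates of one handle would interfere with each other in the same way.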

TBD: a mitigation for process ID reuse (the hdfs client can be terminated immediately after
the block request, and a new process could reuse the same ID). In theory an attacker could
use this as a mechanism to obtain a handle to a block by killing the hdfs client at the right
moment and spawning new processes until it gets one with the desired ID. I'm not sure this is a realistic
threat, because the attacker must already have the privilege to kill the hdfs client process,
and with such a privilege he could obtain the handle by other means (e.g. by debugging or inspecting the hdfs
client process).
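
The mitigation is left open above. Purely as an illustration, and not something this issue proposes, one conceivable check is to identify a process by its ID together with its start time, so that a recycled ID is detected by its different start time. The sketch below assumes Java 9 or later for `ProcessHandle`; the class and method names are hypothetical:

```java
import java.time.Instant;
import java.util.Optional;

class ProcessIdentitySketch {

    /** Start time of the live process with this PID, if the platform reports one. */
    static Optional<Instant> startOf(long pid) {
        return ProcessHandle.of(pid).flatMap(h -> h.info().startInstant());
    }

    /** True only if a live process with this PID reports the expected start time. */
    static boolean sameProcess(long pid, Instant expectedStart) {
        Optional<Instant> actual = startOf(pid);
        return actual.isPresent() && actual.get().equals(expectedStart);
    }

    public static void main(String[] args) {
        long pid = ProcessHandle.current().pid();
        Optional<Instant> start = startOf(pid);
        // A client could send (pid, start) with its request; the receiver would
        // re-check the pair just before duplicating the handle.
        start.ifPresent(s -> System.out.println(sameProcess(pid, s)));
    }
}
```

Such a check only narrows the window: the process can still exit between the check and the duplication, and `startInstant()` may be unavailable on some platforms, so this would not close the gap completely.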

  was:
HDFS-347 introduced secure short-circuit HDFS reads based on Linux domain sockets. A similar
capability can be introduced in a secure Windows environment using the [DuplicateHandle](http://msdn.microsoft.com/en-us/library/windows/desktop/ms724251(v=vs.85).aspx)
Win32 API. When short-circuit is allowed, the datanode would open the block file, duplicate
the handle into the hdfs client process, and return the handle value to that process. The hdfs
client can then open a Java stream on this handle and read the file. This is a secure mechanism:
the HDFS ACLs are validated by the namenode, and the client process does not get direct access to
the file, only access in a controlled manner (e.g. read-only). The hdfs client process does not need to
have OS-level access privileges to the block file.

A complication arises from the requirement to duplicate the handle into the hdfs client process.
Ordinary processes (which is how we want the datanode to run) do not have the required privilege (SeDebugPrivilege).
But with the introduction of an elevated service helper for the namenode Windows Secure Container
Executor (YARN-2198), we have at our disposal an elevated executor that can do the job of duplicating
the handle. The namenode would communicate with this process using the same mechanism as the
nodemanager, i.e. LRPC.

With my proposed implementation the sequence of actions is as follows:

 - The hdfs client requests a Windows secure short-circuit read of a block in the data transfer protocol.
It passes the block, the token and its own process ID.
 - The datanode approves the short-circuit read. It opens the block file and obtains the handle.
 - The datanode invokes the elevated privilege service to duplicate the handle into the hdfs client
process. It calls the service's LRPC interface over JNI (LRPC being the Windows de facto
standard for interoperating with a service), passing the handle value, its own process ID
and the hdfs client process ID.
 - The elevated service duplicates the handle from the datanode process into the hdfs client
process. It returns the duplicate handle value to the datanode as the output value of the LRPC
call.
 - The same open-and-duplicate steps are repeated for the CRC file.
 - The datanode responds to the short-circuit data transfer protocol request with a message
that contains the duplicate handle values (one for the block file, one for the CRC file).
 - The hdfs client creates a Java stream that wraps each handle and reads the block from this
stream (and likewise the CRC).

The datanode needs to take care not to duplicate the same handle to different clients (including
the CRC handles), because a handle also carries the file position, and clients sharing one would
inadvertently move each other's file pointer, with chaotic results.

TBD: a mitigation for process ID reuse (the hdfs client can be terminated immediately after
the block request, and a new process could reuse the same ID). In theory an attacker could
use this as a mechanism to obtain a handle to a block by killing the hdfs client at the right
moment and spawning new processes until it gets one with the desired ID. I'm not sure this is a realistic
threat, because the attacker must already have the privilege to kill the hdfs client process,
and with such a privilege he could obtain the handle by other means (e.g. by debugging or inspecting the hdfs
client process).


> Secure Windows DFS read when client co-located on nodes with data (short-circuit reads)
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-6699
>                 URL: https://issues.apache.org/jira/browse/HDFS-6699
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, hdfs-client, performance, security
>            Reporter: Remus Rusanu
>              Labels: windows
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)
