hadoop-hdfs-issues mailing list archives

From "Xiaoyu Yao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7291) Persist in-memory replicas using unbuffered IO should only applies to supported Linux version
Date Mon, 27 Oct 2014 02:36:33 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184768#comment-14184768 ]

Xiaoyu Yao commented on HDFS-7291:

FileChannel#transferTo only avoids a *Java NIO buffer* between the reader and the writer, as described
in the JDK documentation quoted below. What we want to avoid is pulling in-memory replicas into the *OS buffer cache*
during lazy persist, since these blocks already occupy memory on RAM_DISK.

"This method is potentially much more efficient than a simple loop that reads from the source
channel and writes to this channel. Many operating systems can transfer bytes directly from
the source channel into the filesystem cache without actually copying them."
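To make the distinction concrete, here is a minimal Java sketch of a file-to-file copy with FileChannel#transferTo. The class name TransferToCopy is hypothetical and this is not Hadoop code. It never allocates a user-space ByteBuffer, but nothing in the call lets the caller decide whether the kernel stages the bytes in the OS page cache.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper illustrating FileChannel#transferTo; not Hadoop code.
final class TransferToCopy {
    private TransferToCopy() {}

    /**
     * Copies src to dst. No Java-side buffer is allocated, but the kernel may
     * still route the bytes through the OS page cache; transferTo exposes no
     * option to prevent that.
     */
    static long copy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE,
                     StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long position = 0;
            // transferTo may move fewer bytes than requested, so loop until done.
            while (position < size) {
                long n = in.transferTo(position, size - position, out);
                if (n == 0) {
                    break; // nothing transferred (e.g. source shrank); stop rather than spin
                }
                position += n;
            }
            return position;
        }
    }
}
```

Whether the copy happens in-kernel or through an internal fallback is up to the JDK and OS; either way the page cache is involved, which is the behavior lazy persist wants to avoid.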

FileChannel#transferTo gives no control over OS buffer cache behavior. In fact, the copy routine we used before,
and still use as the fallback, is Apache Commons IO's org.apache.commons.io.FileUtils#copyFile(), which uses the similar
Java NIO API FileChannel#transferFrom. Based on our observations, it churns the OS buffer cache heavily during lazy
persist. Only the native APIs, sendfile() on Linux 2.6.33+ and CopyFileEx on Windows, let us direct OS buffer cache
behavior as desired.
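A rough sketch of a transferFrom-based copy follows, written in the spirit of FileUtils#copyFile() rather than copied from Commons IO. The class name TransferFromCopy and the 1 MiB per-call chunk size are assumptions. As with transferTo, the OS remains free to cache every block it copies.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical buffered fallback in the style of FileUtils#copyFile(); not Commons IO source.
final class TransferFromCopy {
    private TransferFromCopy() {}

    // Assumed chunk size: caps how many bytes each transferFrom call requests.
    private static final long CHUNK = 1L << 20;

    static long copy(Path src, Path dst) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE,
                     StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long size = in.size();
            long pos = 0;
            while (pos < size) {
                long count = Math.min(CHUNK, size - pos);
                // transferFrom writes at `pos` in the target and reads from the
                // source channel's current position, advancing it by n bytes,
                // so `in` stays in step with `pos`.
                long n = out.transferFrom(in, pos, count);
                if (n == 0) {
                    break; // source exhausted early; avoid an endless loop
                }
                pos += n;
            }
            return pos;
        }
    }
}
```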

> Persist in-memory replicas using unbuffered IO should only applies to supported Linux version
> ---------------------------------------------------------------------------------------------
>                 Key: HDFS-7291
>                 URL: https://issues.apache.org/jira/browse/HDFS-7291
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>    Affects Versions: 2.6.0
>            Reporter: Xiaoyu Yao
>            Assignee: Xiaoyu Yao
>         Attachments: HDFS-7291.0.patch
> HDFS-7090 changed the DataNode to persist in-memory replicas using unbuffered IO on Linux and Windows.
On Linux, it relies on the sendfile() API between two file descriptors to achieve an unbuffered IO copy.
According to the Linux man page at http://man7.org/linux/man-pages/man2/sendfile.2.html, sendfile() between
two regular files is only supported on Linux kernel 2.6.33+ (on earlier kernels the output descriptor must be a socket).
This JIRA limits the use of sendfile() for lazy persist to Linux kernels 2.6.33 or later. On unsupported
versions, lazy persist will fall back to normal buffered IO.
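The version gate described above could look roughly like the following. This is a hypothetical sketch, not the HDFS-7291 patch: the class SendfileSupport and its methods are invented, and it assumes the JVM's os.name and os.version system properties, which on Linux report the running kernel's release string (e.g. "3.10.0-123.el7.x86_64"). It does not show how the result is wired into the native copy path.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical kernel-version check; names are illustrative, not Hadoop APIs.
final class SendfileSupport {
    private SendfileSupport() {}

    // sendfile(2) to a regular file needs Linux 2.6.33+; earlier kernels required a socket as out_fd.
    private static final int[] MIN_KERNEL = {2, 6, 33};

    // Leading "major.minor[.patch]" of a release string such as "2.6.32-504.el6.x86_64".
    private static final Pattern VERSION =
            Pattern.compile("^(\\d{1,9})\\.(\\d{1,9})(?:\\.(\\d{1,9}))?");

    static boolean isSupported(String osName, String osVersion) {
        if (osName == null || osVersion == null || !osName.startsWith("Linux")) {
            return false; // not Linux: sendfile() path does not apply
        }
        Matcher m = VERSION.matcher(osVersion.trim());
        if (!m.find()) {
            return false; // unparseable: be conservative and use buffered IO
        }
        int[] actual = {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            m.group(3) == null ? 0 : Integer.parseInt(m.group(3))
        };
        // Lexicographic comparison of (major, minor, patch) against the minimum.
        for (int i = 0; i < MIN_KERNEL.length; i++) {
            if (actual[i] != MIN_KERNEL[i]) {
                return actual[i] > MIN_KERNEL[i];
            }
        }
        return true; // exactly 2.6.33
    }

    static boolean isSupportedOnThisJvm() {
        return isSupported(System.getProperty("os.name"), System.getProperty("os.version"));
    }
}
```

Returning false for anything it cannot parse keeps the check on the safe side: when in doubt, lazy persist takes the buffered IO fallback.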

This message was sent by Atlassian JIRA
