Date: Thu, 13 Mar 2014 19:17:45 +0000 (UTC)
From: "Chris Nauroth (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-5791) Shuffle phase is slow in Windows - FadviseFileRegion::transferTo does not read disks efficiently

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13933916#comment-13933916 ]

Chris Nauroth commented on MAPREDUCE-5791:
------------------------------------------

I think I found the root cause of the problem. The JDK does not really implement a zero-copy transfer on Windows.
I checked the source code for OpenJDK 6, 7 and 8, and in the Windows FileChannelImpl.c they all look like this:

{code}
JNIEXPORT jlong JNICALL
Java_sun_nio_ch_FileChannelImpl_transferTo0(JNIEnv *env, jobject this,
                                            jint srcFD,
                                            jlong position, jlong count,
                                            jint dstFD)
{
    return IOS_UNSUPPORTED;
}
{code}

On Linux, these functions delegate to the {{sendfile}} syscall. It's a shame that this isn't available in the Windows JDK, because it's theoretically possible to do a zero-copy transfer on Windows using {{TransmitFile}}:

http://msdn.microsoft.com/en-us/library/windows/desktop/ms740565(v=vs.85).aspx

I think it's fine to proceed with this buffer-copying patch, but I also wonder whether we'd see even better performance if we could figure out a JNI call to {{TransmitFile}}.

I'll review the patch in more detail later. From a quick glance, it looked like there were a few cases of indentation using 4 spaces instead of 2 spaces (the project standard).

> Shuffle phase is slow in Windows - FadviseFileRegion::transferTo does not read disks efficiently
> ------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5791
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5791
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Nikola Vujic
>            Assignee: Nikola Vujic
>         Attachments: MAPREDUCE-5791.patch
>
>
> The transferTo method in org.apache.hadoop.mapred.FadvisedFileRegion uses the transferTo
> method of a FileChannel to transfer data from disk to a socket. This is slow on Windows,
> slower than on Linux. The reason is that the java.nio transferTo method always issues 32K
> I/O requests. On Windows, these 32K transfers are not optimal, and we don't get the best
> performance from the underlying I/O subsystem. To achieve better performance when reading
> from the drives, we need to read data in bigger chunks, 512K for example.
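For illustration, the buffer-copying approach described in this issue (read the file region in large chunks, 512K in the description, and write each chunk to the socket instead of calling {{FileChannel.transferTo}}) might be sketched in Java as below. This is a hypothetical helper, not the code in MAPREDUCE-5791.patch: the class name {{ChunkedRegionCopy}}, the method {{copyRegion}} and its parameters are assumptions, and it relies only on standard java.nio calls. It also assumes a blocking target channel.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

/**
 * Hypothetical helper (not the actual MAPREDUCE-5791 patch): copies a file
 * region to a target channel using large explicit reads instead of
 * FileChannel.transferTo.
 */
final class ChunkedRegionCopy {

  // Example chunk size taken from the issue description (512K); not a tuned value.
  static final int DEFAULT_CHUNK_SIZE = 512 * 1024;

  private ChunkedRegionCopy() {
  }

  /**
   * Copies up to {@code count} bytes starting at {@code position} in
   * {@code source} to {@code target}. Returns the number of bytes copied,
   * which is less than {@code count} only if end of file is reached first.
   * Assumes {@code target} is blocking: on a non-blocking socket, write()
   * may return 0 and the inner loop would spin.
   */
  static long copyRegion(FileChannel source, long position, long count,
      WritableByteChannel target, int chunkSize) throws IOException {
    // Direct buffer, so NIO does not need a temporary copy of a heap buffer
    // for each read and write.
    ByteBuffer buffer = ByteBuffer.allocateDirect(chunkSize);
    long copied = 0;
    while (copied < count) {
      buffer.clear();
      // Never read past the end of the requested region.
      int toRead = (int) Math.min(chunkSize, count - copied);
      buffer.limit(toRead);
      // Positional read of up to toRead bytes; does not move the channel's position.
      int read = source.read(buffer, position + copied);
      if (read < 0) {
        break; // end of file before the region was exhausted
      }
      buffer.flip();
      // The target may accept fewer bytes than offered, so drain the buffer.
      while (buffer.hasRemaining()) {
        target.write(buffer);
      }
      copied += read;
    }
    return copied;
  }
}
```

The trade-off is deliberate: the data still passes through a user-space buffer, so this does not recover the zero-copy behavior that {{sendfile}} gives on Linux (which is what a JNI call to {{TransmitFile}} would aim for); it only controls the size of each disk read. A production version writing to a non-blocking socket would need to return after a partial write and resume later rather than loop.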