hadoop-hdfs-issues mailing list archives

From "Vanco Buca (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-14055) Over-eager allocation in ByteBufferUtil.fallbackRead
Date Thu, 08 Nov 2018 00:10:00 GMT
Vanco Buca created HDFS-14055:
---------------------------------

             Summary: Over-eager allocation in ByteBufferUtil.fallbackRead
                 Key: HDFS-14055
                 URL: https://issues.apache.org/jira/browse/HDFS-14055
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: fs
            Reporter: Vanco Buca


The heap-memory path of ByteBufferUtil.fallbackRead ([see master branch code here|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ByteBufferUtil.java#L95]) massively
over-allocates memory when the underlying input stream returns data in chunks smaller than
the requested length. This happens on a regular basis when the S3 input stream is the source.

The behavior is O(N^2)-ish in allocated bytes. In a recent debug session, we were trying to read
16MB, but getting 16K at a time. The code would (see the sketch after this list):
 * allocate 16M, use the first 16K
 * allocate 16M - 16K, use the first 16K of that
 * allocate 16M - 32K, use the first 16K of that
 * (etc)
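
For scale, here is a minimal standalone Java sketch (illustrative only, not Hadoop code; the 16 MB request and 16 KB chunk size are the numbers from the session above) that tallies the allocations this pattern produces:

{code}
// Tallies how much heap the pre-patch behavior allocates when a 16 MB
// read is served 16 KB at a time: each pass allocates a buffer sized to
// the *remaining* request but fills only the next 16 KB of it.
public class OverAllocationDemo {
  public static void main(String[] args) {
    final int request = 16 * 1024 * 1024; // total bytes the caller wants
    final int chunk = 16 * 1024;          // bytes the stream returns per read
    long allocated = 0;
    for (int remaining = request; remaining > 0; remaining -= chunk) {
      allocated += remaining; // one ByteBuffer.allocate(remaining) per pass
    }
    System.out.printf("requested %,d bytes, allocated %,d bytes (%.0fx)%n",
        request, allocated, (double) allocated / request);
  }
}
{code}

Running it reports roughly 8.6 GB allocated to serve a 16 MB read, about 512x the requested amount.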

The patch is simple. Here's the text version of the patch:
{code}
@@ -88,10 +88,20 @@ public final class ByteBufferUtil {
         buffer.flip();
       } else {
         buffer.clear();
-        int nRead = stream.read(buffer.array(),
-          buffer.arrayOffset(), maxLength);
-        if (nRead >= 0) {
-          buffer.limit(nRead);
+        int totalRead = 0;
+        int nRead = 0;
+        while (totalRead < maxLength) {
+          nRead = stream.read(buffer.array(),
+            buffer.arrayOffset() + totalRead, maxLength - totalRead);
+          if (nRead <= 0) {
+            break;
+          }
+          totalRead += nRead;
+        }
+        // Succeed unless we hit end-of-stream before reading any data,
+        // preserving the EOF semantics of the single read this replaces.
+        if (totalRead > 0 || nRead >= 0) {
+          buffer.limit(totalRead);
           success = true;
         }
       }
{code}

So, essentially, the patch does the same thing that the code in the direct-memory path already does: keep reading until the request is satisfied or the stream runs out of data. The standalone sketch below shows the same pattern outside of Hadoop.
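
Here is a self-contained sketch of that loop (hypothetical class and stream names, not Hadoop code), using a stream that, like the S3 one described above, returns at most 16 KB per call:

{code}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

public class FallbackReadSketch {
  // A stream that returns at most 16 KB per read(), no matter how much
  // is asked for, similar to the S3 behavior described above.
  static final class ChunkyStream extends InputStream {
    private final ByteArrayInputStream in;
    ChunkyStream(byte[] data) { this.in = new ByteArrayInputStream(data); }
    @Override public int read() { return in.read(); }
    @Override public int read(byte[] b, int off, int len) {
      return in.read(b, off, Math.min(len, 16 * 1024));
    }
  }

  public static void main(String[] args) throws IOException {
    final int maxLength = 1024 * 1024; // 1 MB request, small for the demo
    InputStream stream = new ChunkyStream(new byte[maxLength]);
    ByteBuffer buffer = ByteBuffer.allocate(maxLength); // the only allocation
    int totalRead = 0;
    int nRead = 0;
    while (totalRead < maxLength) {
      nRead = stream.read(buffer.array(),
          buffer.arrayOffset() + totalRead, maxLength - totalRead);
      if (nRead <= 0) {
        break;
      }
      totalRead += nRead;
    }
    buffer.limit(totalRead);
    System.out.println("filled " + totalRead + " of " + maxLength
        + " bytes into a single buffer");
  }
}
{code}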



