hadoop-common-issues mailing list archives

From GitBox <...@apache.org>
Subject [GitHub] [hadoop] snvijaya commented on a change in pull request #2307: HADOOP-17250 Lot of short reads can be merged with readahead.
Date Wed, 16 Sep 2020 14:10:40 GMT

snvijaya commented on a change in pull request #2307:
URL: https://github.com/apache/hadoop/pull/2307#discussion_r489461371



##########
File path: hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsInputStream.java
##########
@@ -180,9 +205,13 @@ private int readOneBlock(final byte[] b, final int off, final int len)
throws IO
 
       // Enable readAhead when reading sequentially
       if (-1 == fCursorAfterLastRead || fCursorAfterLastRead == fCursor || b.length >= bufferSize) {
+        LOG.debug("Sequential read with read ahead size of {}", bufferSize);
         bytesRead = readInternal(fCursor, buffer, 0, bufferSize, false);
       } else {
-        bytesRead = readInternal(fCursor, buffer, 0, b.length, true);
+        // Enabling read ahead for random reads as well to reduce number of remote calls.
+        int lengthWithReadAhead = Math.min(b.length + readAheadRange, bufferSize);
+        LOG.debug("Random read with read ahead size of {}", lengthWithReadAhead);
+        bytesRead = readInternal(fCursor, buffer, 0, lengthWithReadAhead, true);

Review comment:
   With Parquet and ORC, we have seen read patterns move from sequential to random and back.
That being the case, would it not be better to always read ahead to bufferSize? Providing an
option to read ahead by a smaller amount, such as 64 KB, can actually lead to more IOPs. In
our meeting yesterday, one thing we all agreed on was that the lower the IOPs the better, and
that it is better to read more than to read a smaller size.
   So let's remove the config for readAheadRange and instead always read ahead by whatever is
configured for bufferSize.
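
To make the trade-off concrete, here is a minimal sketch comparing the request length computed in the diff's random-read branch with the "always bufferSize" proposal. It is not code from the PR: the class name, the helper names, and the sample sizes (a 4 MB bufferSize and an 8 KB caller buffer) are assumptions for illustration, and the call count comes from a simplified model in which every remote call returns the full requested length over a contiguous span.

```java
public class ReadAheadLengthSketch {

    // Request length as computed in the diff's random-read branch:
    // caller buffer length plus readAheadRange, capped at bufferSize.
    static int lengthWithReadAheadRange(int callerBufferLength, int readAheadRange, int bufferSize) {
        return Math.min(callerBufferLength + readAheadRange, bufferSize);
    }

    // Request length under the proposal in this comment: always bufferSize.
    static int lengthAlwaysBufferSize(int bufferSize) {
        return bufferSize;
    }

    // Hypothetical helper: remote calls needed to cover a contiguous span,
    // assuming each call returns exactly bytesPerCall bytes (ceiling division).
    static long remoteCalls(long spanBytes, int bytesPerCall) {
        return (spanBytes + bytesPerCall - 1) / bytesPerCall;
    }

    public static void main(String[] args) {
        int bufferSize = 4 * 1024 * 1024;     // assumed value, for illustration only
        int readAheadRange = 64 * 1024;       // the 64 KB example from the comment
        int callerBufferLength = 8 * 1024;    // assumed small read
        long span = bufferSize;               // bytes eventually consumed

        int withRange = lengthWithReadAheadRange(callerBufferLength, readAheadRange, bufferSize);
        int always = lengthAlwaysBufferSize(bufferSize);

        System.out.println("readAheadRange request length: " + withRange
            + ", remote calls: " + remoteCalls(span, withRange));
        System.out.println("bufferSize request length: " + always
            + ", remote calls: " + remoteCalls(span, always));
    }
}
```

Under these assumptions the smaller read-ahead needs many more remote calls to cover the same span, which is the IOPs concern raised above. The model ignores the cost of fetching bytes that are never consumed, so it only illustrates one side of the trade-off.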




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org




