hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "jiraposter@reviews.apache.org (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2834) ByteBuffer-based read API for DFSInputStream
Date Wed, 07 Mar 2012 03:04:09 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223927#comment-13223927
] 

jiraposter@reviews.apache.org commented on HDFS-2834:
-----------------------------------------------------



bq.  On 2012-03-07 00:11:46, Todd Lipcon wrote:
bq.  > some stuff around error cases here -- I think you'd run into these bugs if you hit
a checksum error, since the "retry on different node" path would trigger in DFSInputStream,
but the target buffer's position would be prematurely updated from prior attempts.
bq.  > 
bq.  > But getting close!
bq.  > 
bq.  > Is it possible to split out the test refactor change from this one? Hard to tell
what's changed in the tests vs just refactored. If it's a pain in the butt I'll look more
closely.

It's not super easy, unfortunately - let me know if it's a big problem and I'll perform the
git surgery required (or provide annotations on the review, if that would help).

Sounds like I need to write tests against a file with a malformed checksum; I'll post another
patch when I'm done. 

I'll respond to the libhdfs changes in a separate update. 


bq.  On 2012-03-07 00:11:46, Todd Lipcon wrote:
bq.  > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java,
line 491
bq.  > <https://reviews.apache.org/r/4212/diff/1/?file=88647#file88647line491>
bq.  >
bq.  >     do you really need synchronization here and below? it seems like the strategy
is being called from a synchronized context, so this is redundant. Removing this also allows
the inner classes to be static which I think is preferable

The inner classes depend on blockReader, which is why they're non-static. The synchronisation
is (hopefully) to calm Findbugs since it gives a false positive when blockReader is perhaps
accessed from a non-synchronized context (although the strategy classes are always called
from a method which already has the lock).

We can kill two birds with one stone by making the classes static and passing blockReader
as a parameter either at construction or read-time; findBugs probably won't follow the aliasing
and so won't try and understand the locks.


bq.  On 2012-03-07 00:11:46, Todd Lipcon wrote:
bq.  > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
lines 268-270
bq.  > <https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line268>
bq.  >
bq.  >     I envision some silly user setting the buffer size < bytesPerChecksum and
getting screwed here. Worth a check for valid range here (throwing IOE or IllegalArgumentException
or something)

Hah, I was just going to let them have 0-byte reads :) Yes, throwing is a better idea. 


bq.  On 2012-03-07 00:11:46, Todd Lipcon wrote:
bq.  > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
lines 123-133
bq.  > <https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line123>
bq.  >
bq.  >     please use javadoc style comments for these, rather than //s

Done.


bq.  On 2012-03-07 00:11:46, Todd Lipcon wrote:
bq.  > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java,
line 22
bq.  > <https://reviews.apache.org/r/4212/diff/1/?file=88644#file88644line22>
bq.  >
bq.  >     unused import?

Good catch. 


bq.  On 2012-03-07 00:11:46, Todd Lipcon wrote:
bq.  > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
line 357
bq.  > <https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line357>
bq.  >
bq.  >     should this line go in a finally{} clause? Otherwise if 'to' doesn't have enough
space for the copy, we'd end up leaving 'from' with the modified limit

The callers are supposed to guarantee that 'to' has enough space, but it does no harm to be
defensive. 


bq.  On 2012-03-07 00:11:46, Todd Lipcon wrote:
bq.  > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
line 397
bq.  > <https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line397>
bq.  >
bq.  >     in finally{} clause perhaps, so the limit isn't messed up by a failed read

Done.


bq.  On 2012-03-07 00:11:46, Todd Lipcon wrote:
bq.  > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
line 459
bq.  > <https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line459>
bq.  >
bq.  >     the error conditions here don't quite work right.
bq.  >     For example, if a checksum error occurs, the buffer's position will still be
updated.
bq.  >     
bq.  >     Above, there's a similar problem if "phase 1" succeeds but "phase 2" encounters
an error -- the position will change but the method will throw.

I think it might make sense for the client of BlockReaderLocal to do the cleanup, since they
might have to do some reset logic of their own anyhow. In particular, DFSInputStream seeks
back to the beginning of the read. We might as well keep all the compensation logic in the
same place, and DFSInputStream is best placed to know what it wants to do. Plus it's a bit
easier to wrap a single read call in a try / catch block than to try and reason about individual
failure modes inside the read method. Does that sound sensible?


bq.  On 2012-03-07 00:11:46, Todd Lipcon wrote:
bq.  > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
lines 300-301
bq.  > <https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line300>
bq.  >
bq.  >     This isn't your bug, but I just realized there's a potential leak here if the
skip() calls below failed. I think we need to add a try..finally{} which returns these buffers
to the pool if construction fails

Done.


bq.  On 2012-03-07 00:11:46, Todd Lipcon wrote:
bq.  > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
lines 517-521
bq.  > <https://reviews.apache.org/r/4212/diff/1/?file=88645#file88645line517>
bq.  >
bq.  >     no '-'s needed in javadoc

Done.


- Henry


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4212/#review5665
-----------------------------------------------------------


On 2012-03-06 23:14:07, Henry Robinson wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4212/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-06 23:14:07)
bq.  
bq.  
bq.  Review request for hadoop-hdfs and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  New patch for HDFS-2834 (I can't update the old review request).
bq.  
bq.  
bq.  This addresses bug HDFS-2834.
bq.      http://issues.apache.org/jira/browse/HDFS-2834
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java
dfab730 
bq.    hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
cc61697 
bq.    hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
4187f1c 
bq.    hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
2b817ff 
bq.    hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java
b7da8d4 
bq.    hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java
ea24777 
bq.    hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.h 0ee29d5 
bq.    hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c 7371caf 
bq.    hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java
9d4f4a2 
bq.    hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelLocalRead.java
PRE-CREATION 
bq.    hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java
bbd0012 
bq.    hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java
PRE-CREATION 
bq.    hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java
eb2a1d8 
bq.  
bq.  Diff: https://reviews.apache.org/r/4212/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Henry
bq.  
bq.


                
> ByteBuffer-based read API for DFSInputStream
> --------------------------------------------
>
>                 Key: HDFS-2834
>                 URL: https://issues.apache.org/jira/browse/HDFS-2834
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>         Attachments: HDFS-2834-no-common.patch, HDFS-2834.3.patch, HDFS-2834.4.patch,
HDFS-2834.5.patch, HDFS-2834.6.patch, HDFS-2834.7.patch, HDFS-2834.8.patch, HDFS-2834.patch,
HDFS-2834.patch, hdfs-2834-libhdfs-benchmark.png
>
>
> The {{DFSInputStream}} read-path always copies bytes into a JVM-allocated {{byte[]}}.
Although for many clients this is desired behaviour, in certain situations, such as native-reads
through libhdfs, this imposes an extra copy penalty since the {{byte[]}} needs to be copied
out again into a natively readable memory area. 
> For these cases, it would be preferable to allow the client to supply its own buffer,
wrapped in a {{ByteBuffer}}, to avoid that final copy overhead. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message