hadoop-hdfs-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Todd Lipcon" <t...@apache.org>
Subject Re: Review Request: ByteBuffer-based read API for DFSInputStream
Date Mon, 05 Mar 2012 21:57:31 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4184/#review5622
-----------------------------------------------------------

Ship it!


first pass looks good. A few questions:
- do you have before/after benchmarks for the old TestParallelRead with this patch? ie make
sure this patch doesn't regress performance of the "normal" array-based read path?
- have you run the various randomized tests overnight or so to get good coverage of all the
cases?

I didn't look carefully at the libhdfs code, since you said offline there's another rev of
that coming


hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
<https://reviews.apache.org/r/4184/#comment12234>

    can you add javadoc explaining this var (even though it's not new)?



hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
<https://reviews.apache.org/r/4184/#comment12233>

    I'd move this inside the if statement below. Then in the second "phase", re-declare it
under that scope, since the "len" variable doesn't ever carry over between phases (and in
fact becomes invalid)



hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
<https://reviews.apache.org/r/4184/#comment12232>

    Seems like this area could be optimized a bit to avoid the object creation here...
    
    In one case, slowReadBuff.remaining() <= len, in which case you can directly do buf.put(slowReadBuff).
    
    In the other case, you could probably save off the limit of slowReadBuff, lower the limit
to len, do the read, then set the limit again.



hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
<https://reviews.apache.org/r/4184/#comment12235>

    this code's very similar to line 358-362. Can we extract a utility function which we could
optimize as I mentioned above to avoid object allocation?



hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
<https://reviews.apache.org/r/4184/#comment12236>

    I think it might simplify things if you used this function only for the case where verifyChecksum
== true. Then above, the non-checksummed read case would just call fillBuffer directly. What
do you think?
    
    My reasoning is that this function is called 'readChunksIntoByteBuffer' implying that
it always reads a multiple of checksum chunk size, which isn't really a requirement when not
verifying checksums



hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
<https://reviews.apache.org/r/4184/#comment12237>

    maybe TRACE is more appropriate here. Also LOG.trace() inside



hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
<https://reviews.apache.org/r/4184/#comment12238>

    remove commented out code



hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
<https://reviews.apache.org/r/4184/#comment12239>

    I think it's worth making these actual static inner classes with names instead of anonymous,
for prettier stack traces.
    
    I remember we discussed a few weeks back whether the creation of these lambdas would harm
performance for short reads. Do you have any results for TestParallelRead performance for
the lambda-based approach here vs the uglier approach



hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c
<https://reviews.apache.org/r/4184/#comment12240>

    goofy empty clause. Other goofy indentation here.


- Todd


On 2012-03-05 21:00:55, Todd Lipcon wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4184/
> -----------------------------------------------------------
> 
> (Updated 2012-03-05 21:00:55)
> 
> 
> Review request for hadoop-hdfs and Todd Lipcon.
> 
> 
> Summary
> -------
> 
> Posting patch to RB on behalf of Henry.
> 
> 
> This addresses bug HDFS-2834.
>     http://issues.apache.org/jira/browse/HDFS-2834
> 
> 
> Diffs
> -----
> 
>   hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java
dfab730 
>   hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
cc61697 
>   hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
2b817ff 
>   hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java
b7da8d4 
>   hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java
ea24777 
>   hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.h 0ee29d5 
>   hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c 7371caf 
>   hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java
9d4f4a2 
>   hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelLocalRead.java
PRE-CREATION 
>   hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java
bbd0012 
>   hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java
PRE-CREATION 
>   hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java
eb2a1d8 
> 
> Diff: https://reviews.apache.org/r/4184/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Todd
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message