hadoop-hdfs-issues mailing list archives

From "Jay Booth (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-516) Low Latency distributed reads
Date Sat, 12 Sep 2009 20:16:57 GMT

     [ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Booth updated HDFS-516:
---------------------------

    Attachment: hdfs-516-20090912.patch

New patch:
* README file with instructions for setting up Eclipse and running with Hadoop
* Javadoc, plus JUnit 4-style test cases (including some new ones)
* Benchmarks for random reads, binary search, and streaming
* Showed a roughly 1.9x speedup (about 90% higher throughput) in the streaming case, somehow:
streaming 1GB from a remote HDFS file dropped from 213 seconds to 112 seconds. Reproduced a
couple of times, using 16MB of cache with the lookahead mechanism. I suspect it uses a lot more
CPU than conventional streaming, but still, that's a lot faster.
* No longer requires any changes to HDFS code; the module now lives entirely in contrib
* Cleans up file handles better
* Handles remote disconnects better on the client side
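
The lookahead mechanism itself isn't described in this message. As a rough illustration only,
here is a minimal Java sketch of one common readahead policy, assuming reads are tracked as
fixed-size block indices: when a read immediately follows the previous block, a prefetch window
grows (1, 2, 4, ...) up to a cap, and any non-sequential read resets it. The class name
`ReadaheadTracker` and its parameters are hypothetical, not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a client-side lookahead policy: when reads look
 * sequential, prefetch a growing window of following blocks.
 * Not taken from the HDFS-516 patch.
 */
public class ReadaheadTracker {
    private final int maxWindow;   // upper bound on blocks prefetched at once
    private long lastBlock = -2;   // -2 so the very first read is never "sequential"
    private int window = 0;        // current prefetch window, in blocks

    public ReadaheadTracker(int maxWindow) {
        this.maxWindow = maxWindow;
    }

    /**
     * Records a read of {@code block} and returns the block indices to prefetch.
     * Consecutive windows overlap, so the caller should skip blocks already cached.
     */
    public List<Long> onRead(long block) {
        if (block == lastBlock + 1) {
            // Sequential access: grow the window (1, 2, 4, ...) up to maxWindow.
            window = Math.min(maxWindow, Math.max(1, window * 2));
        } else {
            // Random access: stop prefetching.
            window = 0;
        }
        lastBlock = block;
        List<Long> toFetch = new ArrayList<>();
        for (int i = 1; i <= window; i++) {
            toFetch.add(block + i);
        }
        return toFetch;
    }

    public static void main(String[] args) {
        ReadaheadTracker t = new ReadaheadTracker(4);
        for (long b : new long[] {0, 1, 2, 3, 4, 100}) {
            System.out.println("read " + b + " -> prefetch " + t.onRead(b));
        }
    }
}
```

Doubling the window lets a sequential stream reach the cap after a few reads, while a random
seek (block 100 in the demo) costs no wasted prefetching.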

What are people's thoughts on getting this into 0.21? It shows a lot of promise as far as
performance goes but hasn't been tested on larger clusters. I'd be confident up to 200 nodes
or so; beyond that I'd start getting nervous.

Given that it lives entirely in contrib and has to be explicitly configured to turn it on,
could we include this for 0.21? Does anyone want to try running the benchmarks?

I'll run the benchmarks one last time tomorrow to sanity check the latest patch (I've changed
a couple of things since I last ran it on a cluster), and then maybe we could consider committing?

> Low Latency distributed reads
> -----------------------------
>
>                 Key: HDFS-516
>                 URL: https://issues.apache.org/jira/browse/HDFS-516
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jay Booth
>            Priority: Minor
>         Attachments: hdfs-516-20090912.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I created a method for low latency random reads using NIO on the server side and simulated
> OS paging with LRU caching and lookahead on the client side. Some applications could include
> Lucene searching (term->doc and doc->offset mappings are likely to be in local cache, and
> thus much faster than Nutch's current FsDirectory impl) and binary search through record files
> (bytes at the 1/2, 1/4, 1/8 marks are likely to be cached).
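
The client-side caching described above can be approximated with an LRU map of fixed-size
blocks. The following is a minimal Java sketch, not the patch's code: `LruBlockCache` and its
`fetcher` (standing in for a remote read from a datanode) are hypothetical, and the demo assumes
a sorted file of fixed-length 16-byte records in 4KB blocks. Repeated binary searches always
begin by probing the same midpoints, so after the first search those blocks come from the cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

/**
 * Hypothetical sketch of a client-side LRU block cache that "simulates OS paging":
 * fixed-size blocks of a remote file are fetched on a miss and kept until evicted
 * in least-recently-used order. Not the HDFS-516 code.
 */
public class LruBlockCache {
    private final LongFunction<byte[]> fetcher; // loads one block from the remote file (assumed)
    private final LinkedHashMap<Long, byte[]> blocks;
    private int hits = 0;
    private int misses = 0;

    public LruBlockCache(int capacityBlocks, LongFunction<byte[]> fetcher) {
        this.fetcher = fetcher;
        // accessOrder=true: iteration order runs from least- to most-recently accessed.
        this.blocks = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > capacityBlocks; // evict the LRU block once over capacity
            }
        };
    }

    /** Returns the block with the given index, fetching it on a miss. */
    public byte[] getBlock(long index) {
        byte[] data = blocks.get(index); // get() also marks the block as recently used
        if (data != null) {
            hits++;
            return data;
        }
        misses++;
        data = fetcher.apply(index);
        blocks.put(index, data);
        return data;
    }

    public int hits() { return hits; }
    public int misses() { return misses; }

    public static void main(String[] args) {
        final int recordSize = 16, blockSize = 4096;
        final long numRecords = 1_000_000; // record i holds key 2*i (hypothetical file)
        // Dummy fetcher: returns an empty block instead of doing a remote read.
        LruBlockCache cache = new LruBlockCache(64, idx -> new byte[blockSize]);
        for (long target = 0; target < 200; target += 2) {
            long lo = 0, hi = numRecords - 1;
            while (lo <= hi) {
                long mid = (lo + hi) >>> 1;
                cache.getBlock(mid * recordSize / blockSize); // block containing record 'mid'
                long key = 2 * mid;
                if (key == target) break;
                if (key < target) lo = mid + 1; else hi = mid - 1;
            }
        }
        System.out.println("hits=" + cache.hits() + " misses=" + cache.misses());
    }
}
```

A `LinkedHashMap` in access order keeps the sketch short; a real cache would also need to bound
memory by bytes rather than block count and handle concurrent readers.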

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

