hadoop-hdfs-issues mailing list archives

From "Jay Booth (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-516) Low Latency distributed reads
Date Tue, 01 Sep 2009 01:39:32 GMT

     [ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Booth updated HDFS-516:
---------------------------

    Attachment: hdfs-516-20090831.patch

New patch.  The IPC server was too slow for IO operations (about 40 times slower than DFS without
caching), so I wrote a custom ByteServer that's streamlined to avoid object creation and byte
copying wherever possible and defaults to TCP no-delay.  Client connections are pooled using commons-pool.
It uses static methods in hdfs.rad.ByteServiceProtocol for all serialization, which is faster than reflection.
On the laptop in pseudo-distributed mode, I'm seeing 5X faster than DFS for random searches.
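
To illustrate the "static methods instead of reflection" idea, here is a minimal sketch of a fixed-layout request codec. It is not the code from the patch: the class name ReadRequestCodec, the field layout (block id, offset, length) and all method names are assumptions made up for this example. It only shows how a request can be written into and read out of a reused ByteBuffer without allocating per-request objects.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical fixed-layout codec for a "read bytes" request.
 * Layout (an assumption for this sketch): 8-byte block id,
 * 8-byte offset, 4-byte length, all big-endian (ByteBuffer default).
 */
final class ReadRequestCodec {
    static final int REQUEST_BYTES = 8 + 8 + 4;

    private ReadRequestCodec() {
        // static helpers only, no instances
    }

    /** Writes one request at the buffer's current position. */
    static void encode(ByteBuffer out, long blockId, long offset, int length) {
        out.putLong(blockId);
        out.putLong(offset);
        out.putInt(length);
    }

    // Absolute reads: they return primitives and leave the buffer's
    // position untouched, so decoding allocates nothing.
    static long blockId(ByteBuffer in, int base) {
        return in.getLong(base);
    }

    static long offset(ByteBuffer in, int base) {
        return in.getLong(base + 8);
    }

    static int length(ByteBuffer in, int base) {
        return in.getInt(base + 16);
    }
}
```

A caller would keep one buffer per connection, call clear() on it between requests, and pass it to encode() before writing it to the socket.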

Refactored a bunch on the client side and eliminated a few redundant classes.  I still need to make
lookahead happen via a separate thread in caching byteservice and tweak a couple of things in
ByteServer for performance; then this thing will be pretty fast.  I'm going to run some numbers
on EC2 tonight/tomorrow and see what I come up with.

Also, cleaned up the unit tests to use JUnit 4 and added some javadoc; I probably missed a bunch of
places and could certainly expand on all of it.  I haven't added the license to the header of every
file yet (license explicitly granted here); I will get to that in the next patch.

> Low Latency distributed reads
> -----------------------------
>
>                 Key: HDFS-516
>                 URL: https://issues.apache.org/jira/browse/HDFS-516
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jay Booth
>            Priority: Minor
>         Attachments: hdfs-516-20090824.patch, hdfs-516-20090831.patch, radfs.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I created a method for low latency random reads using NIO on the server side and simulated
> OS paging with LRU caching and lookahead on the client side.  Some applications could include
> lucene searching (term->doc and doc->offset mappings are likely to be in local cache,
> thus much faster than nutch's current FsDirectory impl) and binary search through record files
> (bytes at 1/2, 1/4, 1/8 marks are likely to be cached).
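
The client-side idea described above (LRU caching plus lookahead) can be sketched roughly as follows. This is not the patch's implementation: LruPageCache, PageSource and every method name are hypothetical, the lookahead is a single page, and it runs synchronously on the calling thread rather than in a separate one. The cache relies on java.util.LinkedHashMap in access order with removeEldestEntry for eviction.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical client-side page cache: LRU eviction plus one-page lookahead. */
class LruPageCache {
    /** Hypothetical stand-in for the remote byte service. */
    interface PageSource {
        byte[] fetchPage(long pageIndex);
    }

    private final PageSource source;
    private final Map<Long, byte[]> pages;
    private int remoteFetches = 0;

    LruPageCache(final int maxPages, PageSource source) {
        this.source = source;
        // accessOrder=true keeps entries ordered least-recently-used first,
        // so removeEldestEntry evicts the LRU page once the cache is full.
        this.pages = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxPages;
            }
        };
    }

    /** Returns the page, fetching it and the following page on a miss. */
    byte[] getPage(long pageIndex) {
        byte[] page = pages.get(pageIndex);
        if (page == null) {
            page = load(pageIndex);
            // Lookahead: sequential readers will probably want the next page.
            // containsKey does not count as an access for LRU ordering.
            if (!pages.containsKey(pageIndex + 1)) {
                load(pageIndex + 1);
            }
        }
        return page;
    }

    boolean isCached(long pageIndex) {
        return pages.containsKey(pageIndex);
    }

    int remoteFetches() {
        return remoteFetches;
    }

    private byte[] load(long pageIndex) {
        byte[] page = source.fetchPage(pageIndex);
        remoteFetches++;
        pages.put(pageIndex, page);
        return page;
    }
}
```

With a cache like this, a binary search over a record file touches the pages at the 1/2, 1/4 and 1/8 marks on almost every lookup, so they stay near the recently-used end and are rarely evicted.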

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

