hadoop-hdfs-issues mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-516) Low Latency distributed reads
Date Tue, 15 Sep 2009 18:30:57 GMT

    [ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755627#action_12755627 ]

Raghu Angadi commented on HDFS-516:

bq. somehow, from 213 seconds to 112 seconds to stream 1GB from a remote HDFS file.

This is about 5MBps for HDFS and 9MBps for RadFS. Assuming 9MBps is close to the limit of a 100Mbps
network (is it?), 5MBps is too low for any FS. Since both reads are from the same physical files,
this may not be hardware-related. Could you check what is causing this delay? It might also be
affecting other benchmarks. Running netstat on the client while the read is in progress
might help.
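A quick sanity check of the arithmetic above, as a sketch that assumes 1GB = 1024 MB, "MBps" meaning megabytes per second, and a raw 100Mbps line rate with no protocol overhead. Under those assumptions RadFS reaches roughly three quarters of the link, while HDFS reaches well under half of it.

```python
# Back-of-the-envelope throughput check for the figures quoted above.
# Assumptions: 1GB = 1024 MB, MBps = megabytes/second, no protocol overhead.


def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Average read throughput in MB/s."""
    return size_mb / seconds


def line_rate_mb_per_s(link_mbit_per_s: float) -> float:
    """Raw link capacity in MB/s (8 bits per byte)."""
    return link_mbit_per_s / 8


size_mb = 1024                              # 1GB streamed from a remote file
hdfs = throughput_mb_per_s(size_mb, 213)    # ~4.8 MB/s
radfs = throughput_mb_per_s(size_mb, 112)   # ~9.1 MB/s
link = line_rate_mb_per_s(100)              # 12.5 MB/s raw for a 100Mbps link

print(f"HDFS:  {hdfs:.1f} MB/s ({hdfs / link:.0%} of a 100Mbps link)")
print(f"RadFS: {radfs:.1f} MB/s ({radfs / link:.0%} of a 100Mbps link)")
```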

Regarding reads in RadFS: does the client fetch 32KB at a time (one RPC per request), or does it
pipeline (multiple outstanding requests for a single client stream)?
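To make the question concrete, here is a rough latency model. It is a sketch only: the 1 ms RTT, 32KB chunk, and 100Mbps link are hypothetical values, and `stop_and_wait_seconds` / `pipelined_seconds` are illustrative names, not RadFS code. With one RPC per chunk, every chunk pays a full round trip; with several requests outstanding, round trips overlap and the read can become link-bound.

```python
import math

# Hypothetical parameters for illustration only (not measured values).
TOTAL = 1 << 30          # 1 GiB stream
CHUNK = 32 * 1024        # 32KB per request
RTT = 0.001              # 1 ms client <-> server round trip
BW = 100e6 / 8           # 100Mbps link, in bytes per second


def stop_and_wait_seconds(total: int, chunk: int, rtt: float, bw: float) -> float:
    """One RPC per chunk; the next request is issued only after the reply arrives."""
    chunks = math.ceil(total / chunk)
    return chunks * (rtt + chunk / bw)


def pipelined_seconds(total: int, chunk: int, rtt: float, bw: float, window: int) -> float:
    """Steady-state estimate with `window` requests outstanding: each chunk costs
    either its transfer time (link-bound) or one request cycle shared across the
    window (latency-bound), whichever is larger. Ignores startup cost."""
    chunks = math.ceil(total / chunk)
    return chunks * max(chunk / bw, (rtt + chunk / bw) / window)


serial = stop_and_wait_seconds(TOTAL, CHUNK, RTT, BW)
piped = pipelined_seconds(TOTAL, CHUNK, RTT, BW, window=4)
print(f"stop-and-wait: {serial:.1f} s, pipelined (window=4): {piped:.1f} s")
```

With a window of 1 the pipelined model collapses to stop-and-wait, which is why the client's request strategy matters as much as server-side speed when RTT is comparable to per-chunk transfer time.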

@Todd, I essentially see this as a proof of concept (POC) of what could or should be improved in
HDFS to address latency issues. Contrib makes sense, but I would not expect this to go into
production in this form, and it should be marked 'Experimental'. The benchmarks also help greatly
in setting feature priorities. I don't think this needs a branch, since it does not touch core at all.

> Low Latency distributed reads
> -----------------------------
>                 Key: HDFS-516
>                 URL: https://issues.apache.org/jira/browse/HDFS-516
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jay Booth
>            Priority: Minor
>         Attachments: hdfs-516-20090912.patch
>   Original Estimate: 168h
>  Remaining Estimate: 168h
> I created a method for low-latency random reads using NIO on the server side and simulated
> OS paging with LRU caching and lookahead on the client side. Some applications could include
> Lucene searching (term->doc and doc->offset mappings are likely to be in the local cache,
> making lookups much faster than Nutch's current FsDirectory implementation) and binary search
> through record files (bytes at the 1/2, 1/4, 1/8 marks are likely to be cached).
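A minimal sketch of the client-side idea described above (LRU block caching with lookahead), assuming a fixed block size, one-block read-ahead on a miss, and a hypothetical `fetch_block` function standing in for a remote read. It is not the patch's implementation. The demo runs repeated binary searches over a sorted record file of hypothetical 4-byte keys; because every search probes the same top-level midpoints first (the 1/2 mark, then 1/4 or 3/4, and so on), those blocks stay cached.

```python
from collections import OrderedDict
from typing import Callable


class LruBlockCache:
    """Client-side block cache: LRU eviction plus read-ahead on a miss.

    `fetch_block(index)` stands in for a remote block read (hypothetical).
    """

    def __init__(self, fetch_block: Callable[[int], bytes], block_size: int,
                 num_blocks: int, capacity: int, lookahead: int):
        self.fetch_block = fetch_block
        self.block_size = block_size
        self.num_blocks = num_blocks
        self.capacity = capacity
        self.lookahead = lookahead
        self.blocks = OrderedDict()  # block index -> bytes, least recent first
        self.hits = 0
        self.misses = 0

    def _load(self, index: int) -> None:
        if index in self.blocks:
            self.blocks.move_to_end(index)
            return
        self.blocks[index] = self.fetch_block(index)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block

    def _get_block(self, index: int) -> bytes:
        if index >= self.num_blocks:
            return b""  # past end of file
        if index in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(index)
        else:
            self.misses += 1
            last = min(index + self.lookahead, self.num_blocks - 1)
            # Load read-ahead blocks first and the requested block last, so the
            # requested block is the most recently used and is not evicted here.
            for i in range(last, index - 1, -1):
                self._load(i)
        return self.blocks[index]

    def read(self, offset: int, length: int) -> bytes:
        out = bytearray()
        while length > 0:
            index, start = divmod(offset, self.block_size)
            chunk = self._get_block(index)[start:start + length]
            if not chunk:
                break  # reached end of file
            out += chunk
            offset += len(chunk)
            length -= len(chunk)
        return bytes(out)


# Demo: binary search over a sorted "record file" of 4-byte big-endian keys.
RECORD = 4
BLOCK = 4096
data = b"".join(k.to_bytes(RECORD, "big") for k in range(0, 200_000, 2))
num_blocks = -(-len(data) // BLOCK)  # ceiling division
remote_reads = []


def fetch_block(index: int) -> bytes:
    remote_reads.append(index)  # log each simulated remote read
    return data[index * BLOCK:(index + 1) * BLOCK]


cache = LruBlockCache(fetch_block, BLOCK, num_blocks, capacity=32, lookahead=1)


def contains(key: int) -> bool:
    lo, hi = 0, len(data) // RECORD - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        probe = int.from_bytes(cache.read(mid * RECORD, RECORD), "big")
        if probe == key:
            return True
        if probe < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return False


for key in range(0, 200_000, 997):
    contains(key)
print(f"hits={cache.hits} misses={cache.misses} remote reads={len(remote_reads)}")
```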

This message is automatically generated by JIRA.

View raw message