hadoop-hdfs-issues mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-516) Low Latency distributed reads
Date Mon, 03 Aug 2009 20:31:14 GMT

    [ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738565#action_12738565 ]

Raghu Angadi commented on HDFS-516:

Jay, random read is an increasingly important feature for HDFS to support. Currently
latency is the biggest drawback. See HDFS-236. It is good to see your work on this. You could
also run the simple benchmark in HDFS-236, which does simple random reads on a file and does not
depend on a sequence file.

From your architecture description, this reduces latency through the following improvements:
   * Connection caching (through RPC).
   * File channel caching on the server.
   * Local cache on the client.
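
To illustrate the second item, here is a minimal sketch of server-side file channel caching. It is not taken from radfs.patch; the class name FileChannelCache, its methods, and the LRU eviction policy are assumptions made for illustration. It keeps a bounded set of open java.nio FileChannels and serves random reads with positional reads, so a repeated read of the same file skips the open call.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical server-side cache of open FileChannels, evicted in LRU order. */
class FileChannelCache implements AutoCloseable {
    private final int maxOpen;
    // accessOrder=true: iteration starts at the least recently used entry.
    private final LinkedHashMap<Path, FileChannel> channels =
            new LinkedHashMap<>(16, 0.75f, true);

    FileChannelCache(int maxOpen) {
        this.maxOpen = maxOpen;
    }

    /** Reads up to len bytes starting at offset, reusing a cached channel if present. */
    synchronized byte[] read(Path file, long offset, int len) throws IOException {
        FileChannel ch = channels.get(file);
        if (ch == null) {
            ch = FileChannel.open(file, StandardOpenOption.READ);
            channels.put(file, ch);
            evictIfNeeded();
        }
        ByteBuffer buf = ByteBuffer.allocate(len);
        long pos = offset;
        while (buf.hasRemaining()) {
            // Positional read: does not move the channel's own position.
            int n = ch.read(buf, pos);
            if (n < 0) {
                break; // end of file
            }
            pos += n;
        }
        buf.flip();
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    /** Closes least recently used channels until the cache is within its bound. */
    private void evictIfNeeded() throws IOException {
        Iterator<Map.Entry<Path, FileChannel>> it = channels.entrySet().iterator();
        while (channels.size() > maxOpen && it.hasNext()) {
            FileChannel eldest = it.next().getValue();
            it.remove();
            eldest.close();
        }
    }

    synchronized int openCount() {
        return channels.size();
    }

    @Override
    public synchronized void close() throws IOException {
        for (FileChannel ch : channels.values()) {
            ch.close();
        }
        channels.clear();
    }
}
```

A caller would create one cache per server process, for example new FileChannelCache(256), and route every block read through read(file, offset, len). The single lock keeps the sketch short; a real datanode would likely want finer-grained locking.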

These are complementary to the existing datanode. It might be a lot simpler to add these features
to the existing implementation rather than requiring a user to choose an implementation based
on the access pattern. As it stands, you will have to re-implement many features (BlockLocations
on client, CRC verification, efficient bulk transfers AVRO-24, etc.).

> Low Latency distributed reads
> -----------------------------
>                 Key: HDFS-516
>                 URL: https://issues.apache.org/jira/browse/HDFS-516
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jay Booth
>            Priority: Minor
>         Attachments: radfs.patch
>   Original Estimate: 168h
>  Remaining Estimate: 168h
> I created a method for low latency random reads using NIO on the server side and simulated
OS paging with LRU caching and lookahead on the client side.  Some applications could include
lucene searching (term->doc and doc->offset mappings are likely to be in local cache,
thus much faster than nutch's current FsDirectory impl) and binary search through record files
(bytes at 1/2, 1/4, 1/8 marks are likely to be cached).
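
The client-side idea described above (LRU caching of pages plus lookahead) can be sketched roughly as follows. This is an illustrative sketch, not the code in radfs.patch: PageFetcher, LookaheadPageCache, and all parameter names are hypothetical, and the fetcher stands in for whatever remote read the real client performs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical source of fixed-size pages, standing in for a remote read. */
interface PageFetcher {
    byte[] fetch(long pageIndex);
}

/** Hypothetical client-side page cache: LRU eviction plus simple lookahead. */
class LookaheadPageCache {
    private final PageFetcher fetcher;
    private final int pageSize;
    private final int lookahead;
    private final long pageCount;
    private final Map<Long, byte[]> pages;

    LookaheadPageCache(PageFetcher fetcher, int pageSize, long fileLength,
                       int maxPages, int lookahead) {
        this.fetcher = fetcher;
        this.pageSize = pageSize;
        this.lookahead = lookahead;
        this.pageCount = (fileLength + pageSize - 1) / pageSize;
        // accessOrder=true plus removeEldestEntry gives a bounded LRU map.
        this.pages = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxPages;
            }
        };
    }

    /** Returns the byte at offset, fetching its page and the next few pages on a miss. */
    synchronized byte readByte(long offset) {
        long index = offset / pageSize;
        byte[] page = pages.get(index);
        if (page == null) {
            page = load(index);
            // Lookahead: also pull in the following pages, stopping at end of file.
            for (long next = index + 1; next <= index + lookahead && next < pageCount; next++) {
                if (!pages.containsKey(next)) {
                    load(next);
                }
            }
        }
        return page[(int) (offset % pageSize)];
    }

    private byte[] load(long index) {
        byte[] page = fetcher.fetch(index);
        pages.put(index, page);
        return page;
    }
}
```

With a cache like this, repeated binary searches over the same file keep probing the pages at the 1/2, 1/4, and 1/8 marks, so those pages stay near the recently used end of the LRU order and are rarely evicted.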

