hadoop-hdfs-issues mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-516) Low Latency distributed reads
Date Mon, 03 Aug 2009 23:37:14 GMT

    [ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738680#action_12738680 ]

Raghu Angadi commented on HDFS-516:
-----------------------------------


Yes, a client-side cache certainly helps with access patterns like binary search. A client
cache definitely has a place.
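To make the binary-search point concrete, here is a minimal Java sketch. It is not taken from the patch. A sorted file of fixed-size records is searched with ordinary binary search, and each probe reads a whole block through a client-side cache. The class name, the 16-records-per-block granularity and the array standing in for the remote file are all hypothetical. Every lookup starts at the midpoint and then moves to the 1/4 or 3/4 mark, so the first few probes of each search land in blocks that earlier searches have already fetched.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: binary search over fixed-size records where each probe reads
// one block "remotely" unless that block is already cached on the client.
// The remote file is simulated with an in-memory array; this is not an HDFS API.
class CachedBinarySearch {
    static final int RECORDS_PER_BLOCK = 16;   // assumed block granularity

    private final long[] records;              // stands in for the remote file
    private final Map<Integer, long[]> cache = new HashMap<>();
    int remoteReads = 0;
    int cacheHits = 0;

    CachedBinarySearch(long[] sortedRecords) {
        this.records = sortedRecords;
    }

    // Return the block with the given index, reading it "remotely" only on a miss.
    private long[] block(int blockIndex) {
        long[] b = cache.get(blockIndex);
        if (b != null) {
            cacheHits++;
            return b;
        }
        remoteReads++;
        int from = blockIndex * RECORDS_PER_BLOCK;
        int to = Math.min(from + RECORDS_PER_BLOCK, records.length);
        b = Arrays.copyOfRange(records, from, to);
        cache.put(blockIndex, b);
        return b;
    }

    private long record(int i) {
        return block(i / RECORDS_PER_BLOCK)[i % RECORDS_PER_BLOCK];
    }

    // Ordinary binary search; returns the index of key, or -1 if it is absent.
    int find(long key) {
        int lo = 0, hi = records.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            long v = record(mid);
            if (v == key) return mid;
            if (v < key) lo = mid + 1; else hi = mid - 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] data = new long[1 << 16];
        for (int i = 0; i < data.length; i++) data[i] = 2L * i;   // sorted even values
        CachedBinarySearch s = new CachedBinarySearch(data);
        for (int k = 0; k < 1000; k++) s.find(2L * ((k * 7919) % data.length));
        System.out.println("remote reads: " + s.remoteReads + ", cache hits: " + s.cacheHits);
    }
}
```

The cache here is an unbounded map to keep the example short. A real client would bound it, for example with LRU eviction.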

It is fine, and very useful, for you to continue your work without worrying too much about
integration for now.

You might need to add basic CRC support for a fairer comparison with the default DFS in
benchmarks, but it is probably not a must for the first cut. Note that the current RPC has a
LOT of buffer-copying overhead (on both client and server). Since the I/O benchmarks are not
CPU bound, this would not show up there, but it is an important factor for production loads.
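Basic CRC support could look roughly like the sketch below. It computes one CRC32 per fixed-size chunk, so a reader can verify exactly what it received. The 512-byte chunk size is an assumption for illustration, and the class and method names are hypothetical rather than Hadoop's own checksum classes. Only `java.util.zip.CRC32` is a real API here.

```java
import java.util.zip.CRC32;

// Hypothetical sketch of per-chunk checksums; chunk size and names are assumptions.
class ChunkChecksums {
    static final int BYTES_PER_CHECKSUM = 512; // assumed chunk size

    // Compute one CRC32 value per chunk; the last chunk may be shorter.
    static long[] compute(byte[] data) {
        int chunks = (data.length + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int c = 0; c < chunks; c++) {
            int off = c * BYTES_PER_CHECKSUM;
            int len = Math.min(BYTES_PER_CHECKSUM, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums[c] = crc.getValue();
        }
        return sums;
    }

    // Return the index of the first chunk whose CRC does not match, or -1 if all match.
    static int firstBadChunk(byte[] data, long[] expected) {
        long[] actual = compute(data);
        if (actual.length != expected.length) {
            throw new IllegalArgumentException("chunk count mismatch");
        }
        for (int c = 0; c < actual.length; c++) {
            if (actual[c] != expected[c]) return c;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1300];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        long[] sums = compute(data);   // 3 chunks: 512 + 512 + 276 bytes
        data[700] ^= 1;                // flip one bit inside the second chunk
        System.out.println("first bad chunk: " + firstBadChunk(data, sums)); // prints "first bad chunk: 1"
    }
}
```

For random reads, the reader only needs to verify the chunks that overlap the requested byte range. The checksum cost then scales with the size of the read rather than the size of the block.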

> [...]  So while I agree with the urge for simplicity, I feel like we need to make that
performance tradeoff clear. Otherwise, we could have a lot of very slow mapreduce jobs happening.
[...]

Of course, we would not let that happen. I think that in the longer term we will have
streaming access almost as good as it is now, and random access with improved latency,
fairly transparent to the user.

> Low Latency distributed reads
> -----------------------------
>
>                 Key: HDFS-516
>                 URL: https://issues.apache.org/jira/browse/HDFS-516
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jay Booth
>            Priority: Minor
>         Attachments: radfs.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I created a method for low latency random reads using NIO on the server side and simulated
> OS paging with LRU caching and lookahead on the client side.  Some applications could include
> Lucene searching (term->doc and doc->offset mappings are likely to be in local cache,
> thus much faster than Nutch's current FsDirectory implementation) and binary search through
> record files (bytes at the 1/2, 1/4, 1/8 marks are likely to be cached).
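The client-side "simulated OS paging" described above can be approximated with an LRU map of fixed-size pages plus sequential lookahead on a miss. This is a generic sketch, not the code in radfs.patch. The class name, the page-fetch callback and the capacity/lookahead parameters are illustrative assumptions, and the remote read is passed in as a function.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical sketch: LRU page cache with lookahead. Not the radfs.patch implementation.
class LruPageCache {
    private final int lookahead;
    private final IntFunction<byte[]> fetchPage;   // stands in for a remote read of one page
    private final LinkedHashMap<Integer, byte[]> pages;
    int fetches = 0;                               // number of simulated remote page reads

    LruPageCache(int capacity, int lookahead, IntFunction<byte[]> fetchPage) {
        if (capacity <= lookahead) {
            throw new IllegalArgumentException("capacity must exceed lookahead");
        }
        this.lookahead = lookahead;
        this.fetchPage = fetchPage;
        // accessOrder=true: iteration order runs from least- to most-recently used,
        // so removeEldestEntry evicts the LRU page once capacity is exceeded.
        this.pages = new LinkedHashMap<Integer, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> eldest) {
                return size() > capacity;
            }
        };
    }

    byte[] get(int pageIndex) {
        byte[] page = pages.get(pageIndex);   // a hit also marks the page most-recently used
        if (page != null) return page;
        // Miss: prefetch the following pages first, then insert the requested page last
        // so it is the most-recently-used entry and cannot be evicted by its own lookahead.
        for (int p = pageIndex + 1; p <= pageIndex + lookahead; p++) {
            if (!pages.containsKey(p)) {      // containsKey does not change access order
                pages.put(p, fetchPage.apply(p));
                fetches++;
            }
        }
        page = fetchPage.apply(pageIndex);
        fetches++;
        pages.put(pageIndex, page);
        return page;
    }

    public static void main(String[] args) {
        LruPageCache cache = new LruPageCache(4, 2, p -> new byte[]{(byte) p});
        for (int p = 0; p < 6; p++) cache.get(p);   // sequential scan of pages 0..5
        System.out.println("remote page fetches: " + cache.fetches); // prints "remote page fetches: 6"
    }
}
```

With a lookahead of 2, a sequential scan misses only on every third page. Random probes such as binary search still benefit from the LRU part, since hot pages like the 1/2 and 1/4 marks stay resident.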

-- 
This message is automatically generated by JIRA.

