hadoop-hdfs-issues mailing list archives

From "Jay Booth (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-516) Low Latency distributed reads
Date Mon, 14 Sep 2009 18:10:57 GMT

    [ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755120#action_12755120 ]

Jay Booth commented on HDFS-516:
--------------------------------

Hey Todd, in short, I agree: we should look at moving the performance improvements over
to the main FS implementation.  Right now, my version doesn't support user permissions or
checksumming.  I'd say it makes sense to keep it in contrib as a sandbox for now and work
towards full compatibility with the main DFS implementation, at which point we could consider
swapping in the new reading subsystem.  User permissions would require some model changes
but should be workable, and checksumming probably won't be too bad if I read the code right.

So I'd suggest keeping it in contrib as a sandbox initially, with an explicit goal of moving it
over to DFS when it reaches compatibility.  It doesn't really lend itself to moving over piecemeal,
as it has several components that all pretty much need each other.  However, it's pretty
well integrated with the DFS API and only replaces one method on the filesystem class.

> Low Latency distributed reads
> -----------------------------
>
>                 Key: HDFS-516
>                 URL: https://issues.apache.org/jira/browse/HDFS-516
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jay Booth
>            Priority: Minor
>         Attachments: hdfs-516-20090912.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I created a method for low latency random reads using NIO on the server side and simulated
OS paging with LRU caching and lookahead on the client side.  Some applications could include
Lucene searching (term->doc and doc->offset mappings are likely to be in local cache,
thus much faster than Nutch's current FsDirectory impl) and binary search through record files
(bytes at 1/2, 1/4, 1/8 marks are likely to be cached).
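
To make the client-side idea concrete, here is a minimal sketch of an LRU page cache with lookahead. It is not taken from the attached patch: the class name LruPageCache, the PageFetcher interface, and the page size, capacity, and lookahead parameters are all hypothetical, and the fetcher is assumed to return a full page for any index it is asked for.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical client-side page cache: LRU eviction plus simple lookahead. */
final class LruPageCache {
    /** Stand-in for the remote read of one page; not an HDFS API. */
    public interface PageFetcher {
        byte[] fetch(long pageIndex);
    }

    private final int pageSize;
    private final int lookahead;
    private final PageFetcher fetcher;
    private final Map<Long, byte[]> pages;

    LruPageCache(final int maxPages, int pageSize, int lookahead, PageFetcher fetcher) {
        this.pageSize = pageSize;
        this.lookahead = lookahead;
        this.fetcher = fetcher;
        // accessOrder=true keeps entries ordered from least to most recently used,
        // so removeEldestEntry evicts the least recently used page.
        this.pages = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxPages;
            }
        };
    }

    /** Returns the byte at the given file offset, fetching pages only on a miss. */
    byte read(long offset) {
        long index = offset / pageSize;
        byte[] page = pages.get(index);
        if (page == null) {
            page = fetcher.fetch(index);
            pages.put(index, page);
            // Lookahead: on a miss, also pull in the next few pages, on the
            // assumption that nearby offsets will be read soon.
            for (int i = 1; i <= lookahead; i++) {
                long next = index + i;
                if (!pages.containsKey(next)) {
                    pages.put(next, fetcher.fetch(next));
                }
            }
        }
        return page[(int) (offset % pageSize)];
    }
}
```

With a cache like this, repeated probes at the same offsets (for example the 1/2, 1/4, 1/8 marks of a binary search) are served locally after the first lookup, and only the final, narrower probes go to the server.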

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
