hadoop-hdfs-issues mailing list archives

From "Jay Booth (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-516) Low Latency distributed reads
Date Thu, 01 Oct 2009 18:06:24 GMT

     [ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Booth updated HDFS-516:
---------------------------

    Attachment: radfs.odp

Here's my presentation for HadoopWorld tomorrow, if anyone's interested.

Short version:
  still faster than HDFS for random reads
  my earlier fast streaming numbers were entirely because I wasn't checksumming; checksumming slows things down a lot
  my checksumming implementation is crude: I simply wrapped ChecksummingFileSystem around the whole thing, so it's slow
  includes a plan to implement PipeliningByteService as another configurable part of the current ByteService chain; it would prefetch pages and checksum them in a separate thread

I'll get a new patch up soon, as well as a github repo; I don't have the code with me at present. It looks like, once PipeliningByteService is implemented, I'll be able to equal or surpass HDFS for streaming while keeping the lead in random reads and generating a lower server-side workload.
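
To make the prefetch-and-checksum idea concrete, here is a minimal sketch in Java. It is not the code from the patch: `PageStore`, `PipeliningReader`, and their method signatures are hypothetical names invented for illustration, CRC32 stands in for whatever checksum the real chain would use, and the reader is assumed to be driven by a single consumer thread.

```java
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;

/** Hypothetical page source; stands in for the next link in a byte-service chain. */
interface PageStore {
    byte[] readPage(long index) throws IOException;
    long expectedCrc(long index) throws IOException;
}

/** Sketch: fetch and checksum page N+1 on a worker thread while the caller consumes page N. */
class PipeliningReader implements AutoCloseable {
    private final PageStore store;
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private long prefetchedIndex = -1;
    private Future<byte[]> prefetched;

    PipeliningReader(PageStore store) {
        this.store = store;
    }

    /** Returns a verified page. Intended for one consumer thread at a time. */
    byte[] read(long index) throws IOException, InterruptedException {
        Future<byte[]> current;
        if (prefetched != null && prefetchedIndex == index) {
            current = prefetched;              // sequential read: the work is already in flight
        } else {
            if (prefetched != null) {
                prefetched.cancel(false);      // drop a stale prefetch if it has not started yet
            }
            current = submit(index);
        }
        // Queue the next page before blocking, so it overlaps with the caller's processing.
        prefetchedIndex = index + 1;
        prefetched = submit(index + 1);
        try {
            return current.get();
        } catch (ExecutionException e) {
            throw new IOException("page " + index + " failed", e.getCause());
        }
    }

    /** Reads one page and verifies its CRC32 on the worker thread. */
    private Future<byte[]> submit(long index) {
        return worker.submit(() -> {
            byte[] data = store.readPage(index);
            CRC32 crc = new CRC32();
            crc.update(data, 0, data.length);
            if (crc.getValue() != store.expectedCrc(index)) {
                throw new IOException("checksum mismatch on page " + index);
            }
            return data;
        });
    }

    @Override
    public void close() {
        worker.shutdownNow();
    }
}
```

In this sketch a failed prefetch (for example, one past the end of the file) only surfaces if that page is later requested, so sequential readers pay the checksum cost off the calling thread while random readers lose at most one wasted page fetch.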

> Low Latency distributed reads
> -----------------------------
>
>                 Key: HDFS-516
>                 URL: https://issues.apache.org/jira/browse/HDFS-516
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jay Booth
>            Priority: Minor
>         Attachments: hdfs-516-20090912.patch, radfs.odp
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I created a method for low latency random reads using NIO on the server side and simulated OS paging with LRU caching and lookahead on the client side.  Some applications could include lucene searching (term->doc and doc->offset mappings are likely to be in local cache, thus much faster than nutch's current FsDirectory impl) and binary search through record files (bytes at 1/2, 1/4, 1/8 marks are likely to be cached).
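
For readers unfamiliar with the client-side scheme described above, the following is a rough Java sketch of an LRU page cache with fixed lookahead. It is an illustration under assumptions, not the patch's implementation: `PageSource`, `LruPageCache`, the page size, and the lookahead policy (fetch the next few pages only on a miss) are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical remote page fetcher; stands in for a call to the server. */
interface PageSource {
    byte[] fetchPage(long pageIndex);
}

/** Sketch of client-side paging: LRU eviction plus fixed lookahead on a miss. */
class LruPageCache {
    private final PageSource source;
    private final int pageSize;
    private final long pageCount;
    private final int lookahead;
    private final Map<Long, byte[]> pages;
    int fetches = 0;   // number of remote fetches, exposed for illustration

    LruPageCache(PageSource source, int pageSize, long pageCount, int capacity, int lookahead) {
        this.source = source;
        this.pageSize = pageSize;
        this.pageCount = pageCount;
        this.lookahead = lookahead;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.pages = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > capacity;   // evict the least recently used page
            }
        };
    }

    /** Reads one byte at an absolute file offset, going to the source only on a miss. */
    byte readByte(long offset) {
        long index = offset / pageSize;
        byte[] page = pages.get(index);      // a hit also refreshes the page's recency
        if (page == null) {
            page = load(index);
            // Lookahead: also pull in the pages that follow the missed one.
            for (long i = index + 1; i <= index + lookahead && i < pageCount; i++) {
                if (!pages.containsKey(i)) {
                    load(i);
                }
            }
        }
        return page[(int) (offset % pageSize)];
    }

    private byte[] load(long index) {
        byte[] page = source.fetchPage(index);
        fetches++;
        pages.put(index, page);
        return page;
    }
}
```

With a cache like this, a repeated binary search keeps touching the same pages around the 1/2, 1/4, and 1/8 offsets of the file, so those pages stay recently used and only the last few probes of each search are likely to need a remote fetch.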

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

