From: "Jay Booth (JIRA)"
To: hdfs-issues@hadoop.apache.org
Date: Fri, 31 Jul 2009 14:14:14 -0700 (PDT)
Subject: [jira] Commented: (HDFS-516) Low Latency distributed reads

    [ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737713#action_12737713 ]

Jay Booth commented on HDFS-516:
--------------------------------

Here's an architectural overview and a general request for comments. I'll be away and busy for the next few days but should be able to get back to this in the middle of next week.

The basic workflow: I created a RadFileSystem (RandomAccessDistributed FS) which wraps DistributedFileSystem and delegates to it for everything except getFSDataInputStream. That returns a custom FSDataInputStream which wraps a CachingByteService, which itself wraps a RadFSByteService. The caching byte services share a cache managed by the RadFSClient class (that could maybe be factored away and put in RadFileSystem instead). They try to hit the cache, and on a miss they call the underlying RadFSClientByteService to get the requested page plus a few pages of lookahead. The RadFSClientByteService calls the namenode to get the appropriate block locations (todo: cache these effectively) and then calls RadNode, which is embedded in the DataNode via ServicePlugin and maintains an IPC server and a set of FileChannels to the local blocks. On repeated requests for the same data, the RadFSClient tends to favor the same host, figuring that the latency saved by hitting the DataNode's OS cache for the given bytes outweighs the penalty of hopping a rack (untested assumption).
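To make the layering concrete, here's a rough sketch of the cache-then-delegate read path described above. Everything in it -- the ByteService interface, the signatures, the field names -- is my shorthand for illustration; the real classes are in the attached radfs.patch.

    import java.io.IOException;
    import java.util.Map;

    // Hypothetical interface standing in for the byte-service layering.
    interface ByteService {
      /** Reads up to len bytes at the given file offset into buf; returns count read. */
      int read(long position, byte[] buf, int off, int len) throws IOException;
    }

    /** Cache-first layer: hit the shared page cache, and on a miss pull the
     *  requested page plus a few pages of lookahead from the layer below. */
    class CachingByteService implements ByteService {
      private final ByteService underlying;        // e.g. the client byte service
      private final Map<Long, byte[]> sharedCache; // page id -> page bytes, LRU-managed
      private final int pageSize;
      private final int lookahead;                 // extra pages fetched per miss

      CachingByteService(ByteService underlying, Map<Long, byte[]> sharedCache,
                         int pageSize, int lookahead) {
        this.underlying = underlying;
        this.sharedCache = sharedCache;
        this.pageSize = pageSize;
        this.lookahead = lookahead;
      }

      @Override
      public int read(long position, byte[] buf, int off, int len)
          throws IOException {
        long pageId = position / pageSize;
        byte[] page = sharedCache.get(pageId);
        if (page == null) {
          // Miss: one round trip to the underlying service fetches the requested
          // page plus `lookahead` pages, all of which populate the shared cache.
          byte[] fetched = new byte[pageSize * (1 + lookahead)];
          int got = underlying.read(pageId * pageSize, fetched, 0, fetched.length);
          for (int i = 0; i <= lookahead && i * pageSize < got; i++) {
            int n = Math.min(pageSize, got - i * pageSize);
            byte[] copy = new byte[n];
            System.arraycopy(fetched, i * pageSize, copy, 0, n);
            sharedCache.put(pageId + i, copy);
          }
          page = sharedCache.get(pageId);
          if (page == null) {
            return -1; // offset past end of file
          }
        }
        int withinPage = (int) (position % pageSize);
        int n = Math.min(len, page.length - withinPage);
        if (n <= 0) {
          return -1;
        }
        System.arraycopy(page, withinPage, buf, off, n);
        return n;
      }
    }

A real implementation would also need to synchronize access to the shared cache across streams; I've left that out of the sketch.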
The intended use case is pretty different from MapReduce, so I think this should be a contrib module that has to be explicitly invoked by clients. It really underperforms DFS for streaming reads but should significantly outperform it for random reads (I haven't tested extensively outside of localhost). For files with 'hot paths', such as Lucene indices or binary search over a normal file, the cache hit percentage is likely to be high, so it should perform well (see the LRU cache sketch after the issue summary below). Currently it makes a fresh request to the NameNode for every read, which is inefficient but more likely to be correct. Going forward, I'd like to tighten this up, make sure it plays nicely with append, and get it into a future Hadoop release.

> Low Latency distributed reads
> -----------------------------
>
>                 Key: HDFS-516
>                 URL: https://issues.apache.org/jira/browse/HDFS-516
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jay Booth
>            Priority: Minor
>         Attachments: radfs.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I created a method for low latency random reads using NIO on the server side and simulated OS paging with LRU caching and lookahead on the client side. Some applications could include Lucene searching (term->doc and doc->offset mappings are likely to be in local cache, thus much faster than Nutch's current FsDirectory impl) and binary search through record files (bytes at the 1/2, 1/4, 1/8 marks are likely to be cached).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
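The client-side "simulated OS paging" from the issue description is easy to picture with Java's access-ordered LinkedHashMap. A minimal sketch, assuming a page-id-to-bytes mapping; the class name and capacity parameter are mine, not from the patch:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Access-ordered map: get() refreshes an entry's recency, and the eldest
    // (least recently used) page is evicted once the cache exceeds maxPages.
    class LruPageCache extends LinkedHashMap<Long, byte[]> {
      private final int maxPages;

      LruPageCache(int maxPages) {
        super(16, 0.75f, true); // accessOrder=true gives LRU iteration order
        this.maxPages = maxPages;
      }

      @Override
      protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
        return size() > maxPages; // evict the eldest page once over capacity
      }
    }

Under this policy a binary search keeps revisiting the pages near the 1/2, 1/4, and 1/8 marks, so those stay resident while cold pages age out, which is the hot-path behavior claimed above. As with the earlier sketch, a multithreaded client would need to wrap the map with synchronization.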