hadoop-hdfs-issues mailing list archives

From "Jay Booth (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-516) Low Latency distributed reads
Date Mon, 03 Aug 2009 18:37:14 GMT

    [ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738488#action_12738488

Jay Booth commented on HDFS-516:

    *  The content of TestSequenceFileSearcher.java is commented out. Should this file be
included in the patch at all?
** Yeah, I need to get that working and said so in the initial comment -- I think that once
it's working, it would make sense to include it in the patch as a useful piece of functionality
that doubles as an example use-case, but I'll certainly defer to the community's sensibilities
on that.

    * There's a push to upgrade the existing test base to JUnit4 (see HADOOP-4901 for more information).
I see that the tests are written in the old JUnit3 style (TestCase extension, no @Test
annotations, junit.framework.* packages, etc.). Although JUnit4 will pick up these tests without
any problems, I'd suggest that new tests be JUnit4-compliant.
** Will do.  I actually deliberately made them 3-like because that looked like the existing
convention, but if we're trying to change that, I completely agree.

    * a number of classes lack JavaDocs, even for public classes/methods
** Doh, yep, need to fix that for sure. I just wanted to post the JIRA for feedback before
I became unable to work on it this week, apologies.

    * the content of SequenceFileSearcher.java is commented out. Does it need to be part of
this patch at all?
** Same as above for its test case -- it doesn't need to be part of the patch, but once I get
it working, it would be a useful tool.

    * pieces of the code here and there (e.g. ConcurrentLRUCache.java) are commented out.
Should they be just removed from the patch?
** Yeah, any commented out code in the patch is the result of my oversight, will absolutely
fix at my first opportunity.  ConcurrentLRUCache was stolen from Solr so it's those guys'
fault ;)  I'll clean out the commented code though and while I'm at it, I'll port over their
unit test for it.
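For reference, the behaviour an LRU cache needs to provide can be sketched with a plain access-ordered LinkedHashMap behind a single lock. This is a minimal illustration, not a port of Solr's ConcurrentLRUCache; the class name SimpleLRUCache is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Minimal thread-safe LRU cache (hypothetical sketch, not Solr's
 * ConcurrentLRUCache). An access-ordered LinkedHashMap tracks recency;
 * one lock guards every operation.
 */
class SimpleLRUCache<K, V> {
    private final Map<K, V> map;

    SimpleLRUCache(final int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        // accessOrder = true: get() moves an entry to the most-recently-used end
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least-recently-used entry once capacity is exceeded
                return size() > capacity;
            }
        };
    }

    synchronized V get(K key) { return map.get(key); }

    synchronized void put(K key, V value) { map.put(key, value); }

    // containsKey does not count as an access, so it leaves recency order unchanged
    synchronized boolean containsKey(K key) { return map.containsKey(key); }

    synchronized int size() { return map.size(); }
}
```

With a capacity of 2, putting "a" and "b", reading "a", then putting "c" evicts "b". A single lock keeps the sketch obviously correct but serializes all readers, which is the trade-off a dedicated concurrent cache is meant to avoid.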

    * some classes don't have the standard Apache license boilerplate (ByteService.java, ByteServiceLazyInitializer.java,
and many others)
** Will add.

    * FSDatasetInterface is being changed: shall this information be added to the release notes?
** Yeah, it's a pretty minor change that I noted in my initial post, should I make a README.txt
in contrib/radfs?  Alternatively, I could work with FSDataset directly instead of FSDatasetInterface
(FSDataset exposes getFile(), the interface doesn't), which would mean zero lines of changed
code in HDFS core at the expense of some tighter coupling.  Willing to go either way at the
discretion of the community.

Thanks a ton for the feedback! If anyone else has anything at all, I'd love to hear it before
I get back to working on this later in the week.

> Low Latency distributed reads
> -----------------------------
>                 Key: HDFS-516
>                 URL: https://issues.apache.org/jira/browse/HDFS-516
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jay Booth
>            Priority: Minor
>         Attachments: radfs.patch
>   Original Estimate: 168h
>  Remaining Estimate: 168h
> I created a method for low-latency random reads using NIO on the server side and simulated
OS paging with LRU caching and lookahead on the client side.  Some applications could include
Lucene searching (term->doc and doc->offset mappings are likely to be in the local cache,
and thus much faster than Nutch's current FsDirectory implementation) and binary search through
record files (bytes at the 1/2, 1/4, and 1/8 marks are likely to be cached).

This message is automatically generated by JIRA.
