hadoop-hdfs-issues mailing list archives

From "Jay Booth (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-516) Low Latency distributed reads
Date Wed, 02 Sep 2009 03:06:32 GMT

    [ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750246#action_12750246 ]

Jay Booth commented on HDFS-516:
--------------------------------

I did some benchmarking; here are the results:

Each test ran 1000 searches to warm up, then 5000 searches to benchmark.
Each search was a binary search of a 20 GB sorted sequence file of 20 million 1 KB records.
Tests were run from the namenode of a 4-node EC2 medium cluster (1 namenode and 3 datanodes),
with 1.7 GB of RAM per node.

From HDFS to RadFS with a 512MB cache there was a 4X average improvement in search times,
from 102ms to 24ms.
Each search should, in theory, take about 24.25 reads (log2 of 20 million); I did not
measure the actual count.
I only ran each set once.  The 90th percentile trends the right way, although the max
is a little spiky.  I'll add a 99th percentile in future benchmarks.
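
To make the read-count estimate concrete, here is a minimal sketch of a binary search over
fixed-size sorted records that counts its probes.  It is not the benchmark code from the
patch: the class name, the `probes` counter and the `keyAt` callback (standing in for
"seek to index * recordSize and read the key") are all hypothetical, and the records are
simulated in memory rather than read from HDFS.

```java
import java.util.function.LongUnaryOperator;

// Hypothetical sketch: binary search over N sorted fixed-size records,
// counting how many record reads ("probes") a single search performs.
class RecordSearchSketch {
    /** Number of record reads performed by the most recent search. */
    static int probes;

    /**
     * Returns the index of the record whose key equals target, or -1.
     * keyAt stands in for reading the key of the record at a given index.
     */
    static long search(long recordCount, LongUnaryOperator keyAt, long target) {
        probes = 0;
        long lo = 0;
        long hi = recordCount - 1;
        while (lo <= hi) {
            long mid = lo + (hi - lo) / 2;   // avoids overflow of lo + hi
            long key = keyAt.applyAsLong(mid);
            probes++;                        // one simulated read per probe
            if (key < target) {
                lo = mid + 1;
            } else if (key > target) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long n = 20_000_000L;                // record count from the benchmark
        double log2n = Math.log(n) / Math.log(2);
        System.out.printf("log2(%d) = %.2f%n", n, log2n);

        // Simulated sorted file: the record at index i has key 2 * i.
        long idx = search(n, i -> 2 * i, 2 * 12_345_678L);
        System.out.println("found index " + idx + " in " + probes + " probes");
    }
}
```

With 20 million records a single search never needs more than 25 probes, since
2^24 < 20 million < 2^25, which is consistent with the log2 estimate above.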

HDFS, baseline:
Warming with 1000 searches
Executed 5000 random searches with FS class org.apache.hadoop.hdfs.DistributedFileSystem
Done, Search Times:
Mean:     102.17840000000015
Variance: 5939.660105461091
Median:   97.0
Max:      3095.0
Min:      33.0
90th pct: 130.0

Rad, no cache:
Executed 5000 random searches with FS class org.apache.hadoop.hdfs.rad.RadFileSystem
Done, Search Times: 
Mean:     68.55640000000002
Variance: 233.8335857571515
Median:   67.0
Max:      379.0
Min:      26.0
90th pct: 79.0

Rad, 16MB cache:
Warming with 1000 searches
Executed 5000 random searches with FS class org.apache.hadoop.hdfs.rad.RadFileSystem
Done, Search Times: 
Mean:     42.039799999999985
Variance: 237.83818359671966
Median:   40.0
Max:      203.0
Min:      5.0
90th pct: 59.0

Rad, 128MB cache:
Warming with 1000 searches
Executed 5000 random searches with FS class org.apache.hadoop.hdfs.rad.RadFileSystem
Done, Search Times: 
Mean:     29.850600000000007
Variance: 202.08189601920367
Median:   27.0
Max:      203.0
Min:      1.0
90th pct: 45.0

Rad, 512MB cache:
Warming with 1000 searches
Executed 5000 random searches with FS class org.apache.hadoop.hdfs.rad.RadFileSystem
Done, Search Times:
Mean:     24.274600000000014
Variance: 250.3052558911758
Median:   22.0
Max:      687.0
Min:      0.0
90th pct: 36.0


I could still shave a point or two by cleaning up my caching system to be more graceful with
its lookahead mechanism, but not bad for now.  I'll pretty it up and post a first attempt
at a final patch soon.

> Low Latency distributed reads
> -----------------------------
>
>                 Key: HDFS-516
>                 URL: https://issues.apache.org/jira/browse/HDFS-516
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jay Booth
>            Priority: Minor
>         Attachments: hdfs-516-20090824.patch, hdfs-516-20090831.patch, radfs.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I created a method for low latency random reads using NIO on the server side and simulated
> OS paging with LRU caching and lookahead on the client side.  Some applications could include
> Lucene searching (term->doc and doc->offset mappings are likely to be in local cache,
> thus much faster than Nutch's current FsDirectory impl) and binary search through record files
> (bytes at 1/2, 1/4, 1/8 marks are likely to be cached).
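
The client-side idea described above (LRU page caching with lookahead) can be sketched
roughly as follows.  This is only an illustration under stated assumptions, not the code in
the attached patches: the class name, the `loader` callback (standing in for a remote page
fetch), the `loads` counter and all parameters are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical sketch of a client-side LRU page cache with lookahead.
class LruPageCacheSketch {
    private final int pageSize;
    private final int lookaheadPages;
    private final LongFunction<byte[]> loader;   // stand-in for a remote page fetch
    private final LinkedHashMap<Long, byte[]> pages;
    int loads;                                   // number of simulated remote fetches

    LruPageCacheSketch(int pageSize, int maxPages, int lookaheadPages,
                       LongFunction<byte[]> loader) {
        this.pageSize = pageSize;
        this.lookaheadPages = lookaheadPages;
        this.loader = loader;
        // accessOrder = true makes iteration order least-recently-used first.
        this.pages = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxPages;        // evict the LRU page when over capacity
            }
        };
    }

    /** Returns the cached page, fetching and caching it on a miss. */
    private byte[] load(long pageNo) {
        byte[] page = pages.get(pageNo);
        if (page == null) {
            page = loader.apply(pageNo);
            loads++;
            pages.put(pageNo, page);
        }
        return page;
    }

    /** Reads one byte; a miss also prefetches the next lookaheadPages pages. */
    byte readByte(long offset) {
        long pageNo = offset / pageSize;
        if (!pages.containsKey(pageNo)) {
            for (long p = pageNo + 1; p <= pageNo + lookaheadPages; p++) {
                load(p);
            }
        }
        // Load the requested page last so it ends up most recently used.
        byte[] page = load(pageNo);
        return page[(int) (offset % pageSize)];
    }

    public static void main(String[] args) {
        // Simulated remote file: every byte of page p holds the value (byte) p.
        LruPageCacheSketch cache = new LruPageCacheSketch(4, 3, 1, p -> {
            byte[] b = new byte[4];
            Arrays.fill(b, (byte) p);
            return b;
        });
        cache.readByte(0);   // miss: fetches page 0 plus lookahead page 1
        cache.readByte(5);   // hit: page 1 was prefetched
        System.out.println("remote fetches: " + cache.loads);
    }
}
```

For a binary search workload, the pages covering the 1/2, 1/4 and 1/8 marks are touched by
nearly every search, so an LRU policy tends to keep them resident.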

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

