hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-4418) HDFS-347: increase default FileInputStreamCache size
Date Thu, 17 Jan 2013 06:02:12 GMT

     [ https://issues.apache.org/jira/browse/HDFS-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-4418:
------------------------------

    Attachment: hdfs-4418.txt

Attached patch increases the cache size to 100 blocks (roughly a 12 GB file at the default 128 MB block size).
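
For reference, the arithmetic behind that "12 GB" figure, assuming the 128 MB default:

    100 blocks x 128 MB/block = 12,800 MB, i.e. about 12.5 GiB of file data covered by the cache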

I also dropped the cache expiry interval to 5 seconds, so that unless the workload is truly
hitting a large number of blocks frequently, it won't accumulate a lot of file descriptors.
Given that caching the streams only matters when you're doing lots of short reads that hit
the OS buffer cache, I think a short expiry period is reasonable. For comparison, the
keepalive period for non-local reads is only 1 second.
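
To make the interplay between the size cap and the 5-second expiry concrete, here is a
minimal sketch of an expiring stream cache. It is an illustration in the spirit of
FileInputStreamCache, not the actual HDFS-347 code: the class name, fields, and the
inline eviction (the real cache evicts from a background task) are all assumptions.

    import java.io.Closeable;
    import java.io.IOException;
    import java.util.Iterator;
    import java.util.LinkedHashMap;
    import java.util.Map;

    /**
     * Minimal sketch in the spirit of FileInputStreamCache (NOT the real
     * HDFS-347 code): an insertion-ordered map capped at maxSize entries,
     * where entries older than expiryMs are closed and dropped.
     */
    class ExpiringStreamCache<K, V extends Closeable> {

      private static class Entry<S> {
        final S stream;
        final long insertTimeMs;
        Entry(S stream, long insertTimeMs) {
          this.stream = stream;
          this.insertTimeMs = insertTimeMs;
        }
      }

      private final int maxSize;    // e.g. 100 blocks, per the patch
      private final long expiryMs;  // e.g. 5000 ms, per the patch
      private final LinkedHashMap<K, Entry<V>> map = new LinkedHashMap<>();

      ExpiringStreamCache(int maxSize, long expiryMs) {
        this.maxSize = maxSize;
        this.expiryMs = expiryMs;
      }

      /** Cache a stream, evicting expired and oldest entries as needed. */
      synchronized void put(K key, V stream) throws IOException {
        evictExpired();
        if (map.size() >= maxSize) {
          // Size cap reached: close and drop the oldest entry.
          Iterator<Map.Entry<K, Entry<V>>> it = map.entrySet().iterator();
          it.next().getValue().stream.close();
          it.remove();
        }
        map.put(key, new Entry<>(stream, System.currentTimeMillis()));
      }

      /** Hand a cached stream back to the caller, who takes ownership. */
      synchronized V get(K key) throws IOException {
        evictExpired();
        Entry<V> e = map.remove(key);
        return e == null ? null : e.stream;
      }

      // Close and drop everything older than expiryMs so an idle client
      // does not sit on a pile of open file descriptors. (The real cache
      // does this from a background task rather than inline.)
      private void evictExpired() throws IOException {
        long cutoff = System.currentTimeMillis() - expiryMs;
        Iterator<Map.Entry<K, Entry<V>>> it = map.entrySet().iterator();
        while (it.hasNext()) {
          Entry<V> e = it.next().getValue();
          if (e.insertTimeMs >= cutoff) break; // insertion order: the rest are fresher
          e.stream.close();
          it.remove();
        }
      }
    }

The effect of the short expiry shows up in evictExpired(): an idle client closes its
descriptors within a few seconds, so the larger size cap only costs file descriptors
while the workload is actively re-reading many local blocks.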
                
> HDFS-347: increase default FileInputStreamCache size
> ----------------------------------------------------
>
>                 Key: HDFS-4418
>                 URL: https://issues.apache.org/jira/browse/HDFS-4418
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode, hdfs-client, performance
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-4418.txt
>
>
> The FileInputStreamCache currently defaults to holding only 10 input stream pairs
> (corresponding to 10 blocks). In many HBase workloads, the region server issues random
> reads against a local file which is 2-4 GB in size or even larger (hence 20+ blocks).
> Given that the memory usage for caching these input streams is low, and applications
> like HBase tend to raise their ulimit -n substantially already (e.g. up to 32,000), I
> think we should raise the default cache size to 50 or more. In the rare case that an
> application uses local reads with hundreds of open blocks and can't feasibly raise its
> ulimit -n, it can lower the cache size appropriately.
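
If an application does need to lower (or raise) the limit, it is a client-side setting. A
minimal sketch of doing so programmatically follows; the property names are assumptions
modeled on the short-circuit read configuration keys, so verify them against
DFSConfigKeys in your Hadoop build before relying on them:

    import org.apache.hadoop.conf.Configuration;

    public class TuneStreamCache {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Assumed property names -- check DFSConfigKeys in your Hadoop
        // version; HDFS-347-era builds may name these differently.
        conf.setInt("dfs.client.read.shortcircuit.streams.cache.size", 50);
        conf.setLong("dfs.client.read.shortcircuit.streams.cache.expiry.ms", 5000);
        // Pass conf when creating the FileSystem so the client picks it up.
      }
    }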

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
