hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5810) Unify mmap cache and short-circuit file descriptor cache
Date Tue, 11 Feb 2014 02:04:19 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897447#comment-13897447 ]

Colin Patrick McCabe commented on HDFS-5810:

munmap only manipulates in-memory structures, whereas mmap often has to hit disk.  That's why
the latter is more expensive.  Recent Linux kernels have more fine-grained locking in this
area, although I'm not an expert on that part of the kernel.  We can't do I/O while holding
a global client-side lock: clients like HBase have on the order of 10k open files, and we
don't want to block everyone.
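
As an illustration of the point about not doing I/O under a global lock, here is a minimal Java sketch. It is not the code in the HDFS-5810 patch; the class name SlowLoadCache, the loader callback (standing in for a slow call such as mmap), and the use of CompletableFuture are all assumptions made for the example. The global lock is held only long enough to look up or install a placeholder, and the slow load runs after the lock is released.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

/** Hypothetical sketch; not the HDFS-5810 implementation. */
class SlowLoadCache<K, V> {
    // The "global" lock: guards only the map, never the slow load itself.
    private final ReentrantLock lock = new ReentrantLock();
    private final Map<K, CompletableFuture<V>> entries = new HashMap<>();

    /** True if the calling thread currently holds the global lock. */
    boolean lockHeldByCaller() {
        return lock.isHeldByCurrentThread();
    }

    /** loader stands in for a slow operation such as mmap (an assumption). */
    V get(K key, Function<K, V> loader) {
        CompletableFuture<V> future;
        boolean weLoad = false;
        lock.lock();
        try {
            future = entries.get(key);
            if (future == null) {
                // Install a placeholder so other threads wait on this entry only.
                future = new CompletableFuture<>();
                entries.put(key, future);
                weLoad = true;
            }
        } finally {
            lock.unlock();
        }
        if (weLoad) {
            try {
                // Slow work happens here, with the global lock released.
                future.complete(loader.apply(key));
            } catch (RuntimeException e) {
                lock.lock();
                try {
                    entries.remove(key, future); // allow a later retry
                } finally {
                    lock.unlock();
                }
                future.completeExceptionally(e);
                throw e;
            }
        }
        // Threads that did not load wait for the one that did.
        return future.join();
    }

    public static void main(String[] args) {
        SlowLoadCache<String, Integer> cache = new SlowLoadCache<>();
        Integer len = cache.get("blk_1", k -> k.length());
        System.out.println("loaded value: " + len);
    }
}
```

With this shape, a thread waiting on one slow entry does not stall threads asking for other entries, which is the property that matters when a client has thousands of open files.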

bq. ClientContext#getFromConf, can we push the creation of a new DFSClient.Conf into #get
when it's necessary? Seems better to avoid doing all those hash lookups.

That method is really only for tests, where it's inconvenient to dig around to get a DFSClient.Conf.
I will add a comment explaining that this is mostly for testing.  (I think JspHelper uses
it too.)

bq. We removed the javadoc parameter descriptions in a few places, some of which were helpful
(e.g. len of -1 means read as many bytes as possible). Could we add the one-line docs back
to the builder variables?

Good idea.  I added javadoc for the BlockReaderFactory members.

bq. Mind adding "dfs.client.cached.conn.retry" to hdfs-default.xml?


bq. cacheTries now counts down instead of counting up, so I think it needs a new name. cacheTriesRemaining
isn't great, but something like that.


bq. cacheTries used to also only tick when we got a stale peer out of the cache. Now, nextTcpPeer
and nextDomainPeer tick cacheTries unconditionally.

The effect is the same, since if we get a non-stale (i.e. usable) peer out of the cache, we're
done.  Centralizing it is a good idea since it avoids the kind of bugs we had in the past
where we forgot to handle certain kinds of retries correctly.

bq. Previously, we would disable domain sockets or throw an exception if we hit an error when
using a new Peer (domain or TCP respectively). Now, we don't know if a peer is cached or new,
and spin until we run out of cacheTries (which isn't really related here).

OK, that's fair.  That variable is supposed to be about how many times we'll try the *cache*,
not how many times we'll retry in general.  Fixed.
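
To make the distinction concrete, here is a minimal Java sketch of the behavior described above. It is not the BlockReaderFactory code from the patch; PeerSelector, Peer, remainingCacheTries, and newPeersWork are hypothetical names chosen for the example. The counter is ticked in one place, it limits only how often the cache is consulted, and a failure on a newly created peer is reported right away instead of spinning.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch; not the HDFS-5810 implementation. */
class PeerSelector {
    static final class Peer {
        final boolean fromCache;
        final boolean usable; // false models a stale or failing peer
        Peer(boolean fromCache, boolean usable) {
            this.fromCache = fromCache;
            this.usable = usable;
        }
    }

    private final Deque<Peer> cache;
    private final boolean newPeersWork; // test knob: do fresh connections succeed?
    private int remainingCacheTries;    // counts down; limits cache lookups only

    PeerSelector(Deque<Peer> cache, int cacheTries, boolean newPeersWork) {
        this.cache = cache;
        this.remainingCacheTries = cacheTries;
        this.newPeersWork = newPeersWork;
    }

    /** The single place where the cache-try counter is ticked. */
    Peer nextPeer() {
        if (remainingCacheTries > 0) {
            remainingCacheTries--; // ticks unconditionally, stale or not
            Peer cached = cache.poll();
            if (cached != null) {
                return cached;
            }
        }
        return new Peer(false, newPeersWork);
    }

    Peer connect() throws IOException {
        while (true) {
            Peer peer = nextPeer();
            if (peer.usable) {
                return peer; // a usable peer ends the loop, so extra ticks are harmless
            }
            if (!peer.fromCache) {
                // A new peer failed: report it rather than retrying on the cache counter.
                throw new IOException("newly created peer failed");
            }
            // Stale cached peer: loop and try again while cache tries remain.
        }
    }

    public static void main(String[] args) throws IOException {
        Deque<Peer> cache = new ArrayDeque<>();
        cache.add(new Peer(true, false)); // one stale cached peer
        Peer p = new PeerSelector(cache, 2, true).connect();
        System.out.println("connected via cache: " + p.fromCache);
    }
}
```

Because a usable cached peer returns immediately, ticking the counter on every lookup has the same effect as ticking only on stale peers, while keeping the bookkeeping in a single method.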

> Unify mmap cache and short-circuit file descriptor cache
> --------------------------------------------------------
>                 Key: HDFS-5810
>                 URL: https://issues.apache.org/jira/browse/HDFS-5810
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>    Affects Versions: 2.3.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-5810.001.patch, HDFS-5810.004.patch, HDFS-5810.006.patch, HDFS-5810.008.patch, HDFS-5810.015.patch, HDFS-5810.016.patch, HDFS-5810.018.patch, HDFS-5810.019.patch
>
> We should unify the client mmap cache and the client file descriptor cache.  Since mmaps
> are granted corresponding to file descriptors in the cache (currently FileInputStreamCache),
> they have to be tracked together to do "smarter" things like HDFS-5182.
