hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4953) enable HDFS local reads via mmap
Date Wed, 14 Aug 2013 02:01:49 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739145#comment-13739145
] 

Andrew Wang commented on HDFS-4953:
-----------------------------------

Overall looks really solid, nice work. I have mostly nitty stuff, only a few potential bugs.

I haven't deduped this with Brandon's comments, apologies.

General
- LOG.debug statements should be wrapped in {{LOG.isDebugEnabled}} checks, because of the
limitations of log4j

hdfs-default.xml
- Has some extra lines of java pasted in.
- For cache size, mention fds, virtual address space, and application working set, as hints
on how to size the cache properly.
- The timeout javadoc mentions that the timeout is a minimum, but unreferenced mmaps can be
evicted before the timeout when under cache pressure.

ZeroCopyCursor
- Let's beef up the class javadoc. We need some place that documents the big picture of ZCR
and how to use it, e.g. how to get and use the cursor properly, the purpose of the fallback
buffer and when it's used, the implications of the skip checksums and short read options.
Right now it's sort of there in the method javadocs, but it's hard to get a sense of how to
use it and how it all fits together. An example API usage snippet would be great.
- read() javadoc: EOF here refers to an EOF when reading a block, not EOF of the HDFS file.
Would prefer to see "end of block".

HdfsZeroCopyCursor
- Would like to see explicit setting of {{allowShortReads}} to false in the constructor for
clarity.

ZeroCopyUnavailableException
- serialVersionUID should be private

DFSClient
- Comment on the lifecycle of MMAP_MANAGER_FACTORY, it's shared among multiple DFSClients
which is why the refcount is important.
- Maybe rename {{put}} to {{unref}} or {{close}}? It's not actually "putting" in the data
structure sense, which is confusing.

ClientMmap
- let's not call people "bad programmers", just say "accidentally leaked references".
- {{unmap}}: add to javadoc that it should only be called if the manager has been closed,
or by the manager with the lock held.
{code}
    MappedByteBuffer map= 
{code}
Need a space before the "=".

ClientMmapManager
- In an offline discussion, I asked Colin about using Guava's cache instead of building up
this reference counting cache infrastructure. Colin explained that having a background CacheCleaner
thread is preferable for more control and being able to do explicit cleanup, which is nice
since mmaps use up a file descriptor.
- Let's add some javadoc on the lifecycle of cached mmaps, and mention why it's important
to cache them (performance). IIUC, it only evicts unreferenced ClientMmaps, and does this
either on cache pressure (when we do a fetch) or on a relatively long timescale in the background
via the CacheCleaner (15 minutes).
- I think {{fromConf}}-style factory methods are more normally called {{get}}, e.g. {{FileSystem.get}}.
- Why is the CacheCleaner executor using half the timeout for the delay and period? I'd think
the delay can be the timeout (or timeout+period), since nothing will expire naturally before
that. For the period, I guess we need some arbitrary staleness bound (and timeout/2 seems
reasonable), but this might be worth mentioning in hdfs-default.xml.
- The {{evictable}} javadoc mentions jittering by a nanosecond, but it's being keyed off of
{{Time.monotonicNow}} which is milliseconds. We might in fact want to key off of {{System.nanoTime}}
for fewer collisions.
- I think {{evictOne}} would be clearer if you used {{TreeSet#pollFirst}} rather than an iterator.
{code}
    Iterator<Entry<Long, ClientMmap>> iter =
                  evictable.entrySet().iterator(); 
{code}
This has 10 spaces, where elsewhere in the file you use a double-indent of 4.

BlockReaderLocal
- Remaining TODO for blocks bigger than 2GB, want to file a follow-on JIRA for this?
- {{readZeroCopy}} catches and re-sets the interrupted status, does something else check this
later?
- Is it worth re-trying the mmap after a {{CacheCleaner}} period in case some space has been
freed up in the cache?
- The clientMmap from the manager can be null if the cache is full. I don't see a check for
this case.

Tests
- Would like to see some tests for cache eviction behavior
- How about a Java test without a backing buffer?
- JNI test has some commented out fprintfs
                
> enable HDFS local reads via mmap
> --------------------------------
>
>                 Key: HDFS-4953
>                 URL: https://issues.apache.org/jira/browse/HDFS-4953
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>    Affects Versions: 2.3.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: benchmark.png, HDFS-4953.001.patch, HDFS-4953.002.patch, HDFS-4953.003.patch,
HDFS-4953.004.patch, HDFS-4953.005.patch, HDFS-4953.006.patch
>
>
> Currently, the short-circuit local read pathway allows HDFS clients to access files directly
without going through the DataNode.  However, all of these reads involve a copy at the operating
system level, since they rely on the read() / pread() / etc family of kernel interfaces.
> We would like to enable HDFS to read local files via mmap.  This would enable truly zero-copy
reads.
> In the initial implementation, zero-copy reads will only be performed when checksums
were disabled.  Later, we can use the DataNode's cache awareness to only perform zero-copy
reads when we know that checksum has already been verified.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message