hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2246) Shortcut a local client reads to a Datanodes files directly
Date Mon, 14 Nov 2011 22:17:51 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149998#comment-13149998

Todd Lipcon commented on HDFS-2246:

+  // Cache of block to file path where the block is stored
+  //private static Map<ExtendedBlock, BlockLocalPathInfo> cache = null;
remove dead code

in {{LocalBlockReader.newBlockReader}}, if the checksum file open fails, then the {{dataIn}}
stream will be leaked. You need a try..finally to close these if the open wasn't successful


why does LocalBlockReader extend RemoteBlockReader2? It should be its own implementation of
BlockReader - it doesn't really share much code with its superclass (looks like only {{skip}},
which is trivial anyway). In terms of member variables, it uses some, but ignores the rest.


The cached proxies to the local DN currently don't have the UGI as part of the key. So if
a client needs to impersonate different users, the cache will end up caching client proxies
associated with previous users, which would be incorrect.

Same is true of different timeouts, right?


The implementation of {{read}} is fairly inefficient - reading and verifying checksum on one
chunk at a time does a poor job of amortizing JNI/syscall overhead. Should probably read 64KB
or 128KB at a time. You can address this in a followup.


+            checksum.verifyChunkedSums(dataBuff, checksumBuff, filename, off);
Wrong offset is passed here - you should be passing the position of dataIn (the block offset),
not the offset into the target byte array


The implementation of {{read}} looks incorrect to me. If you read less than one checksum chunk,
and you want to verify checksums then it will always read an entire checksum chunk, so the
next read will have incorrect results.


In DFSClient, it looks very inefficient to be calling {{isLocalAddress}} without any caching..
iterating over the network interfaces is not cheap (lots of syscalls).

Also, the way this is currently designed, each seek will result in the files being re-opened
and the metadata re-parsed. I imagine this is bad for efficiency.

Why not integrate the short circuit construction into {{getBlockReader}} instead of separately
at the two call-sites?

+import java.io.*;
+import org.apache.hadoop.io.*;

please expand imports

{{BlockLocalPathInfo}} needs audience/stability annotations. So does {{BlockMetadataHeader}}
now that it's public.

Also, since we now have the protocol-independence layer, does this .protocol class need to
be Writable? Or can it be handled by the translation layer?

Rather than a single user for the local path access, why not use an ACL, given we already
have support for parsing/checking ACLs? I can imagine some people may have multiple applications
that do local random-read access that could benefit from this.


{{checkKerberosAuthMethod}} purports to be a general check, but the error message specifically
references {{getBlockPathInfo}}.


{{getBlockLocalPathInfo}} has some bad indentation. I'd also change those log messages to
TRACE level.


In FSDataset.getBlockLocalPathInfo, no need to qualify {{FSDataset.getMetaFile}}.


Test wise, you should cover positional reads and differently sized reads. I think this would
turn up a bug as pointed out above with reads smaller than a checksum chunk size.
> Shortcut a local client reads to a Datanodes files directly
> -----------------------------------------------------------
>                 Key: HDFS-2246
>                 URL: https://issues.apache.org/jira/browse/HDFS-2246
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sanjay Radia
>         Attachments: 0001-HDFS-347.-Local-reads.patch, HDFS-2246-branch-0.20-security-205.1.patch,
HDFS-2246-branch-0.20-security-205.2.patch, HDFS-2246-branch-0.20-security-205.patch, HDFS-2246-branch-0.20-security-205.patch,
HDFS-2246-branch-0.20-security-205.patch, HDFS-2246-branch-0.20-security.3.patch, HDFS-2246-branch-0.20-security.no-softref.patch,
HDFS-2246-branch-0.20-security.patch, HDFS-2246-branch-0.20-security.patch, HDFS-2246-trunk.patch,
HDFS-2246-trunk.patch, HDFS-2246.20s.1.patch, HDFS-2246.20s.2.txt, HDFS-2246.20s.3.txt, HDFS-2246.20s.4.txt,
HDFS-2246.20s.patch, localReadShortcut20-security.2patch

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message