hadoop-hdfs-issues mailing list archives

From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12681) Fold HdfsLocatedFileStatus into HdfsFileStatus
Date Wed, 15 Nov 2017 19:01:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253955#comment-16253955 ]

Chris Douglas commented on HDFS-12681:
--------------------------------------

bq. now it can't distinguish whether it needs an RPC call, so we need to directly call fs.getFileBlockLocations?
v06 of the patch (not v05; sorry, I mixed them up) would not make an RPC if the {{FileStatus}}
included locations:
{noformat}
diff --git hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
index a8a5cfa..617cbf4 100644
--- hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
+++ hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
@@ -237,6 +236,12 @@ String getPathName(Path file) {
     if (file == null) {
       return null;
     }
+    if (file instanceof LocatedFileStatus) {
+      BlockLocation[] loc = ((LocatedFileStatus)file).getBlockLocations();
+      if (loc != null) {
+        return loc;
+      }
+    }
     return getFileBlockLocations(file.getPath(), start, len);
   }
 {noformat}

This changes the semantics for HDFS (i.e., it won't refresh locations), and the accompanying change to MapReduce:
{noformat}
diff --git hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java
index 3e0ea25..0f0a45b 100644
--- hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java
+++ hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/FileInputFormat.java
@@ -344,11 +344,7 @@ protected FileSplit makeSplit(Path file, long start, long length,
       if (length != 0) {
         FileSystem fs = path.getFileSystem(job);
         BlockLocation[] blkLocations;
-        if (file instanceof LocatedFileStatus) {
-          blkLocations = ((LocatedFileStatus) file).getBlockLocations();
-        } else {
-          blkLocations = fs.getFileBlockLocations(file, 0, length);
-        }
+        blkLocations = fs.getFileBlockLocations(file, 0, length);
{noformat}

would have added additional RPC traffic for non-HDFS {{FileSystem}} implementations that rely
on the type to determine whether they need locations.
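
To make the trade-off concrete, here is a minimal, self-contained sketch. It does not use the Hadoop classes: {{Status}}, {{LocatedStatus}}, {{PlainFs}}, and {{ShortCircuitFs}} are hypothetical stand-ins for {{FileStatus}}, {{LocatedFileStatus}}, a non-HDFS {{FileSystem}}, and the v06 HDFS behavior, and the counter only simulates RPCs.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SplitLocationSketch {
    /** Hypothetical stand-in for FileStatus. */
    static class Status {
        final String path;
        Status(String path) { this.path = path; }
    }

    /** Hypothetical stand-in for LocatedFileStatus; locations came with the listing. */
    static class LocatedStatus extends Status {
        final String[] locations;
        LocatedStatus(String path, String... locations) {
            super(path);
            this.locations = locations;
        }
    }

    /** Stand-in for a non-HDFS FileSystem: every location lookup is a remote call. */
    static class PlainFs {
        final AtomicInteger rpcs = new AtomicInteger();
        String[] getFileBlockLocations(Status file) {
            rpcs.incrementAndGet(); // simulated RPC
            return new String[] {"remote-host"};
        }
    }

    /** Stand-in for the v06 HDFS behavior: reuse embedded locations when present. */
    static class ShortCircuitFs extends PlainFs {
        @Override
        String[] getFileBlockLocations(Status file) {
            if (file instanceof LocatedStatus) {
                String[] loc = ((LocatedStatus) file).locations;
                if (loc != null) {
                    return loc; // no RPC, but also no refresh of the locations
                }
            }
            return super.getFileBlockLocations(file);
        }
    }

    /** Caller checks the type itself, as FileInputFormat did before the change. */
    static String[] locationsWithTypeCheck(PlainFs fs, Status file) {
        if (file instanceof LocatedStatus) {
            return ((LocatedStatus) file).locations;
        }
        return fs.getFileBlockLocations(file);
    }

    /** Caller always delegates to the file system, as in the MapReduce diff. */
    static String[] locationsAlwaysDelegating(PlainFs fs, Status file) {
        return fs.getFileBlockLocations(file);
    }

    public static void main(String[] args) {
        Status located = new LocatedStatus("/data/part-0", "dn1", "dn2");

        PlainFs plain = new PlainFs();
        locationsWithTypeCheck(plain, located);
        System.out.println("plain fs, caller type check: " + plain.rpcs.get());

        locationsAlwaysDelegating(plain, located);
        System.out.println("plain fs, always delegating: " + plain.rpcs.get());

        PlainFs hdfsLike = new ShortCircuitFs();
        locationsAlwaysDelegating(hdfsLike, located);
        System.out.println("short-circuit fs, always delegating: " + hdfsLike.rpcs.get());
    }
}
```

In this sketch the simulated RPC count is 0 with the caller-side type check, rises to 1 once the caller always delegates to a file system that does not short-circuit, and stays at 0 for the short-circuiting one.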

{{makeQualified\[Located\]}} are internal methods that allow HDFS to lazily bind {{FileStatus}}
fields (improving space efficiency and avoiding some conversions). Clients shouldn't need
to call them.

We _hope_ that clients would request locations in the first RPC call, rather than asking for
a {{FileStatus}} and then requesting its block locations.
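
As a rough illustration of why the single call is preferable, the sketch below counts simulated round trips for both patterns. {{FakeNamespace}} is a hypothetical stand-in, not a Hadoop class; its method names are only loosely modeled on {{listStatus}} and {{listLocatedStatus}}, and it ignores details such as batching of large listings.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ListingRpcSketch {
    /** Hypothetical stand-in for a remote namespace; counts simulated RPCs. */
    static class FakeNamespace {
        int rpcs = 0;
        private final List<String> files = Arrays.asList("/d/a", "/d/b", "/d/c");

        /** One round trip returning paths only. */
        List<String> listStatus() {
            rpcs++;
            return files;
        }

        /** One round trip per file to fetch its locations. */
        String[] getBlockLocations(String path) {
            rpcs++;
            return new String[] {"host-of" + path};
        }

        /** One round trip returning paths together with their locations. */
        Map<String, String[]> listLocatedStatus() {
            rpcs++;
            Map<String, String[]> out = new LinkedHashMap<>();
            for (String f : files) {
                out.put(f, new String[] {"host-of" + f});
            }
            return out;
        }
    }

    /** List first, then ask for each file's locations separately. */
    static int statusThenLocations(FakeNamespace ns) {
        for (String f : ns.listStatus()) {
            ns.getBlockLocations(f);
        }
        return ns.rpcs;
    }

    /** Request locations as part of the listing. */
    static int locationsUpFront(FakeNamespace ns) {
        ns.listLocatedStatus();
        return ns.rpcs;
    }

    public static void main(String[] args) {
        System.out.println("status then locations: " + statusThenLocations(new FakeNamespace()));
        System.out.println("locations up front:    " + locationsUpFront(new FakeNamespace()));
    }
}
```

With the three files assumed here, the first pattern costs four simulated round trips and the second costs one.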

> Fold HdfsLocatedFileStatus into HdfsFileStatus
> ----------------------------------------------
>
>                 Key: HDFS-12681
>                 URL: https://issues.apache.org/jira/browse/HDFS-12681
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Chris Douglas
>            Assignee: Chris Douglas
>            Priority: Minor
>             Fix For: 3.1.0
>
>         Attachments: HDFS-12681.00.patch, HDFS-12681.01.patch, HDFS-12681.02.patch, HDFS-12681.03.patch, HDFS-12681.04.patch, HDFS-12681.05.patch, HDFS-12681.06.patch, HDFS-12681.07.patch, HDFS-12681.08.patch, HDFS-12681.09.patch, HDFS-12681.10.patch
>
>
> {{HdfsLocatedFileStatus}} is a subtype of {{HdfsFileStatus}}, but not of {{LocatedFileStatus}}.
> Conversion requires copying common fields and shedding unknown data. It would be cleaner and
> sufficient for {{HdfsFileStatus}} to extend {{LocatedFileStatus}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

