hadoop-hdfs-issues mailing list archives

From "Manoj Govindassamy (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-10983) OIV tool should make an EC file explicit
Date Fri, 03 Feb 2017 01:09:51 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847993#comment-15847993 ]

Manoj Govindassamy edited comment on HDFS-10983 at 2/3/17 1:09 AM:
-------------------------------------------------------------------

[~andrew.wang], [~jojochuang],

Here are the proposals. Please let me know your thoughts.

1. The OIV HTTP server exposes a read-only WebHDFS API that can be queried to print all file
details.

1.a: Users can also get JSON-formatted FileStatus objects via the HTTP REST API, which can
easily be extended. Here is the proposed REST API output, with "hdfs.erasurecoding.policy"
added for directories and "blockType" added for files.

{noformat}
curl -i http://127.0.0.1:5978/webhdfs/v1/ec?op=getfilestatus
{"FileStatus":
{"owner":"manoj","replication":0,"hdfs.erasurecoding.policy":"XOR-2-1-64k",
 "length":0,"permission":"755","type":"DIRECTORY", 
"blockSize":0,"pathSuffix":"","modificationTime":1485921930732,
"childrenNum":2,"accessTime":0,"group":"supergroup","fileId":16406}
}

curl -i http://127.0.0.1:5978/webhdfs/v1/ec/file.txt?op=getfilestatus
{"FileStatus":
{"owner":"manoj","replication":3,"blockType":"STRIPED", 
"length":0,"permission":"644","type":"FILE","blockSize":134217728, 
"pathSuffix":"","modificationTime":1485921930729,"childrenNum":0,
"accessTime":1485921930710,"group":"supergroup","fileId":16407}
}
{noformat}
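
As a rough illustration of how a client could consume the proposed fields, here is a minimal Python sketch that reads a GETFILESTATUS body and reports the EC details. It assumes the field names proposed above ("hdfs.erasurecoding.policy" and "blockType"), which are a proposal and not a released WebHDFS schema. The helper name {{describe_ec}} is hypothetical, and the sample bodies are trimmed copies of the responses shown above.

```python
import json


def describe_ec(response_text):
    """Return a short EC summary for a GETFILESTATUS JSON body.

    Assumes the proposed (not yet released) fields
    "hdfs.erasurecoding.policy" and "blockType".
    """
    status = json.loads(response_text)["FileStatus"]
    if status["type"] == "DIRECTORY":
        policy = status.get("hdfs.erasurecoding.policy")
        if policy:
            return f"directory, EC policy: {policy}"
        return "directory, no EC policy"
    # Assumption: a file without "blockType" is treated as contiguous.
    block_type = status.get("blockType", "CONTIGUOUS")
    return f"file, block type: {block_type}"


# Trimmed copies of the two sample responses (JSON body only).
dir_body = ('{"FileStatus": {"type": "DIRECTORY", "replication": 0, '
            '"hdfs.erasurecoding.policy": "XOR-2-1-64k"}}')
file_body = ('{"FileStatus": {"type": "FILE", "replication": 3, '
             '"blockType": "STRIPED"}}')

print(describe_ec(dir_body))
print(describe_ec(file_body))
```

Note that {{curl -i}} also prints the HTTP response headers, so only the JSON body should be passed to such a helper.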

1.b: However, when the server is queried from the shell, webhdfs returns only a {{FileStatus}},
which does not carry any EC-related details. So I am not sure whether we can make the following
command print extra details for EC files and directories.
{noformat}
hdfs dfs -ls webhdfs://127.0.0.1:5978/
<output same as before>
{noformat}


2. The {{OIV XML processor}} already supports EC. Please review the output below.
{noformat}
<inode>
    <id>16406</id>
    <type>DIRECTORY</type>
    <name>ec</name>
    <mtime>1485918336816</mtime>
    <permission>manoj:supergroup:0755</permission>
    <xattrs>
        <xattr>
            <ns>SYSTEM</ns>
            <name>hdfs.erasurecoding.policy</name>     <=======
            <val>XOR-2-1-64k</val>
        </xattr>
    </xattrs>
    <nsquota>-1</nsquota>
    <dsquota>-1</dsquota>
</inode>
<inode>
    <id>16407</id>
    <type>FILE</type>
    <name>EmptyECFile.txt</name>
    <replication>3</replication>
    <mtime>1485918336813</mtime>
    <atime>1485918336796</atime>
    <preferredBlockSize>134217728</preferredBlockSize>
    <permission>manoj:supergroup:0644</permission>
    <storagePolicyId>0</storagePolicyId>
    <blockType>
        <name>STRIPED</name>    <=======
    </blockType>
</inode>
{noformat}
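
For anyone post-processing this XML, a minimal Python sketch of pulling the EC details out of the inode elements could look like the following. The element names are taken from the output above. The wrapping {{<inodes>}} root, the trimmed sample, and the helper name {{ec_summary}} are assumptions made only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the two inodes above; the <inodes> root is an assumption
# added so the snippet parses as a single well-formed document.
SAMPLE = """<inodes>
<inode>
    <id>16406</id><type>DIRECTORY</type><name>ec</name>
    <xattrs>
        <xattr>
            <ns>SYSTEM</ns>
            <name>hdfs.erasurecoding.policy</name>
            <val>XOR-2-1-64k</val>
        </xattr>
    </xattrs>
</inode>
<inode>
    <id>16407</id><type>FILE</type><name>EmptyECFile.txt</name>
    <blockType><name>STRIPED</name></blockType>
</inode>
</inodes>"""


def ec_summary(xml_text):
    """Map inode name -> EC policy (directories) or block type (files)."""
    result = {}
    for inode in ET.fromstring(xml_text).iter("inode"):
        # findtext("name") only matches the inode's direct <name> child.
        name = inode.findtext("name")
        if inode.findtext("type") == "DIRECTORY":
            for xattr in inode.iter("xattr"):
                if xattr.findtext("name") == "hdfs.erasurecoding.policy":
                    result[name] = xattr.findtext("val")
        else:
            block_type = inode.findtext("blockType/name")
            if block_type is not None:
                result[name] = block_type
    return result


print(ec_summary(SAMPLE))
```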


3. The {{OIV Delimited processor}} does not support EC yet. Here is the proposal for a new
header ("BlockType") with the values "CONTIGUOUS" or "STRIPED" for files and "NA" for directories.

{noformat}
Path                 Replication  ModificationTime  AccessTime        PreferredBlockSize  BlockType   BlocksCount  FileSize  NSQUOTA              DSQUOTA  Permission  UserName  GroupName
/                    0            2017-01-31 18:57  1969-12-31 16:00  0                   NA          0            0         9223372036854775807  -1       drwxr-xr-x  manoj     supergroup
/dir0                0            2017-01-31 18:57  1969-12-31 16:00  0                   NA          0            0         -1                   -1       drwxr-xr-x  manoj     supergroup
/dir0/file0          3            2017-01-31 18:57  2017-01-31 18:57  134217728           CONTIGUOUS  1            1         0                    0        -rw-r--r--  manoj     supergroup
/dir0/file1          3            2017-01-31 18:57  2017-01-31 18:57  134217728           CONTIGUOUS  1            1         0                    0        -rw-r--r--  manoj     supergroup
/dir0/file2          3            2017-01-31 18:57  2017-01-31 18:57  134217728           CONTIGUOUS  1            1         0                    0        -rw-r--r--  manoj     supergroup
/dir0/file3          3            2017-01-31 18:57  2017-01-31 18:57  134217728           CONTIGUOUS  1            1         0                    0        -rw-r--r--  manoj     supergroup

/emptydir            0            2017-01-31 18:57  1969-12-31 16:00  0                   NA          0            0         -1                   -1       drwxr-xr-x  manoj     supergroup

/ec                  0            2017-01-31 18:57  1969-12-31 16:00  0                   NA          0            0         -1                   -1       drwxr-xr-x  manoj     supergroup
/ec/EmptyECFile.txt  3            2017-01-31 18:57  2017-01-31 18:57  134217728           STRIPED     0            0         0                    0        -rw-r--r--  manoj     supergroup
/ec/SmallECFile.txt  3            2017-01-31 18:57  2017-01-31 18:57  134217728           STRIPED     1            0         0                    0        -rw-r--r--  manoj     supergroup
{noformat}
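
With the proposed column in place, listing the striped files from a Delimited dump becomes a simple filter. The Python sketch below assumes the output has a header row and is tab-separated (the listing above is shown with spaces for readability). The sample keeps only a few of the columns, and the helper name {{striped_paths}} is hypothetical.

```python
import csv
import io

# Tab-separated sample with a trimmed column set; the real output carries
# all thirteen columns shown above. The tab delimiter is an assumption.
SAMPLE = (
    "Path\tReplication\tBlockType\tBlocksCount\n"
    "/dir0\t0\tNA\t0\n"
    "/dir0/file0\t3\tCONTIGUOUS\t1\n"
    "/ec\t0\tNA\t0\n"
    "/ec/EmptyECFile.txt\t3\tSTRIPED\t0\n"
    "/ec/SmallECFile.txt\t3\tSTRIPED\t1\n"
)


def striped_paths(delimited_text, delimiter="\t"):
    """Return the paths whose proposed BlockType column is STRIPED."""
    reader = csv.DictReader(io.StringIO(delimited_text), delimiter=delimiter)
    return [row["Path"] for row in reader if row["BlockType"] == "STRIPED"]


print(striped_paths(SAMPLE))
```

Reading by column name rather than by position keeps such a filter working if further columns are added later.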











> OIV tool should make an EC file explicit
> ----------------------------------------
>
>                 Key: HDFS-10983
>                 URL: https://issues.apache.org/jira/browse/HDFS-10983
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Wei-Chiu Chuang
>            Assignee: Manoj Govindassamy
>              Labels: hdfs-ec-3.0-nice-to-have
>         Attachments: HDFS-10983.01.patch
>
>
> The OIV tool's webhdfs interface does not print whether a file is striped or not.
> Also, it prints the file's EC policy ID as the replication factor, which is inconsistent
> with the output of a typical webhdfs call to the cluster, which always shows a replication
> factor of 0 for EC files.
> Not just webhdfs: the delimited output does not print whether a file is striped either.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

