From "Chris Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3173) inconsistent globbing support for dfs commands
Date Thu, 29 May 2008 03:16:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600672#action_12600672 ]

Chris Douglas commented on HADOOP-3173:
---------------------------------------

I've been asked to clarify the implications of this proposal. There are 6 Path constructors:
# Path(String, String)
# Path(Path, String)
# Path(String, Path)
# Path(Path, Path)
# Path(String)
# Path(String, String, String)

Constructors 5 and 6 would preserve the path component of the URI as a String (the "rawPath")
used only for globbing; all other Path operations would continue to work as they always have.
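
As a rough sketch of the mechanics (the class and the names {{rawPath}}/{{getPathForGlob}} below are illustrative stand-ins, not what a patch would necessarily use; the toy normalization just mimics the escape-to-separator conversion discussed here):
{noformat}
// Sketch only: this is not org.apache.hadoop.fs.Path; the field/method names
// and the toy normalization are illustrative.
class SketchPath {
  private final String normalized; // what Path exposes today
  private final String rawPath;    // user string kept only by ctors 5 and 6

  // Stands in for constructor 5, Path(String)
  SketchPath(String pathString) {
    this.rawPath = pathString;               // preserved verbatim for globbing
    this.normalized = normalize(pathString); // existing behavior unchanged
  }

  // Stands in for constructors 1-4: no raw string retained
  SketchPath(SketchPath parent, String child) {
    this.rawPath = null;
    this.normalized = normalize(parent.normalized + "/" + child);
  }

  // Toy stand-in for the normalization that converts the escape
  // character into a path separator, as described above.
  private static String normalize(String s) {
    return s.replace('\\', '/').replaceAll("/+", "/");
  }

  // Only the glob code would consult this; every other Path operation
  // keeps using the normalized form.
  String getPathForGlob() {
    return rawPath != null ? rawPath : normalized;
  }
}
{noformat}
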
For the following, let {{p}}, {{q}} be Paths, where {{q}} is initialized as:
{noformat}
Path q = new Path(p.toString());
{noformat}

Given the following initializations for {{p}}:
{noformat}
1. p = new Path("/foo/\\*/bar");
2. p = new Path("hdfs://foobar:8020/foo/p-1{\?}");
{noformat}

{{globStatus\(x)}} would return different results. In the first instance, globbing {{p}} would
return the directory "\*", as _expected_ in this JIRA, while globbing {{q}} would have the
result as _observed_ in this JIRA. In the second, p would be a legal glob (the escape prior
to '?' wouldn't be converted to a path separator), so given:
{noformat}
user@host$ bin/hadoop dfs -ls 'foo/bar/'
Found 5 items:
1    0           2008-05-28 20:00  -rw-r--r--  chrisdo  supergroup  /user/chrisdo/foo/bar/p-00
1    0           2008-05-28 20:01  -rw-r--r--  chrisdo  supergroup  /user/chrisdo/foo/bar/p-01
1    0           2008-05-28 20:01  -rw-r--r--  chrisdo  supergroup  /user/chrisdo/foo/bar/p-10
1    0           2008-05-28 20:01  -rw-r--r--  chrisdo  supergroup  /user/chrisdo/foo/bar/p-11
1    0           2008-05-28 20:03  -rw-r--r--  chrisdo  supergroup  /user/chrisdo/foo/bar/p-1?
{noformat}

One could specify both '{{foo/bar/p-1{\?}}}' (file 5) and '{{foo/bar/p-1?}}' (files 3-5).
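
Expressed against the FileSystem API, the contrast between the two cases would look roughly like this (paths are the ones used above; the null checks are only because {{globStatus}} can return null when nothing matches):
{noformat}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GlobContrast {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    Path p = new Path("/foo/\\*/bar"); // constructor 5: raw string retained
    Path q = new Path(p.toString());   // round trip through toString() loses it

    // Under the proposal, p would match only the literal directory "*",
    // while q would keep today's behavior and expand '*' as a wildcard.
    FileStatus[] fromP = fs.globStatus(p);
    FileStatus[] fromQ = fs.globStatus(q);
    System.out.println((fromP == null ? 0 : fromP.length) + " vs "
        + (fromQ == null ? 0 : fromQ.length));
  }
}
{noformat}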

There are two primary "globbers" in the codebase, FsShell and FileInputFormats. In the current
proposal, the latter would continue to be in the "{{q}} case", i.e. there would be no change
to its behavior. FsShell, however, would be in the "{{p}} case", i.e. the user string would
be used for globbing without first passing through Path and URI normalization. This has the
advantage of resolving this JIRA, but the significant disadvantage of making globbing in FsShell
and map/reduce inconsistent. If a user were to test out a pattern in the shell and try to
use it as a pattern for their FileInputFormat derivative, they could get different results.
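
To make that inconsistency concrete, a made-up job setup (class name and path are placeholders; this assumes the static {{FileInputFormat.addInputPath}} introduced with the 0.17 mapred API):
{noformat}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class InputGlobExample {
  public static void main(String[] args) {
    JobConf job = new JobConf();
    // The input pattern is stored as a Path, so it is normalized before the
    // glob runs (the "q case"); the identical string typed at the FsShell
    // prompt would take the "p case" under this proposal.
    FileInputFormat.addInputPath(job, new Path("/foo/\\*/bar"));
  }
}
{noformat}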

> inconsistent globbing support for dfs commands
> ----------------------------------------------
>
>                 Key: HADOOP-3173
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3173
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>         Environment: Hadoop 0.16.1
>            Reporter: Rajiv Chittajallu
>             Fix For: 0.18.0
>
>         Attachments: 3173-0.patch
>
>
> hadoop dfs -mkdir /user/*/bar creates a directory "/user/*/bar" and you can't delete /user/* as -rmr expands the glob
> $ hadoop dfs -mkdir /user/rajive/a/*/foo
> $ hadoop dfs -ls /user/rajive/a
> Found 4 items
> /user/rajive/a/*	<dir>		2008-04-04 16:09	rwx------	rajive	users
> /user/rajive/a/b	<dir>		2008-04-04 16:08	rwx------	rajive	users
> /user/rajive/a/c	<dir>		2008-04-04 16:08	rwx------	rajive	users
> /user/rajive/a/d	<dir>		2008-04-04 16:08	rwx------	rajive	users
> $ hadoop dfs -ls /user/rajive/a/*
> /user/rajive/a/*/foo	<dir>		2008-04-04 16:09	rwx------	rajive	users
> $ hadoop dfs -rmr /user/rajive/a/*
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/*
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/b
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/c
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/d
> I am not able to escape '*' to prevent it from being expanded.
> $ hadoop dfs -rmr '/user/rajive/a/*'
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/*
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/b
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/c
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/d
> $ hadoop dfs -rmr  '/user/rajive/a/\*'
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/*
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/b
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/c
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/d
> $ hadoop dfs -rmr  /user/rajive/a/\* 
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/*
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/b
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/c
> Moved to trash: hdfs://namenode-1:8020/user/rajive/a/d

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

