hadoop-common-user mailing list archives

From Meng Mao <meng...@gmail.com>
Subject Re: do HDFS files starting with _ (underscore) have special properties?
Date Sat, 03 Sep 2011 18:34:43 GMT
I get the opposite behavior --

[this is more or less how I listed the files in the original email]
hadoop dfs -ls /test/output/solr-20110901165238/part-00000/data/index/*
-rw-r--r--   2 hadoopuser visible 8538430603 2011-09-01 18:58
/test/output/solr-20110901165238/part-00000/data/index/_ox.fdt
-rw-r--r--   2 hadoopuser visible  233396596 2011-09-01 18:57
/test/output/solr-20110901165238/part-00000/data/index/_ox.fdx
-rw-r--r--   2 hadoopuser visible        130 2011-09-01 18:57
/test/output/solr-20110901165238/part-00000/data/index/_ox.fnm
-rw-r--r--   2 hadoopuser visible 2147948283 2011-09-01 18:55
/test/output/solr-20110901165238/part-00000/data/index/_ox.frq
-rw-r--r--   2 hadoopuser visible   87523726 2011-09-01 18:57
/test/output/solr-20110901165238/part-00000/data/index/_ox.nrm
-rw-r--r--   2 hadoopuser visible  920936168 2011-09-01 18:57
/test/output/solr-20110901165238/part-00000/data/index/_ox.prx
-rw-r--r--   2 hadoopuser visible   22619542 2011-09-01 18:58
/test/output/solr-20110901165238/part-00000/data/index/_ox.tii
-rw-r--r--   2 hadoopuser visible 2070214402 2011-09-01 18:51
/test/output/solr-20110901165238/part-00000/data/index/_ox.tis
-rw-r--r--   2 hadoopuser visible         20 2011-09-01 18:51
/test/output/solr-20110901165238/part-00000/data/index/segments.gen
-rw-r--r--   2 hadoopuser visible        282 2011-09-01 18:55
/test/output/solr-20110901165238/part-00000/data/index/segments_2

Whereas my globStatus doesn't capture them.
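
A minimal sketch of the two behaviours being compared in this thread, kept
on plain java.nio so it runs without a cluster. It is not Hadoop's
FileSystem API: the class HiddenNameDemo and its methods are hypothetical,
and the glob syntax is java.nio's, not Hadoop's. It only illustrates that a
'*' glob by itself matches names starting with '_', while a separate
"hidden name" filter (the '_' / '.' convention described further down)
would leave just segments.gen and segments_2. Whether a filter like this
is what drops the files in the globStatus call above is an assumption, not
something confirmed here.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Hadoop.
final class HiddenNameDemo {

    // The convention described in the thread: a leading '_' is treated as
    // "hidden", like a leading '.' on Unix.
    static boolean isHidden(String name) {
        return name.startsWith("_") || name.startsWith(".");
    }

    // Names that match the given glob (java.nio glob syntax).
    static List<String> globMatches(String glob, List<String> names) {
        PathMatcher matcher =
            FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> out = new ArrayList<>();
        for (String name : names) {
            if (matcher.matches(Paths.get(name))) {
                out.add(name);
            }
        }
        return out;
    }

    // Names that survive the hidden-name filter.
    static List<String> visible(List<String> names) {
        List<String> out = new ArrayList<>();
        for (String name : names) {
            if (!isHidden(name)) {
                out.add(name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A few of the file names from the listing above.
        List<String> names = Arrays.asList(
            "_ox.fdt", "_ox.fdx", "segments.gen", "segments_2");
        List<String> globbed = globMatches("*", names);
        System.out.println("glob '*' alone:       " + globbed);
        System.out.println("after hidden filter:  " + visible(globbed));
    }
}
```

If the utility turns out to apply a filter of this kind somewhere between
the glob and the result, that would explain seeing only the last two files.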

I thought we were on Cloudera's CDH3, but now I'm not sure. This is what
version reports:
$ hadoop version
Hadoop 0.20.1+169.56
Subversion  -r 8e662cb065be1c4bc61c55e6bff161e09c1d36f3
Compiled by root on Tue Feb  9 13:40:08 EST 2010





On Fri, Sep 2, 2011 at 11:45 PM, Harsh J <harsh@cloudera.com> wrote:

> Meng,
>
> What version of hadoop are you on? I'm able to use globStatus(Path)
> for '_' listing successfully, with a '*' glob, although the same
> doesn't hold for what FsShell's ls utility returns (which is odd
> here!).
>
> Here's my test code which can validate that the listing is indeed
> done: http://pastebin.com/vCbd2wmK
>
> $ hadoop dfs -ls
> Found 4 items
> drwxr-xr-x   - harshchouraria supergroup          0 2011-09-03 09:09
> /user/harshchouraria/_abc
> -rw-r--r--   1 harshchouraria supergroup          0 2011-09-03 09:10
> /user/harshchouraria/_def
> drwxr-xr-x   - harshchouraria supergroup          0 2011-09-03 08:10
> /user/harshchouraria/abc
> -rw-r--r--   1 harshchouraria supergroup          0 2011-09-03 09:10
> /user/harshchouraria/def
>
>
> $ hadoop dfs -ls '*'
> -rw-r--r--   1 harshchouraria supergroup          0 2011-09-03 09:10
> /user/harshchouraria/_def
> -rw-r--r--   1 harshchouraria supergroup          0 2011-09-03 09:10
> /user/harshchouraria/def
>
> $ # No dir results! ^^
>
> $ hadoop jar myjar.jar # (My code)
> hdfs://localhost/user/harshchouraria/_abc
> hdfs://localhost/user/harshchouraria/_def
> hdfs://localhost/user/harshchouraria/abc
> hdfs://localhost/user/harshchouraria/def
>
> I suppose that means globStatus is fine, but the FsShell.ls(…) code
> does something more than a simple glob status, and filters away
> directory results when used with a glob.
>
> On Sat, Sep 3, 2011 at 3:07 AM, Meng Mao <mengmao@gmail.com> wrote:
> > Is there a programmatic way to access these hidden files then?
> >
> > On Fri, Sep 2, 2011 at 5:20 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> >
> >> On Fri, Sep 2, 2011 at 4:04 PM, Meng Mao <mengmao@gmail.com> wrote:
> >>
> >> > We have a compression utility that tries to grab all subdirs of a
> >> > directory on HDFS. It makes a call like this:
> >> > FileStatus[] subdirs = fs.globStatus(new Path(inputdir, "*"));
> >> >
> >> > and handles files vs dirs accordingly.
> >> >
> >> > We tried to run our utility against a dir containing a computed SOLR
> >> > shard, which has files that look like this:
> >> > -rw-r--r--   2 hadoopuser visible 8538430603 2011-09-01 18:58
> >> > /test/output/solr-20110901165238/part-00000/data/index/_ox.fdt
> >> > -rw-r--r--   2 hadoopuser visible  233396596 2011-09-01 18:57
> >> > /test/output/solr-20110901165238/part-00000/data/index/_ox.fdx
> >> > -rw-r--r--   2 hadoopuser visible        130 2011-09-01 18:57
> >> > /test/output/solr-20110901165238/part-00000/data/index/_ox.fnm
> >> > -rw-r--r--   2 hadoopuser visible 2147948283 2011-09-01 18:55
> >> > /test/output/solr-20110901165238/part-00000/data/index/_ox.frq
> >> > -rw-r--r--   2 hadoopuser visible   87523726 2011-09-01 18:57
> >> > /test/output/solr-20110901165238/part-00000/data/index/_ox.nrm
> >> > -rw-r--r--   2 hadoopuser visible  920936168 2011-09-01 18:57
> >> > /test/output/solr-20110901165238/part-00000/data/index/_ox.prx
> >> > -rw-r--r--   2 hadoopuser visible   22619542 2011-09-01 18:58
> >> > /test/output/solr-20110901165238/part-00000/data/index/_ox.tii
> >> > -rw-r--r--   2 hadoopuser visible 2070214402 2011-09-01 18:51
> >> > /test/output/solr-20110901165238/part-00000/data/index/_ox.tis
> >> > -rw-r--r--   2 hadoopuser visible         20 2011-09-01 18:51
> >> > /test/output/solr-20110901165238/part-00000/data/index/segments.gen
> >> > -rw-r--r--   2 hadoopuser visible        282 2011-09-01 18:55
> >> > /test/output/solr-20110901165238/part-00000/data/index/segments_2
> >> >
> >> >
> >> > The globStatus call seems only able to pick up those last 2 files; the
> >> > several files that start with _ don't register.
> >> >
> >> > I've skimmed the FileSystem and GlobExpander source to see if there's
> >> > anything related to this, but didn't see it. Google didn't turn up
> >> > anything about underscores. Am I misunderstanding something about the
> >> > regex patterns needed to pick these up, or unaware of some filename
> >> > convention in HDFS?
> >> >
> >>
> >> Files starting with '_' are considered 'hidden' like unix files starting
> >> with '.'. I did not know that for a very long time because not everyone
> >> follows this rule or even knows about it.
> >>
> >
>
>
>
> --
> Harsh J
>
