hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: do HDFS files starting with _ (underscore) have special properties?
Date Sat, 03 Sep 2011 03:45:49 GMT
Meng,

What version of Hadoop are you on? I'm able to list the '_' entries
successfully using globStatus(Path) with a '*' glob. The same doesn't
hold for FsShell's ls utility, though (which is odd!).

Here's my test code which can validate that the listing is indeed
done: http://pastebin.com/vCbd2wmK

$ hadoop dfs -ls
Found 4 items
drwxr-xr-x   - harshchouraria supergroup          0 2011-09-03 09:09 /user/harshchouraria/_abc
-rw-r--r--   1 harshchouraria supergroup          0 2011-09-03 09:10 /user/harshchouraria/_def
drwxr-xr-x   - harshchouraria supergroup          0 2011-09-03 08:10 /user/harshchouraria/abc
-rw-r--r--   1 harshchouraria supergroup          0 2011-09-03 09:10 /user/harshchouraria/def


$ hadoop dfs -ls '*'
-rw-r--r--   1 harshchouraria supergroup          0 2011-09-03 09:10 /user/harshchouraria/_def
-rw-r--r--   1 harshchouraria supergroup          0 2011-09-03 09:10 /user/harshchouraria/def

$ # No dir results! ^^

$ hadoop jar myjar.jar # (My code)
hdfs://localhost/user/harshchouraria/_abc
hdfs://localhost/user/harshchouraria/_def
hdfs://localhost/user/harshchouraria/abc
hdfs://localhost/user/harshchouraria/def

I suppose that means globStatus is fine, but the FsShell.ls(…) code
does something more than a simple glob status, and filters away
directory results when used with a glob.
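
For reference, here is a minimal sketch of the "hidden" name convention
mentioned further down the thread. As far as I know that check lives in
MapReduce's input handling (FileInputFormat skips names starting with '_'
or '.'), not in globStatus itself. The class and method names below are
made up for illustration, and plain strings stand in for Hadoop Path
objects so that it runs without a cluster:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: mirrors the '_' / '.' prefix
// convention using plain file names instead of org.apache.hadoop.fs.Path.
class HiddenNameCheck {

    // A name counts as "hidden" if it starts with '_' or '.'.
    static boolean isHidden(String name) {
        return name.startsWith("_") || name.startsWith(".");
    }

    // Returns only the names that are not hidden, preserving input order.
    static List<String> visibleOnly(List<String> names) {
        List<String> visible = new ArrayList<>();
        for (String name : names) {
            if (!isHidden(name)) {
                visible.add(name);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        // A few of the file names from the SOLR shard listing quoted below.
        List<String> names = Arrays.asList(
                "_ox.fdt", "_ox.fdx", "segments.gen", "segments_2");
        System.out.println(visibleOnly(names)); // prints [segments.gen, segments_2]
    }
}
```

A caller who wants the opposite behaviour (keep everything, including the
'_' entries) would simply not apply such a filter to the globStatus
results.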

On Sat, Sep 3, 2011 at 3:07 AM, Meng Mao <mengmao@gmail.com> wrote:
> Is there a programmatic way to access these hidden files then?
>
> On Fri, Sep 2, 2011 at 5:20 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>
>> On Fri, Sep 2, 2011 at 4:04 PM, Meng Mao <mengmao@gmail.com> wrote:
>>
>> > We have a compression utility that tries to grab all subdirs to a directory
>> > on HDFS. It makes a call like this:
>> > FileStatus[] subdirs = fs.globStatus(new Path(inputdir, "*"));
>> >
>> > and handles files vs dirs accordingly.
>> >
>> > We tried to run our utility against a dir containing a computed SOLR shard,
>> > which has files that look like this:
>> > -rw-r--r--   2 hadoopuser visible 8538430603 2011-09-01 18:58 /test/output/solr-20110901165238/part-00000/data/index/_ox.fdt
>> > -rw-r--r--   2 hadoopuser visible  233396596 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.fdx
>> > -rw-r--r--   2 hadoopuser visible        130 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.fnm
>> > -rw-r--r--   2 hadoopuser visible 2147948283 2011-09-01 18:55 /test/output/solr-20110901165238/part-00000/data/index/_ox.frq
>> > -rw-r--r--   2 hadoopuser visible   87523726 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.nrm
>> > -rw-r--r--   2 hadoopuser visible  920936168 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.prx
>> > -rw-r--r--   2 hadoopuser visible   22619542 2011-09-01 18:58 /test/output/solr-20110901165238/part-00000/data/index/_ox.tii
>> > -rw-r--r--   2 hadoopuser visible 2070214402 2011-09-01 18:51 /test/output/solr-20110901165238/part-00000/data/index/_ox.tis
>> > -rw-r--r--   2 hadoopuser visible         20 2011-09-01 18:51 /test/output/solr-20110901165238/part-00000/data/index/segments.gen
>> > -rw-r--r--   2 hadoopuser visible        282 2011-09-01 18:55 /test/output/solr-20110901165238/part-00000/data/index/segments_2
>> >
>> >
>> > The globStatus call seems only able to pick up those last 2 files; the
>> > several files that start with _ don't register.
>> >
>> > I've skimmed the FileSystem and GlobExpander source to see if there's
>> > anything related to this, but didn't see it. Google didn't turn up anything
>> > about underscores. Am I misunderstanding something about the regex patterns
>> > needed to pick these up or unaware of some filename convention in HDFS?
>> >
>>
>> Files starting with '_' are considered 'hidden' like unix files starting
>> with '.'. I did not know that for a very long time because not everyone
>> follows this rule or even knows about it.
>>
>



-- 
Harsh J
