From: Meng Mao
Date: Fri, 2 Sep 2011 16:04:48 -0400
Subject: do HDFS files starting with _ (underscore) have special properties?
To: core-user@hadoop.apache.org

We have a compression utility that tries to grab all subdirs to a directory on HDFS. It makes a call like this:

    FileStatus[] subdirs = fs.globStatus(new Path(inputdir, "*"));

and handles files vs dirs accordingly.

We tried to run our utility against a dir containing a computed SOLR shard, which has files that look like this:

    -rw-r--r-- 2 hadoopuser visible 8538430603 2011-09-01 18:58 /test/output/solr-20110901165238/part-00000/data/index/_ox.fdt
    -rw-r--r-- 2 hadoopuser visible 233396596 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.fdx
    -rw-r--r-- 2 hadoopuser visible 130 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.fnm
    -rw-r--r-- 2 hadoopuser visible 2147948283 2011-09-01 18:55 /test/output/solr-20110901165238/part-00000/data/index/_ox.frq
    -rw-r--r-- 2 hadoopuser visible 87523726 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.nrm
    -rw-r--r-- 2 hadoopuser visible 920936168 2011-09-01 18:57 /test/output/solr-20110901165238/part-00000/data/index/_ox.prx
    -rw-r--r-- 2 hadoopuser visible 22619542 2011-09-01 18:58 /test/output/solr-20110901165238/part-00000/data/index/_ox.tii
    -rw-r--r-- 2 hadoopuser visible 2070214402 2011-09-01 18:51 /test/output/solr-20110901165238/part-00000/data/index/_ox.tis
    -rw-r--r-- 2 hadoopuser visible 20 2011-09-01 18:51 /test/output/solr-20110901165238/part-00000/data/index/segments.gen
    -rw-r--r-- 2 hadoopuser visible 282 2011-09-01 18:55 /test/output/solr-20110901165238/part-00000/data/index/segments_2

The globStatus call seems only able to pick up those last 2 files; the several files that start with _ don't register.

I've skimmed the FileSystem and GlobExpander source to see if there's anything related to this, but didn't see it. Google didn't turn up anything about underscores. Am I misunderstanding something about the regex patterns needed to pick these up or unaware of some filename convention in HDFS?
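
The symptom described above (only the names that do not start with _ come back) is what a hidden-file name filter would produce. Hadoop's FileInputFormat skips input paths whose names begin with _ or . when it lists job inputs. Whether a filter like that sits anywhere in this utility's globStatus path is an assumption, not something confirmed here. A minimal sketch in plain Java (standard library only; the class and method names are hypothetical and not part of Hadoop) applies such a rule to the file names from the listing:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HiddenNameFilterSketch {

    // Hypothetical helper: treats names starting with "_" or "." as hidden,
    // mirroring the convention FileInputFormat uses for job inputs.
    static boolean isHiddenByConvention(String name) {
        return name.startsWith("_") || name.startsWith(".");
    }

    // Returns only the names that such a filter would let through, in order.
    static List<String> visibleNames(List<String> names) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (!isHiddenByConvention(name)) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // File names taken from the directory listing in the message.
        List<String> listing = Arrays.asList(
                "_ox.fdt", "_ox.fdx", "_ox.fnm", "_ox.frq", "_ox.nrm",
                "_ox.prx", "_ox.tii", "_ox.tis", "segments.gen", "segments_2");

        // Prints segments.gen and segments_2, the two files that were picked up.
        for (String name : visibleNames(listing)) {
            System.out.println(name);
        }
    }
}
```

If the utility wraps or post-filters the globStatus result with a rule of this kind, that would account for the missing files, so it may be worth ruling that out before suspecting the glob pattern itself.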