From: Doug Cutting
To: core-user@hadoop.apache.org
Date: Fri, 10 Oct 2008 10:27:45 -0700
Subject: Re: Hadoop chokes on file names with ":" in them

The safest thing is to restrict your Hadoop file names to a common-denominator set of characters that are well supported by Unix, Windows, and URIs. Colon is a special character both on Windows and in URIs. Quoting is possible in theory, but in practice it is hard to get right everywhere.

One can devise heuristics that determine whether a colon is intended to be part of a name in a relative path rather than to indicate a URI scheme or a Windows device. But making sure that every component observes that heuristic (Java's URI handler, the Windows file system, etc.) is impossible, and this leads to inconsistent behavior. HDFS prohibits colons in file names for this reason.
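A small Java sketch of the two points above, using only the JDK rather than Hadoop's own Path class. It shows that java.net.URI reads the text before the first colon of a relative name as a URI scheme, and how a colon-free timestamp format keeps names inside a conservative character set. The class name PortableNames, the whitelist [A-Za-z0-9._-], and the timestamp pattern are illustrative assumptions, not anything Hadoop defines.

```java
import java.net.URI;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.regex.Pattern;

public class PortableNames {
    // Assumed "common-denominator" set: ASCII letters, digits, '.', '_' and '-'.
    // This whitelist is an illustrative choice, not a Hadoop or HDFS rule.
    private static final Pattern PORTABLE = Pattern.compile("[A-Za-z0-9._-]+");

    // Hypothetical timestamp layout that uses '-' where a colon would usually go.
    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("EEE-MMM-dd-HH-mm-ss-yyyy", Locale.ENGLISH);

    /** True if every character of the name is in the portable set. */
    public static boolean isPortable(String name) {
        return PORTABLE.matcher(name).matches();
    }

    /** Builds a file name from a prefix and a timestamp without using ':'. */
    public static String stampedName(String prefix, LocalDateTime t) {
        return prefix + "-" + t.format(STAMP);
    }

    /** Scheme that java.net.URI infers for the string, or null if it finds none. */
    public static String inferredScheme(String s) {
        return URI.create(s).getScheme();
    }

    public static void main(String[] args) {
        String colonName = "StageOutTest-24328-Fri-Oct-10-07:58:44-2008";
        // The relative name is parsed as an opaque URI whose scheme is the text before ':'.
        System.out.println(inferredScheme(colonName)); // StageOutTest-24328-Fri-Oct-10-07
        System.out.println(isPortable(colonName));     // false

        String safe = stampedName("StageOutTest-24328",
                                  LocalDateTime.of(2008, 10, 10, 7, 58, 44));
        System.out.println(safe);                      // StageOutTest-24328-Fri-Oct-10-07-58-44-2008
        System.out.println(isPortable(safe));          // true
    }
}
```

java.net.URI finds no scheme in an absolute path such as /user/brian/StageOutTest-24328-Fri-Oct-10-07:58:44-2008, because there the colon comes after a slash. The ambiguity appears with relative names, while HDFS rejects the colon in either case.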
Doug

Brian Bockelman wrote:
> Hey all,
>
> Hadoop tries to parse file names with ":" in them as a relative URL:
>
> [brian@red ~]$ hadoop fs -put /tmp/test /user/brian/StageOutTest-24328-Fri-Oct-10-07:58:44-2008
> put: Pathname /user/brian/StageOutTest-24328-Fri-Oct-10-07:58:44-2008
> from /user/brian/StageOutTest-24328-Fri-Oct-10-07:58:44-2008 is not a
> valid DFS filename.
> Usage: java FsShell [-put ... ]
>
> Our users do timestamps like that *a lot*. It appears that Hadoop tries
> to interpret the ":" as a sign that you are trying to use a relative URL.
>
> Is there any reason to not support the ":" character in file names?
>
> Brian