hadoop-common-user mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Hadoop chokes on file names with ":" in them
Date Fri, 10 Oct 2008 17:27:45 GMT
The safest thing is to restrict your Hadoop file names to a 
common-denominator set of characters that are well supported by Unix, 
Windows, and URIs.  Colon is a special character both on Windows and in 
URIs.  Quoting is in theory possible, but it's hard to get it right 
everywhere in practice.  One can devise heuristics that determine 
whether a colon is intended to be part of a name in a relative path 
rather than indicating a URI scheme or a Windows device, but making sure 
that all components observe that heuristic (Java's URI handler, Windows 
FS, etc.) is impossible and this leads to inconsistent behavior.  HDFS 
prohibits colons in filenames for this reason.
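
As a rough illustration of the ambiguity (a sketch, not Hadoop's actual 
code), the snippet below hands a bare name to java.net.URI and prints what 
it takes to be the scheme.  The class ColonNames and its two helpers are 
hypothetical names made up for this example, and the sanitize() character 
set is just one possible conservative choice.

```java
import java.net.URI;

class ColonNames {
    // Returns whatever java.net.URI parses as the scheme of a bare name,
    // or null when the name contains no scheme-like prefix.
    static String schemeOf(String name) {
        return URI.create(name).getScheme();
    }

    // Hypothetical helper: replaces every character outside a conservative
    // set (ASCII letters, digits, '.', '_', '-') with '-'.
    static String sanitize(String name) {
        return name.replaceAll("[^A-Za-z0-9._-]", "-");
    }

    public static void main(String[] args) {
        String name = "StageOutTest-24328-Fri-Oct-10-07:58:44-2008";
        // Everything before the first colon is a syntactically valid scheme.
        System.out.println(schemeOf(name));
        // The sanitized name no longer contains a colon.
        System.out.println(sanitize(name));
    }
}
```

With the name from the report below, the text before the first colon is 
treated as a scheme, while the sanitized form 
StageOutTest-24328-Fri-Oct-10-07-58-44-2008 has no colon left to 
misinterpret.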


Brian Bockelman wrote:
> Hey all,
> Hadoop tries to parse file names with ":" in them as a relative URL:
> [brian@red ~]$ hadoop fs -put /tmp/test 
> /user/brian/StageOutTest-24328-Fri-Oct-10-07:58:44-2008
> put: Pathname /user/brian/StageOutTest-24328-Fri-Oct-10-07:58:44-2008 
> from /user/brian/StageOutTest-24328-Fri-Oct-10-07:58:44-2008 is not a 
> valid DFS filename.
> Usage: java FsShell [-put <localsrc> ... <dst>]
> Our users do timestamps like that *a lot*.  It appears that Hadoop tries 
> to interpret the ":" as a sign that you are trying to use a relative URL.
> Is there any reason to not support the ":" character in file names?
> Brian
