tomcat-users mailing list archives

From André Warnier
Subject Re: Create FileInputStream in servlet from remote file with accentuated character name
Date Mon, 21 Sep 2009 09:45:34 GMT
Sylvie Perrin wrote:
> Christopher,
> Here is the stack trace of the FileNotFoundException:
> /home/me/mountDir/fichi��.txt (No such file or 
> directory)


Maybe what appears above shows the origin of the problem, and explains
what I was previously trying to tell you.
It is difficult to be sure, because (again) there are several layers of
encoding/decoding between your logfile, and how it may show up in this 

The problem is not your problem per se.  You are not necessarily doing 
anything wrong. The problem is basically in the lack of a common 
standard between different OS'es and filesystem types, about how to 
represent filenames containing non-US-ASCII characters.

Below, I am trying to explain the root of the problem, concisely but 
fully.  It *is* a complex matter, that's why it is confusing.  But you 
are not alone in being confused or puzzled.  Unless one has had to deal 
with such issues many times, it is really easy to get confused, because 
in this case, what one sees is not necessarily what one gets.

Assuming that what I see above is also what you see in the logfile 
("fichi" + 2 strange characters + ".txt") :

- Java is trying to open a file named "fichi" + 2 strange characters +
".txt"
- these two characters *may* be the Unicode/UTF-8 encoding of the
character "é" (e with acute accent)
- but Java is not finding that file (obviously)
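
To illustrate how a single "é" can turn into 2 strange characters, here is a
minimal sketch. It assumes (and this is only an assumption, I cannot tell
from the logfile) that the name was encoded as UTF-8 somewhere and then
decoded with a single-byte character set. The class and method names are
made up for the example, and ISO-8859-1 merely stands in for "some 8-bit
character set".

```java
import java.nio.charset.StandardCharsets;

public class StrangeCharsDemo {

    // Encode the text as UTF-8, then (wrongly) decode those bytes as
    // ISO-8859-1, which maps every single byte to one character.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // \u00e9 is "é"; the escape avoids depending on the source file encoding.
        String name = "fichi\u00e9.txt";
        String seen = misdecode(name);
        System.out.println("original length:   " + name.length());
        System.out.println("misdecoded length: " + seen.length());
        System.out.println("misdecoded name:   " + seen);
    }
}
```

The misdecoded name is one character longer than the original, because the
2 UTF-8 bytes of "é" each become a character of their own. What those 2
characters look like on screen depends again on the terminal or viewer.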

Furthermore :
The file is really located on a Windows server.
The Windows directory where the file is located, is "mounted" through 
the CIFS filesystem, onto a local mountpoint on your (Linux) Java and 
Tomcat host.
On your Java/Tomcat host, Java is seeing the contents of this directory
*through* this CIFS filesystem mount.
In principle (but that is only an assumption here), the CIFS filesystem 
code (running on the localhost) shows this (remote) directory content to 
a local application "as is", without making any character set translation.

Now Java (on your local system) is trying to find this file 
"fichiXX.txt", and not finding it (XX being the two unknown bytes).
That means that, on the remote system, this file "fichiXX.txt" does not 
exist.

If you connect to that remote system via, for instance, a Remote Desktop 
or a VNC console (or even from your local station, just browse this 
"share" through the Windows Explorer), and examine the content of that 
directory, you probably see a file named "fichié.txt".

But that is only what you *see*, through whatever interface you use.
In reality, the "é" in this filename may (or may not) be encoded, in the 
Windows directory entry, as 2 bytes. Or it may be encoded with (for 
instance) a Windows 8-bit codepage, as a single byte.
If so, that is why Java, which is trying to find this "é" as 2 bytes, 
does not find it.
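
The difference between the 2-byte and the 1-byte form can be made visible
with a small sketch like the one below. It is only an illustration: the
class and method names are invented, and ISO-8859-1 is used as a stand-in
for a Windows 8-bit codepage (for "é" the byte value is the same in
ISO-8859-1 and windows-1252).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedNameBytes {

    // Return the bytes of the text in the given charset as hex, e.g. "C3 A9".
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // "é", written as an escape on purpose
        System.out.println("UTF-8:      " + hex(eAcute, StandardCharsets.UTF_8));
        System.out.println("ISO-8859-1: " + hex(eAcute, StandardCharsets.ISO_8859_1));
    }
}
```

The same Java character comes out as bytes C3 A9 in UTF-8 and as the single
byte E9 in the 8-bit character set, so a name stored one way will not match
a lookup made the other way.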

Now comes the difficult part :

To solve your problem, then, you have to make sure that when Java is 
looking for a filename which, from the Java point of view, contains an 
"é" character, this Java "é" *character* (whatever its representation is 
as bytes in Java), matches the byte representation of the "é" character, 
in the filesystem of the remote host where the file actually resides.
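
One way to check what Java really sees, as opposed to what a console or
logfile shows, is to list the mounted directory from Java and print each
name as Unicode code points. This is a diagnostic sketch only, with invented
class and method names; the directory is passed as an argument (for
instance the mountpoint from your stack trace).

```java
import java.io.File;

public class ListNamesAsCodePoints {

    // Render a name as code points, e.g. "U+0066 U+00E9", so that nothing
    // depends on how the console displays accented characters.
    static String codePoints(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Directory to inspect; defaults to the current directory.
        File dir = new File(args.length > 0 ? args[0] : ".");
        String[] names = dir.list();
        if (names == null) {
            System.err.println("Cannot list " + dir);
            return;
        }
        for (String name : names) {
            System.out.println(codePoints(name) + "  " + name);
        }
    }
}
```

If the "é" in the listed name shows up as U+00E9, Java decoded the directory
entry the way you expect. If it shows up as 2 code points, or as U+FFFD
(the replacement character), then the decoding between the mount and Java
does not match the encoding used on the remote side.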

And the problem is that these systems (Java and your current 
platform on one side, the remote OS on the other) do not necessarily 
agree on what the byte representation of an "é" character is.

For example, suppose you find the right set of measures that make your 
Java program find the file in the end.
Then, you replace the Windows fileserver by a Linux server, sharing its 
files through Samba.
Well, the problem may then show up again, because the encoding may be 
different again.
That is why I was recommending to stick to US-ASCII names.  It was not a 
