tomcat-users mailing list archives

From André Warnier ...@ice-sa.com>
Subject Re: Create FileInputStream in servlet from remote file with accentuated character name
Date Thu, 17 Sep 2009 23:41:20 GMT
Christopher Schultz wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Sylvie,
> 
> On 9/17/2009 9:12 AM, Sylvie Perrin wrote:
>> I have a shared directory on a windows system named SHAREDDIR and
>> containing one file named "fichié.txt"

Sylvie,
why not name your file "fichier.txt", the way it should be spelled in
French? That would solve your problem immediately, save a lot of
ink on this thread, and save you a lot of time in the end.

Seriously.

There are so many pieces in play here. On one side is a browser that
you do not control, on a workstation that you do not control. In the
middle are HTML and HTTP, whose default character set is iso-8859-1;
Java, whose internal character set is Unicode; and a local Linux
filesystem, which is charset-agnostic. On the other side is a Windows
system that stores its filenames in directories as Unicode. With all of
that, you will never get a solution that is totally foolproof.
If you have to work with a web application that involves files on
different platforms, stick to filenames made purely of US-ASCII
characters.

André




Seriously now, let's start at the beginning.
You are, like many of us, the victim of these horrible English-speaking 
imperialists in the computer industry. They just don't understand 
alphabets with more than 27 letters, and get totally confused by our és
and às and cédilles and scharfes S's. But since they got there first
(mainly because of all the anti-competitive subsidies they gave to 
Boeing and GM), we are the ones who have to adapt.

So, you have a file, which on your Unix/Linux system looks like
/home/me/mountDir/fichié.txt.
Or does it, really?

Try the following :
- open a console window on your Linux system
- enter the command "locale -a", and find 2 result lines like :
fr_FR.iso8859-1
fr_FR.utf8
(or something similar, the point being to have one looking like it 
contains 8859-1 and the other looking like it contains "utf8").

- now enter "export LC_CTYPE=fr_FR.iso8859-1"
(adapt this to what you found above with locale -a)

- now enter "ls -l /home/me/mountDir/"
What does the filename look like?

- now enter "export LC_CTYPE=fr_FR.utf8"
(adapt this to what you found above with locale -a)

- now enter "ls -l /home/me/mountDir/" again
What does the filename look like now?

I would bet the file name looks different.
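
The effect of those two "ls" runs can be imitated in Java. The sketch
below is only an illustration, not something taken from your setup: the
class name FilenameLayers and the helper decode() are made up for the
example. It takes the bytes of one and the same name and decodes them
the way a layer with the wrong charset setting would.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FilenameLayers {
    // Decode raw bytes the way a layer configured for `assumed` would read them.
    public static String decode(byte[] raw, Charset assumed) {
        return new String(raw, assumed);
    }

    public static void main(String[] args) {
        // "fichié.txt", with the é escaped so the source file encoding plays no part
        String name = "fichi\u00E9.txt";

        byte[] asUtf8 = name.getBytes(StandardCharsets.UTF_8);        // é becomes bytes C3 A9
        byte[] asLatin1 = name.getBytes(StandardCharsets.ISO_8859_1); // é becomes byte E9

        // UTF-8 bytes read by an iso-8859-1 layer: the é turns into two characters
        System.out.println(decode(asUtf8, StandardCharsets.ISO_8859_1));

        // iso-8859-1 bytes read by a UTF-8 layer: the é turns into U+FFFD
        System.out.println(decode(asLatin1, StandardCharsets.UTF_8));
    }
}
```

Neither output is the name you started with, although the name itself
never changed. What your console shows for these two lines depends, of
course, on the console's own settings, which is one more layer.
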

Now go to your Windows system, open the Windows Explorer, and look at
what this filename looks like.
Then on your Windows system, open a command window, navigate to the same
directory, do a "dir", and look at what the filename looks like.
Is there a difference there, too?

Why is that ?
The filename itself did not change in the directory of your Windows system.

But the name of that file is going to "look" different, depending on how 
many "layers" of software there are between that directory entry and the 
process that uses that filename, and on the settings of each of these 
layers.

The above are simple cases, involving just a few layers : the original 
directory, the CIFS filesystem drivers on your Linux machine, the "ls" 
program itself, and the display interface between that program and your 
console.
Now you add Java and Tomcat on top of that, and you add HTTP, and you 
add URI encoding/decoding, and you add the browser, and you add the 
encoding of your html pages.

In other words, give it up.


>> I mount this shared directory on my Linux system with the following
>> command:
>>> mount -t cifs -o iocharset=utf8 //IpWindows/SHAREDDIR /home/me/mountDir/
>> In a standalone Java application running on my Linux system, I can
>> create a FileInputStream from the file located in the remote directory
>> like this:
>>
>> String mountPath = "/home/me/mountDir";
>> File[] list = new File(mountPath).listFiles();
>> File file = list[0];
>> try {
>>    FileInputStream fStream = new FileInputStream(file);
>> }
>> catch (FileNotFoundException e) {
>>    e.printStackTrace();
>> }
> 
> Can you have your standalone Java program print the following information:
> 
> 1. The full path of the file
> 2. The values for these system properties:
>    a. file.encoding
>    b. sun.jnu.encoding
> 
>> When I execute the same code in a servlet running on the same machine,
>> the call to FileInputStream constructor always throws a
>> FileNotFoundException because it doesn't recognize the "é" character in
>> the path of the file.
> 
> Please post the above values within your servlet environment, too.
> 
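
A minimal sketch of a program that prints what Chris asks for, assuming
Java 8 or later. The class name EncodingReport and the helper
codePoints() are made up for this example; the directory is the one
from the original post. It also prints each file name as Unicode code
points, because the console is one more layer that can distort the é.

```java
import java.io.File;

public class EncodingReport {
    // Render each character as U+XXXX so the result survives any console encoding.
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Directory from the original post; pass another one as the first argument.
        String mountPath = args.length > 0 ? args[0] : "/home/me/mountDir";

        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));

        File[] list = new File(mountPath).listFiles();
        if (list == null) {
            // listFiles() returns null when the path is not a readable directory
            System.out.println("cannot list " + mountPath);
            return;
        }
        for (File f : list) {
            System.out.println(f.getAbsolutePath());
            System.out.println("  name as code points: " + codePoints(f.getName()));
        }
    }
}
```

Run it once from the command line, call the same code from the servlet
(writing to the log instead of System.out), and compare the two outputs.
If the é shows up as U+00E9 in one environment and as something else in
the other, the name is being decoded differently in the two JVMs.
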
> Are you sure that it's because of the é, or is it because the user that
> Tomcat is running under does not have permission to read that file?
> Under what user /is/ Tomcat running?
> 
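
The permission question can be checked from inside the JVM as well. The
following is a sketch under assumptions: AccessCheck and describe() are
names invented here, and the directory is again the one from the
original post. It reports the user the JVM runs as and whether each
listed file is visible and readable to that user.

```java
import java.io.File;

public class AccessCheck {
    // Summarize what the current JVM user can see of this file.
    public static String describe(File f) {
        return "exists=" + f.exists() + " canRead=" + f.canRead() + " isFile=" + f.isFile();
    }

    public static void main(String[] args) {
        String mountPath = args.length > 0 ? args[0] : "/home/me/mountDir";

        // user.name is the account the JVM process runs under
        System.out.println("user.name = " + System.getProperty("user.name"));

        File[] list = new File(mountPath).listFiles();
        if (list == null) {
            System.out.println("cannot list " + mountPath);
            return;
        }
        for (File f : list) {
            System.out.println(f.getAbsolutePath() + " : " + describe(f));
        }
    }
}
```

If the standalone run reports canRead=true and the run inside Tomcat
reports canRead=false for the same file, the é is probably not the
culprit.
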
>> Since I don't know what the problem is I have had a hard time tracking
>> down a solution online. I especially take care to follow all steps
>> described in the FAQ/CharacterEncoding parts of wiki. Here is my
>> configuration:
>>
>> I set URIEncoding in my port 8080 connector to UTF-8 (I use this port to
>> execute my servlet)
>> <Connector port="8080" protocol="HTTP/1.1"
>>   connectionTimeout="20000"
>>   redirectPort="8443"
>>   URIEncoding="UTF-8"
>>   useBodyEncodingForURI="true" />
> 
> None of these settings matter. These are only relevant for HTTP
> communication, and your code is not reading anything from the request.
> 
>> I use a filter to set the default encoding to UTF-8 and my first line of
>> my doFilter method is
>> request.setCharacterEncoding("UTF-8");
> 
> Your filter sets /what/ default encoding? What does it set it to?
> 
> Setting the encoding of the request will not affect your code above.
> 
>> I add in my servlet the set of content-type for responses to UTF-8 and
>> my first line of my doGet method is
>> response.setContentType("text/html;charset=UTF-8");
> 
> This will also have no effect.
> 
>> My tomcat is started with CATALINA_OPTS=-Dfile.encoding=UTF-8
> 
> Okay. Let's see what your command-line program reports for
> file.encoding, etc.
> 
> - -chris
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (MingW32)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
> 
> iEYEARECAAYFAkqyZxQACgkQ9CaO5/Lv0PArBACdGM53y+0/2L1lkf3gvngXpnAz
> 8D8An3pjgMT4jBOk6jg+zRNEXGORzJ1G
> =v9Bf
> -----END PGP SIGNATURE-----
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@tomcat.apache.org
> For additional commands, e-mail: users-help@tomcat.apache.org
> 
> 



