jackrabbit-dev mailing list archives

From "Nicolas " <nto...@gmail.com>
Subject Re: General Packaging mechanism
Date Wed, 21 Feb 2007 14:27:37 GMT
Hi,

I think this link might help:
http://www.peterbuettner.de/develop/javasnippets/zipOnlyAscii/index.html

The important excerpt: "After investigation of the native code i found, that
they interpret the names in the zip as utf-8 encoded. The bytes are
transformed into a String nevertheless if they are valid utf-8 or not."

So if the packaging application produces UTF-8 encoded filenames, there
shouldn't be any issue when Java reads them back.
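
To make that concrete, here is a minimal sketch of a round trip with UTF-8
entry names. It assumes a JDK that has the Charset-taking constructors of
ZipOutputStream and ZipInputStream (added in Java 7, so not available on older
JDKs); the class name ZipUtf8Names and the method roundTrip are made up for
illustration and are not part of any proposed packaging API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, for illustration only.
class ZipUtf8Names {

    // Writes one empty entry with the given name into an in-memory zip,
    // encoding the name as UTF-8, then reads the archive back (also as
    // UTF-8) and returns the name of the first entry found.
    static String roundTrip(String entryName) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ZipOutputStream out =
                     new ZipOutputStream(buffer, StandardCharsets.UTF_8)) {
                out.putNextEntry(new ZipEntry(entryName));
                out.closeEntry();
            }
            try (ZipInputStream in = new ZipInputStream(
                     new ByteArrayInputStream(buffer.toByteArray()),
                     StandardCharsets.UTF_8)) {
                ZipEntry entry = in.getNextEntry();
                return entry == null ? null : entry.getName();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Chinese, Euro sign and Hebrew mixed in a single name, written
        // with Unicode escapes so the source file encoding does not matter.
        String name =
            "test/\u5230\u65e5\u672c\u6765-\u20ac-\u05e9\u05dc\u05d5\u05dd.txt";
        System.out.println(name.equals(roundTrip(name)));
    }
}
```

This only shows that producer and consumer agree when both use UTF-8. It says
nothing about other zip tools, which may still assume a different encoding for
the same bytes.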

Hope it helps.

Nicolas

On 2/21/07, Julian Reschke <julian.reschke@gmx.de> wrote:
>
> Tobias Bocanegra schrieb:
> >> Unrelated to that...:
> >>
> >> > - use a standard format for the archive (i.e. zip/jar)
> >>
> >> If you use ZIP/JAR as format, how are you going to handle non-ASCII
> >> characters in filenames in a portable way?
> >
> > all non-valid filesystem characters are escaped using url-escaping %xx
> > or %uXXXX. actually i haven't looked at how non-ascii characters are
> > handled in a jar file, but obviously it works, since i can include
> > such a file in a jar file:
> > ...
>
> My understanding is that the JAR/ZIP format is silent on filename
> encoding. So a producer of these files will have to select an encoding,
> and the recipient needs to select the same one. In general, this is not
> going to work unless everybody agrees to use UTF-8.
>
> > [tripod@sulu test]$ touch "到日本来.txt"
> > [tripod@sulu test]$ ll
> > total 0
> > -rw-rw-r-- 1 tripod tripod 0 Feb 21 14:44 到日本来.txt
> > [tripod@sulu test]$ cd ..
> > [tripod@sulu jcr-car]$ jar cvf test.jar test/
> > added manifest
> > adding: test/(in = 0) (out= 0)(stored 0%)
> > adding: test/到日本来.txt(in = 0) (out= 0)(stored 0%)
> > [tripod@sulu jcr-car]$ jar tf test.jar
> > META-INF/
> > META-INF/MANIFEST.MF
> > test/
> > test/到日本来.txt
>
> I think it's using the platform encoding, and you happened to try a
> character (can't tell from your mail) that can be represented in that
> encoding. Try a mix of special characters (Euro sign, Hebrew, Arabic) in
> one filename, and retry :-)
>
> Best regards, Julian
>
> (P.S.: we had trouble using ZIP as a content container format two years
> ago for the reasons above; maybe the situation has improved but I really
> doubt that)
>
>


-- 
a+
Nico
my blog! http://www.deviant-abstraction.net !!
