directory-users mailing list archives

From Emmanuel Lecharny <elecha...@gmail.com>
Subject RE: ApacheDS non US-ASCII DN manipulation?
Date Wed, 31 Jan 2007 00:04:12 GMT
Pierre-Alain RIVIERE wrote:

> Hi everyone,
Hi Pierre-Alain, 


> 
> I'm working on a project where I have embedded an Apache DS for testing 
> purpose.

That's good news :)


> In my unit test, I'm trying to launch a request like this one
> 
>     "(&(member=cn=John
>     Doe,ou=Peoples,ou=Paris,ou=Offices,dc=ippon,dc=fr)(objectClass=ipponGroup))"

It seems perfectly correct at first sight. And at second sight, I can confirm this is a correct
filter (and a correct DN).
 
> Unfortunately this request fails with the following stack trace
> 
>     Caused by: java.lang.NullPointerException
>         at
>     org.apache.directory.server.core.schema.DnNormalizer.normalize(DnNormalizer.java:64)

Hmmm. NPEs are never a good thing.


> With the debugger I found that the method concerned, 
> DnNormalizer#normalize(Object), does not initialize the LdapDN dn. 
> Here is the code:
<snip/>

Yeah, I think that we should throw an exception if the dn is not a String, a Name or an LdapDN.
But not an NPE...
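For illustration only, here is a minimal sketch of that kind of up-front check. It is not the actual DnNormalizer code: the class and method names (`DnArgumentCheck`, `toLdapName`) are hypothetical, and it uses the JDK's `javax.naming.ldap.LdapName` as a stand-in for ApacheDS's own `LdapDN`. Unsupported inputs such as a `byte[]` (or null) get an `IllegalArgumentException` with an explicit message instead of an NPE further down.

```java
import javax.naming.InvalidNameException;
import javax.naming.Name;
import javax.naming.ldap.LdapName;

class DnArgumentCheck {

    // Hypothetical helper, NOT the ApacheDS DnNormalizer: it rejects
    // unsupported argument types with an explicit exception instead of
    // letting a null DN trigger a NullPointerException later on.
    static LdapName toLdapName(Object value) {
        if (value instanceof LdapName) {
            return (LdapName) value;
        }
        String dn;
        if (value instanceof String) {
            dn = (String) value;
        } else if (value instanceof Name) {
            // Assumption: this Name's toString() yields the DN string form.
            dn = value.toString();
        } else {
            // byte[] (or null) ends up here: we can't guess its encoding.
            String type = (value == null) ? "null" : value.getClass().getSimpleName();
            throw new IllegalArgumentException(
                    "A DN must be a String or a Name, not " + type);
        }
        try {
            return new LdapName(dn);
        } catch (InvalidNameException e) {
            throw new IllegalArgumentException("Invalid DN: " + dn, e);
        }
    }

    public static void main(String[] args) {
        LdapName ok = toLdapName("cn=John Doe,ou=Peoples,dc=ippon,dc=fr");
        System.out.println(ok.size()); // 4 RDNs
        try {
            toLdapName(new byte[] {0x63, 0x6e});
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // A DN must be a String or a Name, not byte[]
        }
    }
}
```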

> In my case dn is null because the value passed as a parameter is a byte[]. 

Ahhh. It's not a good idea to pass a byte[]...


> Indeed, it seems that a DN containing non US-ASCII characters (my DNs may 
> contain any character used in French first names and surnames) is represented 
> by ApacheDS as a byte[], which can be used to construct a new String 
> representation.
Well, this is not true. In fact, if you are using an embedded ADS, you should pass human-readable
data (like DNs) as Java Strings, and all other values (like jpegPhoto, for instance) as byte[].

Passing a byte[] is not an option, because then we have no clue which encoding a user has used to
transform the String into a byte[]. For instance, let's assume you have a String with French
characters (like 'é'). If your local encoding is UTF-8, the transformation will generate a
different byte array than if your local encoding is ISO-8859-1.
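A quick way to see this, as a sketch using only the JDK (the class name is made up for the example): the same one-character String gives two bytes in UTF-8 but a single byte in ISO-8859-1. The character is written as an escape so that the example does not itself depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncodingDemo {
    public static void main(String[] args) {
        // e-acute, written as a unicode escape so the source encoding doesn't matter
        String s = "\u00e9";
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.toString(utf8));   // [-61, -87]  (0xC3 0xA9)
        System.out.println(Arrays.toString(latin1)); // [-23]       (0xE9)
    }
}
```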

And since you can't tell the server which encoding you have used, there is no way for it to know
whether you used UTF-8 or something else (even if you did use UTF-8, as expected).
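The reverse direction shows the ambiguity on the server side. In this sketch (again a hypothetical class name, JDK only), the two bytes produced by UTF-8 decode back to 'é' when read as UTF-8, but to the two characters 'Ã©' when read as ISO-8859-1; nothing in the bytes themselves says which reading is the right one.

```java
import java.nio.charset.StandardCharsets;

class DecodingAmbiguity {
    public static void main(String[] args) {
        // What a client sends for e-acute if it encoded with UTF-8: 0xC3 0xA9
        byte[] received = "\u00e9".getBytes(StandardCharsets.UTF_8);

        // The server only sees the bytes; two plausible readings disagree.
        String asUtf8 = new String(received, StandardCharsets.UTF_8);        // 1 char: e-acute
        String asLatin1 = new String(received, StandardCharsets.ISO_8859_1); // 2 chars: A-tilde, copyright sign

        System.out.println(asUtf8.equals("\u00e9"));         // true
        System.out.println(asLatin1.equals("\u00c3\u00a9")); // true
    }
}
```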

So, basically, you should _always_ pass the filter as a String. If you build it as a Java string
literal, be careful about special characters. Don't forget that a .java file is stored using a
particular encoding on your system, and the compiler may read it with a different one. It's better
to use '\uxxxx' escapes for special characters in your string; this way your string is guaranteed
to be transmitted correctly.
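Something like this, as a sketch: the DN below is a made-up example with an accented first name, and `groupFilter` is a hypothetical helper that simply concatenates the filter from your message. It does not escape RFC 4515 special characters, so it assumes the DN contains none of them. One more trap: the compiler translates unicode escapes everywhere in the file, comments included, so a backslash followed by 'u' in a comment must itself be a valid escape.

```java
class EscapedFilter {
    // Hypothetical member DN with an accented first name (e-acute), written
    // with a unicode escape so it does not depend on this file's encoding.
    static final String MEMBER_DN =
            "cn=Ren\u00e9 Dupont,ou=Peoples,ou=Paris,ou=Offices,dc=ippon,dc=fr";

    // Hypothetical helper: builds the filter from the original message.
    // Assumes memberDn contains no RFC 4515 special chars: * ( ) backslash NUL.
    static String groupFilter(String memberDn) {
        return "(&(member=" + memberDn + ")(objectClass=ipponGroup))";
    }

    public static void main(String[] args) {
        String filter = groupFilter(MEMBER_DN);
        System.out.println(filter);
        System.out.println(filter.indexOf('\u00e9')); // 16
    }
}
```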


> 
> Does ApacheDS fail to handle UTF-8 DNs, or should I not use UTF-8 DNs? In the 
> second case, does the LDAP protocol explicitly proscribe non US-ASCII DNs?

ApacheDS correctly handles LDAPString values. UTF-8 encoded byte[] are only used to transmit data
from the client to the server and from the server to the client. The first thing the server does
is transform those byte[] into Strings (for attributeTypes which are human readable).
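To picture that step, here is a rough sketch of the idea, not ApacheDS code: the set of human-readable attribute types is hard-coded and hypothetical (a real server derives it from the schema, i.e. from the attribute's syntax), and `decodeValue` is a made-up name. Human-readable values become Strings through UTF-8; binary ones like jpegPhoto stay byte[].

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class ValueDecoding {
    // Hypothetical, hard-coded for illustration; a real server would ask
    // its schema whether an attribute type's syntax is human readable.
    static final Set<String> HUMAN_READABLE =
            new HashSet<>(Arrays.asList("cn", "ou", "member", "objectclass"));

    // Hypothetical: decode human-readable values from UTF-8 into a String,
    // keep binary values (e.g. jpegPhoto) as the raw byte[].
    static Object decodeValue(String attributeType, byte[] raw) {
        if (HUMAN_READABLE.contains(attributeType.toLowerCase(Locale.ROOT))) {
            return new String(raw, StandardCharsets.UTF_8);
        }
        return raw;
    }

    public static void main(String[] args) {
        byte[] cn = "Ren\u00e9 Dupont".getBytes(StandardCharsets.UTF_8);
        byte[] photo = {(byte) 0xFF, (byte) 0xD8}; // start of a JPEG file
        System.out.println(decodeValue("cn", cn) instanceof String);           // true
        System.out.println(decodeValue("jpegPhoto", photo) instanceof byte[]); // true
    }
}
```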

So, basically, never pass a UTF-8 encoded DN (a byte[]) to the API.

The LDAP protocol does not handle anything but bytes. The rules for converting a DN from a String
to a byte[] are given in RFC 2253 (http://www.faqs.org/rfcs/rfc2253.html). Strings transmitted in
LDAP protocol messages are first encoded as UTF-8 (which gives a byte[], btw) and decoded again
before being handled.
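Reduced to its core, and leaving out the BER encoding of the LDAP message itself, the client/server agreement looks like the sketch below (hypothetical class and method names). Because both sides use UTF-8, the round trip is lossless, and 'é' simply costs one extra byte on the wire.

```java
import java.nio.charset.StandardCharsets;

class WireRoundTrip {
    // Hypothetical client side: a String is encoded as UTF-8 for the wire.
    static byte[] toWire(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical server side: the bytes are decoded back into a String
    // before any processing (normalization, matching, ...).
    static String fromWire(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String dn = "cn=Ren\u00e9 Dupont,ou=Peoples,dc=ippon,dc=fr";
        byte[] wire = toWire(dn);
        System.out.println(wire.length - dn.length()); // 1: e-acute takes 2 bytes in UTF-8
        System.out.println(fromWire(wire).equals(dn)); // true
    }
}
```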

If you don't use the protocol layer (as with an embedded server), there is no need to pass byte[] to the API.


I hope I was clear enough to be of help. Anyway, keep in mind that this is not an easy matter
(I spent days understanding how to implement it in the server...)

Maybe the API should also be changed to prevent such usage. I must admit that throwing an NPE
is, well, not good at all ;)

Hope it helps,

Emmanuel Lécharny.




