directory-dev mailing list archives

From Emmanuel Lécharny
Subject Re: Character encoding problem in FilterParserImpl (AD version < 1.5)
Date Wed, 06 Aug 2008 08:48:46 GMT
Hi Norval,

the thing is that the ANTLR-based filter parser was completely
rewritten in 1.5 and replaced by a hand-crafted parser, so it's
difficult to say whether the previous version correctly handles UTF-8
chars in every case, but a blind guess is that it may be buggy.

Now, to be frank, I don't think we will spend time fixing the 1.0
version of the server, as we are dedicated to getting 2.0 out as soon as
possible. 1.0 is almost a dead branch... That does not mean committers
can't fix it if needed! We can even release it, but I would say: it's
up to you!

Don't get me wrong: I'm not saying that you are on your own or that we
don't want to help you. It's just that, eh, we don't have time for 1.0
anymore, as it's already really hard to find time to fix urgent bugs in
1.5! Hopefully, as soon as we are done with the big refactoring we have
been doing for months in a branch, we will be able to get back to
work on trunk and fix those urgent bugs...

Thanks!

Norval Hope wrote:
> Hi All,
> I'm using a customized version of ApacheDS 1.5 and have run into a
> problem with the filter parser. I know this has been much improved in
> more recent versions of AD but I'm not able to upgrade at this moment
> (however, it seems I'll be able to start embarking on the process of
> resyncing with a more recent build in the next two or three weeks). I
> got excited when I saw a naked getBytes() call in the code from
> FilterParserImpl below:
>     public synchronized ExprNode parse( String filter )
>         throws ParseException, IOException
>     {
>         ExprNode root = null;
>         if ( filter == null || filter.trim().equals( "" ) )
>         {
>             return null;
>         }
>         if ( filter.indexOf( "**" ) > -1 )
>         {
>             filter = StringTools.trimConsecutiveToOne( filter, '*' );
>         }
>         this.parserPipe.write( filter.getBytes("UTF-8") );   // *** "UTF-8" added here ***
>         this.parserPipe.write( '\n' );
>         this.parserPipe.flush();
> and added the "UTF-8" thinking that would sort out my problem. This
> improved (or at least changed) the situation as the multi-byte Chinese
> character I passed in to the filter expression no longer came out as a
> '?' but rather as a different, but incorrect, character.
> Given I don't know anything about the ANTLR generated code sitting on
> the other end of the pipe I was hoping someone more knowledgeable
> might be able to cast their minds back and offer some clues about:
>   a) whether my suspicion that the ANTLR code is expecting UTF-8 is accurate
>   b) whether there is any way I might be able to tweak the ANTLR code
> or Maven build process so that multi-byte characters appear correctly
> in the parse tree.
> Many thanks
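
The symptoms described above ('?' before the change, a different but
wrong character after it) would fit a charset mismatch between the two
ends of the pipe. The following is only a minimal sketch of that
hypothesis, not ApacheDS code: the class PipeEncodingSketch and the
helper roundTrip are hypothetical names, US-ASCII stands in for a
platform default charset that cannot represent the character, and
ISO-8859-1 stands in for a reader that treats every byte as one
character. Whether the generated ANTLR lexer actually reads the pipe
that way is an assumption that has not been verified here.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of ApacheDS.
class PipeEncodingSketch {

    // Encodes the filter with writeCharset, pushes the bytes through a pipe,
    // and decodes them on the other end with readCharset.
    // Writer and reader run in the same thread here, so the filter must fit
    // in the pipe's internal buffer (1024 bytes by default).
    static String roundTrip(String filter, Charset writeCharset, Charset readCharset)
            throws IOException {
        PipedInputStream in = new PipedInputStream();
        try (PipedOutputStream out = new PipedOutputStream(in)) {
            out.write(filter.getBytes(writeCharset));
        } // closing the writer lets the reader see end-of-stream

        StringBuilder decoded = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, readCharset)) {
            int c;
            while ((c = reader.read()) != -1) {
                decoded.append((char) c);
            }
        }
        return decoded.toString();
    }

    public static void main(String[] args) throws IOException {
        String filter = "(cn=\u4e2d)"; // U+4E2D is a three-byte character in UTF-8

        // Models a naked getBytes() on a platform whose default charset
        // cannot encode the character: it is replaced by '?'.
        String lossy = roundTrip(filter, StandardCharsets.US_ASCII, StandardCharsets.US_ASCII);

        // Models getBytes("UTF-8") feeding a reader that maps one byte to one char:
        // the three UTF-8 bytes come out as three unrelated characters.
        String mismatched = roundTrip(filter, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        // Both ends agree on UTF-8: the filter survives intact.
        String intact = roundTrip(filter, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        System.out.println("lossy has '?':      " + lossy.contains("?"));
        System.out.println("mismatched length:  " + mismatched.length() + " vs " + filter.length());
        System.out.println("intact equals input: " + intact.equals(filter));
    }
}
```

If the hypothesis holds, changing only the writing side cannot be
enough: the reading side of the pipe would also have to decode the
bytes as UTF-8, for example by wrapping the input stream in a Reader
with an explicit charset as roundTrip does.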

cordialement, regards,
Emmanuel Lécharny
