directory-dev mailing list archives

From "Norval Hope" <nrh...@gmail.com>
Subject Character encoding problem in FilterParserImpl (AD version < 1.5)
Date Wed, 06 Aug 2008 08:33:45 GMT
Hi All,

I'm using a customized version of ApacheDS 1.5 and have run into a
problem with the filter parser. I know this has been much improved in
more recent versions of AD but I'm not able to upgrade at this moment
(however, it seems I'll be able to start embarking on the process of
resyncing with a more recent build in the next two or three weeks). I
got excited when I saw a naked getBytes() call in the code from
FilterParserImpl below:


    public synchronized ExprNode parse( String filter ) throws ParseException, IOException
    {
        ExprNode root = null;

        if ( filter == null || filter.trim().equals( "" ) )
        {
            return null;
        }

        if ( filter.indexOf( "**" ) > -1 )
        {
            filter = StringTools.trimConsecutiveToOne( filter, '*' );
        }

        this.parserPipe.write( filter.getBytes("UTF-8") );   // *******************
        this.parserPipe.write( '\n' );
        this.parserPipe.flush();

and added the "UTF-8" argument, thinking that would sort out my problem.
This improved (or at least changed) the situation: the multi-byte Chinese
character I passed in to the filter expression no longer came out as a
'?' but rather as a different, but still incorrect, character.
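
To illustrate the two symptoms, here is a minimal sketch that does not
touch ApacheDS or ANTLR at all. It assumes the writing side encodes with
one charset and the reading side decodes with another; the class and
method names (EncodingDemo, roundTrip) are made up for the example, and
ISO-8859-1 on the reading side is only a guess at what a byte-at-a-time
reader would effectively do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; nothing here comes from ApacheDS or ANTLR.
final class EncodingDemo {

    // Encode text with one charset and decode the bytes with another,
    // mimicking a writer and a reader that disagree about the encoding.
    static String roundTrip(String text, Charset writer, Charset reader) {
        byte[] bytes = text.getBytes(writer);
        return new String(bytes, reader);
    }

    public static void main(String[] args) {
        String filter = "(cn=\u4e2d)"; // one Chinese character in the value

        // A charset that cannot represent the character replaces it with '?'.
        String lossy = roundTrip(filter, StandardCharsets.US_ASCII, StandardCharsets.US_ASCII);

        // UTF-8 bytes read back one byte per character (assumed ISO-8859-1)
        // turn the single character into three unrelated ones.
        String mismatched = roundTrip(filter, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        // Both sides agreeing on UTF-8 preserves the character.
        String matched = roundTrip(filter, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        System.out.println("lossy length:      " + lossy.length());
        System.out.println("mismatched length: " + mismatched.length());
        System.out.println("matched equal:     " + matched.equals(filter));
    }
}
```

The first case matches the '?' I saw before the change, and the second
is at least consistent with the "different but incorrect" character I
see now, which is what makes me suspect the reading end of the pipe.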

Given I don't know anything about the ANTLR generated code sitting on
the other end of the pipe I was hoping someone more knowledgeable
might be able to cast their minds back and offer some clues about:
  a) whether my suspicion that the ANTLR code is expecting UTF-8 is accurate
  b) whether there is any way I might be able to tweak the ANTLR code
or Maven build process so that multi-byte characters appear correctly
in the parse tree.
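
For what it's worth, the kind of change I have in mind for (b) is
sketched below using only JDK classes: the reading end of a pipe is
wrapped in an InputStreamReader with an explicit charset, so that it
decodes characters rather than raw bytes. Whether the generated lexer
can actually be handed a Reader like this is exactly what I don't know;
the class and method names (PipeCharsetSketch, sendThroughPipe) are
hypothetical and the pipe only stands in for parserPipe.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; the pipe stands in for the parser pipe.
final class PipeCharsetSketch {

    // Write a filter as UTF-8 into a pipe, then read it back through a
    // Reader that decodes with the same charset.
    static String sendThroughPipe(String filter) {
        try {
            PipedOutputStream out = new PipedOutputStream();
            PipedInputStream in = new PipedInputStream(out);

            // Same pattern as the parse() method: bytes, newline, flush.
            // The filter must fit in the pipe's default 1024-byte buffer,
            // because this sketch writes and reads on a single thread.
            out.write(filter.getBytes(StandardCharsets.UTF_8));
            out.write('\n');
            out.flush();
            out.close();

            // The reading side decodes explicitly instead of treating
            // each byte as one character.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8))) {
                return reader.readLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String filter = "(cn=\u4e2d\u6587)";
        System.out.println(filter.equals(sendThroughPipe(filter)));
    }
}
```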

Many thanks
