directory-dev mailing list archives

From "Norval Hope" <nrh...@gmail.com>
Subject Re: Character encoding problem in FilterParserImpl (AD version < 1.5)
Date Fri, 05 Sep 2008 07:55:45 GMT
Hi Again,

Just wanted to report that I successfully back-ported the new
FilterParser to my mongrel hybrid patched version of AD < 1.5-ish. In
the process I ran into a number of UTF-8 encoding issues when dealing
with characters whose UTF-8 encoding is longer than two bytes, and patched
a number of files (part, perhaps most, of the issue was unnecessary
conversion to \HH escape notation, no doubt due to the original LDAP DN /
search filter RFCs being a bit vague about when it was required versus
merely supported). I also added quite a few unit tests capturing what I
see as the preferred behaviour (see below).
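For search filters, RFC 4515 only requires five characters in an assertion value to be written as \HH escapes: NUL, '(', ')', '*' and '\'. Other characters, including multi-byte UTF-8 ones, may appear literally. The following is a minimal sketch of that rule; `FilterValueEscaper` is a hypothetical helper for illustration, not an ApacheDS class, and it does not cover DN escaping (a separate RFC with its own rules).

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper: escapes an LDAP filter assertion value following
 * RFC 4515, where only NUL, '(', ')', '*' and '\' must become \HH.
 */
public final class FilterValueEscaper {

    private FilterValueEscaper() {}

    public static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '\0': sb.append("\\00"); break;
                case '(':  sb.append("\\28"); break;
                case ')':  sb.append("\\29"); break;
                case '*':  sb.append("\\2a"); break;
                case '\\': sb.append("\\5c"); break;
                // Everything else, including non-ASCII characters, is kept as-is
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String cn = "\u4E2D"; // a CJK character, 3 bytes long in UTF-8
        System.out.println(cn.getBytes(StandardCharsets.UTF_8).length); // 3
        System.out.println(escape("a*(b)"));                            // a\2a\28b\29
        System.out.println(escape(cn).equals(cn));                      // true
    }
}
```

Under this rule a filter value containing CJK text needs no \HH escapes at all, which is consistent with treating escaping as supported rather than required for such characters.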

I'm currently in the process of looking at a newer version of AD and
considering the somewhat daunting task of assessing how to port all
my custom mods over, and / or deciding which ones are no longer
required (for instance I originally had lots of problems that required
explicit use of LdapDN.getUpName(), which I understand / hope I may now
be able to replace with humble .toString()s). While I'm doing this
port I'll try to disentangle changes which might interest the
general community from ones that probably won't, and factor
out separate patches etc.

Anyway enough of my blathering.

I gather from the dev list that 1.5.4 may pop out into the world
shortly, which is very cool and makes me think I should base my work
off the trunk and then merge up to the final 1.5.4 tag when it's
finalised, so that I can quote a solid (recent) tag next time I'm
annoying people. I checked out the trunk but hit two lots of unit test
failures: first, two failures in
apacheds\jdbm-store\src\test\java\org\apache\directory\server\core\partition\impl\btree\jdbm\JdbmIndexTest.java,
and then a bulk failure in apacheds\core-integ\ whose first error was:

[16:39:20] ERROR
[org.apache.directory.server.core.integ.IntegrationUtils] - Failed to
delete the working directory.
java.io.IOException: Unable to delete file:
server-work\system\apacheOneLevel.lg

This is on Windows XP / JDK 1.5.0_11. I just wanted to check that I'm
doing the right thing by using the trunk at the moment, and if so,
whether anyone else sees the same problems or has suggestions for
cleaning up my environment.

Thanks!
Norval

On Thu, Aug 7, 2008 at 9:54 PM, Norval Hope <nrhope@gmail.com> wrote:
> Hi Emmanuel,
>
> Thanks for the feedback; I knew I was trying my luck a bit but thought
> I'd better check if anyone happened to remember any relevant history.
> I'm getting reasonably hopeful that I can backport the new
> FilterParser with a bit of creative reactive refactoring ...
>
> Once I get through this current firefight I think I'll finally get onto
> the job of some serious resyncing with a more current version of AD. I
> know there's lots for me to get my head around, not just coming to
> terms with the new stuff but also working out how to apply my
> customisations / whether they're still necessary etc ...
>
> Cheers
>
> On Wed, Aug 6, 2008 at 6:48 PM, Emmanuel Lécharny <elecharny@gmail.com> wrote:
>> Hi Norval,
>>
>> the thing is that the ANTLR-based filter parser has been totally rewritten
>> in 1.5 and replaced by a hand-crafted parser, so it's difficult to say
>> whether the previous version correctly handles UTF-8 chars in all cases,
>> but a blind guess is that it may be buggy.
>>
>> Now, to be frank, I don't think we will spend time fixing the 1.0 version
>> of the server, as we are dedicated to getting 2.0 out as soon as possible.
>> 1.0 is almost a dead branch... That does not mean committers can't fix it,
>> if needed! We can even release it, but I would say: it's up to you!
>>
>> Don't get me wrong: I'm not saying that you are on your own, or that we
>> don't want to help you; it's just that, eh, we don't have time for 1.0
>> anymore, as it's already really hard to find time to fix urgent bugs in
>> 1.5! Hopefully, as soon as we are done with the big refactoring we have
>> been doing in a branch for months, we will be able to get back to work on
>> trunk and fix those urgent bugs...
>>
>> Thanks!
>>
>> Norval Hope wrote:
>>>
>>> Hi All,
>>>
>>> I'm using a customized version of ApacheDS 1.5 and have run into a
>>> problem with the filter parser. I know this has been much improved in
>>> more recent versions of AD, but I'm not able to upgrade at the moment
>>> (however, it seems I'll be able to start the process of resyncing
>>> with a more recent build in the next two or three weeks). I got
>>> excited when I saw a naked getBytes() call in the following code from
>>> FilterParserImpl:
>>>
>>>
>>>    public synchronized ExprNode parse( String filter ) throws ParseException, IOException
>>>    {
>>>        ExprNode root = null;
>>>
>>>        if ( filter == null || filter.trim().equals( "" ) )
>>>        {
>>>            return null;
>>>        }
>>>
>>>        if ( filter.indexOf( "**" ) > -1 )
>>>        {
>>>            filter = StringTools.trimConsecutiveToOne( filter, '*' );
>>>        }
>>>
>>>        this.parserPipe.write( filter.getBytes("UTF-8") );   // ******************* (added "UTF-8")
>>>        this.parserPipe.write( '\n' );
>>>        this.parserPipe.flush();
>>>
>>> and added the "UTF-8", thinking that would sort out my problem. This
>>> improved (or at least changed) the situation, as the multi-byte Chinese
>>> character I passed in to the filter expression no longer came out as a
>>> '?' but rather as a different, but still incorrect, character.
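One plausible explanation for both symptoms, offered as a hedged sketch rather than a diagnosis of the ANTLR code: String.getBytes() with no argument uses the platform default charset (on a Western-locale Windows XP machine typically windows-1252), which cannot represent a CJK character and substitutes '?'. With "UTF-8" the character becomes three bytes, and if the consumer on the other end of the pipe turns each byte into its own char (an assumption here), those bytes come back as wrong characters. `CharsetMismatchDemo` and `roundTrip` are hypothetical names used only for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical demo class: reproduces both symptoms with one CJK character.
 */
public final class CharsetMismatchDemo {

    private CharsetMismatchDemo() {}

    /** Encodes with one charset and decodes with another, as across a byte pipe. */
    public static String roundTrip(String s, Charset encodeAs, Charset decodeAs) {
        return new String(s.getBytes(encodeAs), decodeAs);
    }

    public static void main(String[] args) {
        String cn = "\u4E2D"; // U+4E2D, encoded as 3 bytes (E4 B8 AD) in UTF-8
        Charset cp1252 = Charset.forName("windows-1252");

        // Symptom 1: a single-byte charset cannot map the character, so getBytes() substitutes '?'
        System.out.println(roundTrip(cn, cp1252, cp1252)); // ?

        // Symptom 2: UTF-8 bytes decoded one byte per char yield three wrong chars
        String garbled = roundTrip(cn, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length()); // 3

        // Matching charsets on both ends round-trip correctly
        System.out.println(roundTrip(cn, StandardCharsets.UTF_8, StandardCharsets.UTF_8).equals(cn)); // true
    }
}
```

The point is that the writing side and the reading side must agree on a single charset; naming "UTF-8" on the writing side alone only fixes half of that agreement.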
>>>
>>> Given I don't know anything about the ANTLR-generated code sitting on
>>> the other end of the pipe, I was hoping someone more knowledgeable
>>> might be able to cast their mind back and offer some clues about:
>>>  a) whether my suspicion that the ANTLR code is expecting UTF-8 is
>>> accurate
>>>  b) whether there is any way I might be able to tweak the ANTLR code
>>> or Maven build process so that multi-byte characters appear correctly
>>> in the parse tree.
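On question b), one hedged direction is to stop the lexer from seeing raw bytes: either decode them explicitly on the consumer side with an InputStreamReader, or hand the lexer characters through a java.io.Reader. Two assumptions are not verified against this AD version: that the generated lexer currently reads the pipe one byte per character, and that it can be constructed from a Reader (ANTLR 2 generated lexers usually offer such a constructor, but check the generated sources). `LexerInputSketch`, `readBytesAsChars` and `readChars` are hypothetical names that only model the two kinds of consumer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical model of a byte-oriented versus a char-oriented lexer input. */
public final class LexerInputSketch {

    private LexerInputSketch() {}

    /** Models a consumer that widens each byte to a char, with no real decoding. */
    public static String readBytesAsChars(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try {
            int b;
            while ((b = in.read()) != -1) {
                sb.append((char) b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    /** Models a consumer fed by a Reader, which has already done any decoding. */
    public static String readChars(Reader r) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String filter = "(cn=\u4E2D)";                        // 6 chars
        byte[] utf8 = filter.getBytes(StandardCharsets.UTF_8); // 8 bytes

        // Byte-per-char consumer: the CJK character turns into 3 chars
        String wrong = readBytesAsChars(new ByteArrayInputStream(utf8));
        // Explicit UTF-8 decoding on the consumer side restores the text
        String decoded = readChars(new InputStreamReader(
                new ByteArrayInputStream(utf8), StandardCharsets.UTF_8));
        // Passing characters directly skips the byte step entirely
        String direct = readChars(new StringReader(filter));

        System.out.println(wrong.length());         // 8
        System.out.println(decoded.equals(filter)); // true
        System.out.println(direct.equals(filter));  // true
    }
}
```

Passing a Reader keeps the whole pipeline in Java chars, so no charset has to be agreed between writer and lexer at all; the explicit-decoding variant is the smaller change if the byte pipe has to stay.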
>>>
>>> Many thanks
>>>
>>>
>>
>>
>> --
>> cordialement, regards,
>> Emmanuel Lécharny
>> www.nextury.com
>> directory.apache.org
>>
>>
>>
>
