jakarta-oro-dev mailing list archives

From "Daniel F. Savarese" <...@savarese.org>
Subject Re: [PATCH] for unicode problem over 0xff characters
Date Sun, 12 Nov 2000 18:04:27 GMT

>I made a patch for unicode problem at Perl5Compiler.java and
>Perl5Matcher.java.

Thanks very much for your efforts and for providing a patch.  Unfortunately,
the patch's fundamental approach is not the one we should ultimately take,
so I don't think we should apply it or some variation thereof.  The
problem is that it can cause excessive memory use (up to 8K per character
class, i.e., one bit for each of the 65,536 possible 16-bit characters),
since it follows the same bitfield approach used for the 8-bit character
classes.  Handling 16-bit characters in a character class requires a
different approach, one that is a little less efficient in matching time
but much more efficient in its use of memory.  Implementing the "proper"
solution, however, will require a good investment of time and some
significant code changes.
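
To make the memory trade-off concrete, here is a minimal sketch of one
possible alternative to a full bitfield: a sorted list of character ranges
searched by binary search.  The message does not say which approach is
intended, so this is only an assumption for illustration; the class
RangeCharClass and all of its members are hypothetical and are not part of
the jakarta-oro code.

```java
// Hypothetical sketch: names are illustrative, not taken from jakarta-oro.
final class RangeCharClass {
    private final char[] lows;   // range starts, sorted ascending
    private final char[] highs;  // matching range ends, inclusive

    // ranges: {low, high} pairs, assumed sorted by low and non-overlapping.
    RangeCharClass(char[][] ranges) {
        lows = new char[ranges.length];
        highs = new char[ranges.length];
        for (int i = 0; i < ranges.length; i++) {
            lows[i] = ranges[i][0];
            highs[i] = ranges[i][1];
        }
    }

    // Binary search over the ranges: O(log n) per test instead of the
    // bitfield's O(1), which is the matching-time cost being traded away.
    boolean contains(char c) {
        int lo = 0;
        int hi = lows.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (c < lows[mid]) {
                hi = mid - 1;
            } else if (c > highs[mid]) {
                lo = mid + 1;
            } else {
                return true;
            }
        }
        return false;
    }

    // Bytes needed for one bit per 16-bit character: 65536 / 8.
    static int fullBitfieldBytes() {
        return (Character.MAX_VALUE + 1) / 8;
    }

    // Bytes used by this representation: two 2-byte chars per range.
    int storageBytes() {
        return 2 * 2 * lows.length;
    }
}
```

Under these assumptions, a class with two ranges needs 8 bytes of range
data, against 8192 bytes for a bitfield covering every 16-bit character.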

However, as a stopgap measure, we could implement the bitfield approach,
making it clear in comments and in the CHANGES file that it is temporary.
We could take an informal vote to that effect.  The problem with the patch
you posted is that it allows incorrect matches, or an
ArrayIndexOutOfBoundsException to be thrown, if the input contains
characters beyond the upper limit of the character class's range:

--- 806,812 ----
   if(nextChar == __EOS && inputRemains)
     nextChar = __input[input];

!  if((__program[current + (nextChar >> 4)] &
       (1 << (nextChar & 0xf))) != 0)
     return false;

It also doesn't make the necessary changes to Perl5Debug, which would
break after this patch was applied.  To generalize the current bitfield
implementation, you need to store the final size of the bitfield and
make a comparison to ensure the input character can be used to index
into the bitfield.  At any rate, making these changes is rather
straightforward; I never made them because of the desire to conserve memory.
So the question is, do people feel we should implement a temporary
stopgap measure, or just wait to "do it right"?
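
The size check described above can be sketched as follows.  This is a
standalone illustration, not the Perl5Matcher code: the class
BoundedBitfield and its members are hypothetical, and it assumes that a
set bit means the character belongs to the class.  It keeps the 16 bits
per array element indexing (c >> 4, c & 0xf) seen in the patch excerpt.

```java
// Hypothetical sketch: names are illustrative, not taken from jakarta-oro.
final class BoundedBitfield {
    private final char[] words;    // 16 bits per element, as in the excerpt
    private final int sizeInBits;  // stored size of the bitfield

    // maxChar: the largest character the bitfield must be able to represent.
    BoundedBitfield(char maxChar) {
        sizeInBits = maxChar + 1;
        words = new char[(sizeInBits + 15) >> 4];
    }

    void add(char c) {
        if (c >= sizeInBits) {
            throw new IllegalArgumentException("character outside bitfield");
        }
        words[c >> 4] |= 1 << (c & 0xf);
    }

    // Compare against the stored size before indexing, so a character past
    // the end of the bitfield is reported as absent instead of reading an
    // unrelated element or throwing ArrayIndexOutOfBoundsException.
    boolean contains(char c) {
        if (c >= sizeInBits) {
            return false;
        }
        return (words[c >> 4] & (1 << (c & 0xf))) != 0;
    }
}
```

With a bitfield sized for characters up to 0xff, testing a character such
as 0x3042 simply returns false under this sketch.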

daniel


