jakarta-regexp-dev mailing list archives

From Dmitry Andrievsky <dim...@teztour.com>
Subject Looks like bug in RegExp RE.subst method - version jakarta-regexp-1.3
Date Fri, 27 Aug 2004 11:55:05 GMT
Hello regexp-dev,

Purpose: bug report
Version: jakarta-regexp-1.3


Sample test:
        RE r = new RE( "\\*\\*\\*(.+?)\\*\\*" );
        String fText = r.subst("aaa ***TEXT** ***AAA** bbb", "<h3>$1</h3>",
                RE.REPLACE_ALL | RE.REPLACE_BACKREFERENCES);
        System.out.println( fText );
Output:
aaa 3>TEXT</h3> 3>AAA</h3> bbb

I expected every '***some_text**' to be replaced with
'<h3>some_text</h3>', but the replacement I get is '3>some_text</h3>':
the first two characters of the substitution string are missing.
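
For comparison, the expected result can be produced with the JDK's own java.util.regex package, which is a different library from jakarta-regexp and is used here only as a reference point. The class and method names below are hypothetical.

```java
import java.util.regex.Pattern;

class ExpectedOutputDemo {
    // Same pattern and substitution as the sample test, but run through
    // java.util.regex instead of jakarta-regexp's RE class.
    static String replaceAllHeadings(String input) {
        return Pattern.compile("\\*\\*\\*(.+?)\\*\\*")
                      .matcher(input)
                      .replaceAll("<h3>$1</h3>");
    }

    public static void main(String[] args) {
        System.out.println(replaceAllHeadings("aaa ***TEXT** ***AAA** bbb"));
        // prints: aaa <h3>TEXT</h3> <h3>AAA</h3> bbb
    }
}
```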

So I looked into the source and found the following code
(RE.java, starting at line 1732):

--
[...]
                // Process backreferences
                int lCurrentPosition = 0;
                int lLastPosition = 0;
                int lLength = substitution.length();


                while ((lCurrentPosition = substitution.indexOf("$", lCurrentPosition)) >= 0)
                {
                    if ((lCurrentPosition == 0 || substitution.charAt(lCurrentPosition - 1) != '\\')
                        && lCurrentPosition+1 < lLength)
                    {
                        char c = substitution.charAt(lCurrentPosition + 1);
                        if (c >= '0' && c <= '9')
                        {
                            // Append everything between the last and the current $ sign
                            ret.append(substitution.substring(lLastPosition + 2, lCurrentPosition));

                            // Append the parenthesized expression
                            // Note: if a parenthesized expression of the requested
                            // index is not available "null" is added to the string
                            ret.append(getParen(c - '0'));
                            lLastPosition = lCurrentPosition;
                        }
                    }

                    // Move forward, skipping past match
                    lCurrentPosition++;
                }

                // Append everything after the last $ sign
                ret.append(substitution.substring(lLastPosition + 2,lLength));
[...]
--

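In particular, this line:

          ret.append(substitution.substring(lLastPosition + 2, lCurrentPosition));

The "+ 2" offset is correct when there is more than one $-variable, but
only for the variables after the first one. There, lLastPosition points
at the previous "$", and the two characters of "$N" have to be skipped.
Initially lLastPosition is 0, so before the first variable the first two
characters of the substitution string are always lost.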
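
The offset problem can be reproduced without the library. The sketch below copies the quoted loop into a standalone method. The class name SubstOffsetDemo, the method expandBuggy, and the groups array (a stand-in for getParen) are hypothetical and not part of jakarta-regexp.

```java
class SubstOffsetDemo {
    // Copy of the quoted backreference loop. groups[n] stands in for
    // getParen(n); this is an assumption made to keep the sketch standalone.
    static String expandBuggy(String substitution, String[] groups) {
        StringBuilder ret = new StringBuilder();
        int lCurrentPosition = 0;
        int lLastPosition = 0;
        int lLength = substitution.length();

        while ((lCurrentPosition = substitution.indexOf("$", lCurrentPosition)) >= 0) {
            if ((lCurrentPosition == 0 || substitution.charAt(lCurrentPosition - 1) != '\\')
                    && lCurrentPosition + 1 < lLength) {
                char c = substitution.charAt(lCurrentPosition + 1);
                if (c >= '0' && c <= '9') {
                    // For the first variable lLastPosition is still 0,
                    // so this skips characters 0 and 1 of the substitution.
                    ret.append(substitution.substring(lLastPosition + 2, lCurrentPosition));
                    ret.append(groups[c - '0']);
                    lLastPosition = lCurrentPosition;
                }
            }
            lCurrentPosition++;
        }
        ret.append(substitution.substring(lLastPosition + 2, lLength));
        return ret.toString();
    }

    public static void main(String[] args) {
        String[] groups = {"***TEXT**", "TEXT"};
        System.out.println(expandBuggy("<h3>$1</h3>", groups));
        // prints: 3>TEXT</h3>
    }
}
```

For "<h3>$1</h3>" the "$" is at index 4, so the first append takes substring(2, 4), which is "3>" rather than "<h3>".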

It might be a good idea to track whether a previous variable has been
seen, as follows:
--
[...]
                // Process backreferences
                int lCurrentPosition = 0;
                int lLastPosition = 0;
                int lLength = substitution.length();

                // Tracks whether a $-variable has been processed yet;
                // initially none has
                boolean wasSign = false;


                while ((lCurrentPosition = substitution.indexOf("$", lCurrentPosition)) >= 0)
                {
                    if ((lCurrentPosition == 0 || substitution.charAt(lCurrentPosition - 1) != '\\')
                        && lCurrentPosition+1 < lLength)
                    {
                        char c = substitution.charAt(lCurrentPosition + 1);
                        if (c >= '0' && c <= '9')
                        {
                            // Append everything between the last and the current $ sign
                            ret.append(substitution.substring(wasSign ? lLastPosition + 2 : 0,
                                                              lCurrentPosition));
                            // From now on, at least one variable has been processed
                            wasSign = true;
                            // Append the parenthesized expression
                            // Note: if a parenthesized expression of the requested
                            // index is not available "null" is added to the string
                            ret.append(getParen(c - '0'));
                            lLastPosition = lCurrentPosition;
                        }
                    }

                    // Move forward, skipping past match
                    lCurrentPosition++;
                }

                // Append everything after the last $ sign
                ret.append(substitution.substring(wasSign ? lLastPosition + 2 : 0,lLength));
[...]
--
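
The proposed change can be checked in isolation with a standalone sketch of the same loop. As before, the class name SubstFixedDemo, the method expandFixed, and the groups array (standing in for getParen) are hypothetical and only serve to make the example runnable without jakarta-regexp.

```java
class SubstFixedDemo {
    // The patched loop: the "+ 2" skip is applied only after at least
    // one $-variable has been processed. groups[n] stands in for getParen(n).
    static String expandFixed(String substitution, String[] groups) {
        StringBuilder ret = new StringBuilder();
        int lCurrentPosition = 0;
        int lLastPosition = 0;
        int lLength = substitution.length();
        boolean wasSign = false;

        while ((lCurrentPosition = substitution.indexOf("$", lCurrentPosition)) >= 0) {
            if ((lCurrentPosition == 0 || substitution.charAt(lCurrentPosition - 1) != '\\')
                    && lCurrentPosition + 1 < lLength) {
                char c = substitution.charAt(lCurrentPosition + 1);
                if (c >= '0' && c <= '9') {
                    // Start at 0 for the first variable, else just past "$N"
                    int from = wasSign ? lLastPosition + 2 : 0;
                    ret.append(substitution.substring(from, lCurrentPosition));
                    wasSign = true;
                    ret.append(groups[c - '0']);
                    lLastPosition = lCurrentPosition;
                }
            }
            lCurrentPosition++;
        }
        // Append the tail (or the whole string if no variable was found)
        ret.append(substitution.substring(wasSign ? lLastPosition + 2 : 0, lLength));
        return ret.toString();
    }

    public static void main(String[] args) {
        System.out.println(expandFixed("<h3>$1</h3>", new String[] {"***TEXT**", "TEXT"}));
        // prints: <h3>TEXT</h3>
        System.out.println(expandFixed("$1-$2", new String[] {"", "a", "b"}));
        // prints: a-b
    }
}
```

A substitution string without any $-variable is returned unchanged by this version, because the final append then starts at index 0.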

Thanks for reading. :)
       


-- 
Best regards,
 Дмитрий                          mailto:dimmik@teztour.com


