<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>oro-user@jakarta.apache.org Archives</title>
<link rel="self" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/?format=atom"/>
<link href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/"/>
<id>http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/</id>
<updated>2009-12-06T05:27:18Z</updated>
<entry>
<title>Perl5Util.substitute() and ${X} symbolic references in the substitution pattern</title>
<author><name>Brian Dantes &lt;bldantes@comcast.net&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200912.mbox/%3cC73E01E3.1DDFD%25bldantes@comcast.net%3e"/>
<id>urn:uuid:%3cC73E01E3-1DDFD%25bldantes@comcast-net%3e</id>
<updated>2009-12-04T08:01:39Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
In an actual Perl program, I can write a regex substitution like:

$x =~ s/abc(def)ghi(jkl)/${1}123${2}/;

This doesn't work with ORO. Instead I have to write:

util.substitute("s/abc(def)ghi(jkl)/$1\\123$2/");

I am using ORO to read in legacy Perl regexes from an external source, so
getting it to work with the symbolic references would be best.

Is there a way?

-BD
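
One possible workaround, sketched under assumptions: rewrite each ${N} in the
substitution text into the $N form before handing the expression to ORO,
escaping a digit that directly follows with a backslash, as in the working call
above. BraceRefRewriter and rewrite() are hypothetical names, not part of ORO.
The sketch uses only java.util.regex for the string rewriting, assumes it is
given just the replacement part of the s/// expression, and does not handle an
escaped \${N}.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BraceRefRewriter {
    // Matches ${N} and, optionally, one digit that directly follows it.
    private static final Pattern BRACED = Pattern.compile("\\$\\{(\\d+)\\}(\\d?)");

    /** Rewrites ${N} to $N, escaping a directly following digit with a backslash. */
    public static String rewrite(String substitution) {
        Matcher m = BRACED.matcher(substitution);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = "$" + m.group(1);
            if (m.group(2).length() != 0) {
                // Keep the following digit from being read as part of the group number.
                replacement = replacement + "\\" + m.group(2);
            }
            // quoteReplacement keeps '$' and '\' literal in the appended text.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("${1}123${2}"));
    }
}
```

For the example above, rewrite("${1}123${2}") yields $1\123$2, which is the
text of the Java literal "$1\\123$2" used in the working call.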



---------------------------------------------------------------------
To unsubscribe, e-mail: oro-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: oro-user-help@jakarta.apache.org



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: ArrayIndexOutOfBoundsException when invoking matcher.contains(str, pattern)</title>
<author><name>&quot;Daniel F. Savarese&quot; &lt;dfs@savarese.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200910.mbox/%3c200910271700.n9RH0o1v000480@aragorn.savarese.org%3e"/>
<id>urn:uuid:%3c200910271700-n9RH0o1v000480@aragorn-savarese-org%3e</id>
<updated>2009-10-27T17:00:50Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

In message &lt;022508A4AD8E9549B01A6ED7084A7D752783649F52@mx01.ic-consult.de&gt;,
Vilmantas Baranauskas writes:
&gt;I was using single unsynchronized Perl5Matcher instance and javadoc tells
&gt;to use single instance per thread or synchronize.

In addition, if you share a Perl5Pattern instance between Perl5Matcher
instances in different threads, be sure to compile the pattern with
Perl5Compiler.READ_ONLY_MASK.

daniel
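
The advice in this thread (one matcher per thread, a shared pattern that is
never modified) can be sketched as follows. The sketch deliberately uses
java.util.regex as a stand-in so that it runs without the ORO jar: Pattern is
the shared immutable object and each call creates its own Matcher. With ORO the
corresponding pieces would be a Perl5Pattern compiled with
Perl5Compiler.READ_ONLY_MASK and one Perl5Matcher per thread, as described
above. PerThreadMatcherDemo and isEightChars() are hypothetical names, and the
eight-character pattern is borrowed from the original report in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerThreadMatcherDemo {
    // Compiled once and shared; java.util.regex.Pattern is immutable.
    private static final Pattern EIGHT_CHARS = Pattern.compile("^........$");

    /** Each call builds its own Matcher, so no matcher state is shared between threads. */
    public static boolean isEightChars(String s) {
        Matcher m = EIGHT_CHARS.matcher(s);
        return m.find();
    }

    public static void main(String[] args) throws InterruptedException {
        final boolean[] results = new boolean[2];
        Thread a = new Thread(new Runnable() {
            public void run() { results[0] = isEightChars("abcdefgh"); }
        });
        Thread b = new Thread(new Runnable() {
            public void run() { results[1] = isEightChars("abc"); }
        });
        a.start();
        b.start();
        // join() makes the writes of both threads visible to the main thread.
        a.join();
        b.join();
        System.out.println(results[0] + " " + results[1]);
    }
}
```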



</pre>
</div>
</content>
</entry>
<entry>
<title>RE: ArrayIndexOutOfBoundsException when invoking matcher.contains(str, pattern)</title>
<author><name>Vilmantas Baranauskas &lt;vilmantas.baranauskas@ic-consult.de&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200910.mbox/%3c022508A4AD8E9549B01A6ED7084A7D752783649F52@mx01.ic-consult.de%3e"/>
<id>urn:uuid:%3c022508A4AD8E9549B01A6ED7084A7D752783649F52@mx01-ic-consult-de%3e</id>
<updated>2009-10-27T15:14:43Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
This looks like a synchronization problem.

I was using a single unsynchronized Perl5Matcher instance, and the javadoc says to use one
instance per thread or to synchronize.

Sorry for the disturbance, and thanks,
Vilmantas

</pre>
</div>
</content>
</entry>
<entry>
<title>ArrayIndexOutOfBoundsException when invoking matcher.contains(str, pattern)</title>
<author><name>Vilmantas Baranauskas &lt;vilmantas.baranauskas@ic-consult.de&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200910.mbox/%3c022508A4AD8E9549B01A6ED7084A7D752783649F51@mx01.ic-consult.de%3e"/>
<id>urn:uuid:%3c022508A4AD8E9549B01A6ED7084A7D752783649F51@mx01-ic-consult-de%3e</id>
<updated>2009-10-27T15:06:32Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hello,

I got the following exception when invoking the matcher.contains(str, pattern) method.

java.lang.ArrayIndexOutOfBoundsException: 14
	at org.apache.oro.text.regex.OpCode._getNextOffset(Unknown Source)
	at org.apache.oro.text.regex.OpCode._getNext(Unknown Source)
	at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
	at org.apache.oro.text.regex.Perl5Matcher.__tryExpression(Unknown Source)
	at org.apache.oro.text.regex.Perl5Matcher.__interpret(Unknown Source)
	at org.apache.oro.text.regex.Perl5Matcher.contains(Unknown Source)
	at org.apache.oro.text.regex.Perl5Matcher.contains(Unknown Source)


The matching pattern is either "........" or "^........$" (I don't know which). Unfortunately,
I don't have the input string value.

Best Regards,
Vilmantas

</pre>
</div>
</content>
</entry>
<entry>
<title>RE: Is this a bug with oro?</title>
<author><name>&quot;Kevin Markey&quot; &lt;kmarkey@silvercreeksystems.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200904.mbox/%3cACEAD257B132354FAB8210F67F619A9F01D62867@scswhq.headquarters.silvercreeksystems.com%3e"/>
<id>urn:uuid:%3cACEAD257B132354FAB8210F67F619A9F01D62867@scswhq-headquarters-silvercreeksystems-com%3e</id>
<updated>2009-04-02T13:50:03Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
I've taken the liberty of reconstructing your WHOLE regex.  Here it is...

(([a-zA-Z0-9_\\-]+)(\\s*=\\s*(\"(.*?)\"|'(.*?)'|([^'\"&gt;\\s]+)))?)

As predicted, one of your main groups is optional.  (I can't recall ALL the rules for Oro's
numbering of nested parentheses.  As in Perl, groups are numbered by the position of their
opening parenthesis, but which groups actually hold a value depends on which ones took part
in the match.  I avoid such complexities.  You should, too.  See below.)

I suggest you insert this in Perl and run your input through it.  Print out everything that
the whole thing recognizes.  Also print and label each group.  It will show that different
numbers of groups are recognized and that YOUR expectations of which group is which are NOT
what IT expects!!!  Your expression is too complicated for you (or me) to debug.  There are
up to 7 capturing groups here!!!

Suggestion.  Use a NONCAPTURING GROUP, i.e., (?:pattern), when you don't need to capture or
when the group is optional.  And make each group you intend to capture (and number) REQUIRED.
If there are two overall patterns you need to test, test them independently.

2nd suggestion.  Use an open source HTML parser instead.  They've already solved this problem.

Final conclusion:  There ain't a bug in Oro.  The bug is in your logic.

Enjoy.
Kevin
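
A minimal sketch of the first suggestion, under assumptions: each attribute
form (double-quoted, single-quoted, plain) gets its own pattern, every
capturing group is required, and grouping that is only structural uses
(?:pattern), so group 1 is always the name and group 2 is always the value.
It uses java.util.regex as a stand-in so it runs without the ORO jar; the
(?:pattern) syntax is the same Perl5 construct. AttributeParser and parse()
are hypothetical names, and the plain form leaves the closing angle bracket
out of the original character class for brevity.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttributeParser {
    private static final String NAME = "([a-zA-Z0-9_\\-]+)";
    // Grouped without capturing, so it does not shift the group numbers.
    private static final String EQ = "(?:\\s*=\\s*)";

    // One pattern per form; in each, group 1 = name and group 2 = value, both required.
    private static final Pattern[] FORMS = {
        Pattern.compile(NAME + EQ + "\"(.*?)\""),
        Pattern.compile(NAME + EQ + "'(.*?)'"),
        Pattern.compile(NAME + EQ + "([^'\"\\s]+)")
    };

    /** Returns {name, value}, or null when no form matches the whole input. */
    public static String[] parse(String attr) {
        for (int i = 0; i != FORMS.length; i++) {
            Matcher m = FORMS[i].matcher(attr);
            if (m.matches()) {
                return new String[] { m.group(1), m.group(2) };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] nv = parse("type='hidden'");
        System.out.println(nv[0] + " = " + nv[1]);
    }
}
```

Because every form is tried independently, the caller never has to guess which
of several optional groups holds the value.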

-----Original Message-----
From: Balaji [mailto:balaji.prabakaran@listertechnologies.com]
Sent: Thu 4/2/2009 4:14 AM
To: Kevin Markey; oro-user@jakarta.apache.org
Subject: RE: Is this a bug with oro?
 
Hi Kevin,
 
Apologies for omitting the diTag pattern. Here it is:
 
 private static String start  = "&lt;" ;
 private static String tagNames =
"(form|input\\s+|head|/?select\\s+|option\\s+|textarea\\s+" + 
 "|checkboxgroup\\s+|radiogroup|/?optintrue){1}" ;
 private static String anything = "([^&gt;]*)" ;
 private static String end   = "[/]*&gt;" ;
 private static Pattern  diTag;
 private static String attribute = "[a-zA-Z0-9_\\-]+" ;
 private static String optWS  = "\\s*" ;
 private static String dquoted  = "\"(.*?)\"" ;
 private static String squoted  = "'(.*?)'" ;
 private static String plain  = "([^'\"&gt;\\s]+)" ;
 private static Pattern  nvps ;
 private static PatternMatcher primaryMatcher = new Perl5Matcher() ;
 private static PatternCompiler compiler = new Perl5Compiler() ;

   diTag = compiler.compile( start + tagNames + anything + end ,
     Perl5Compiler.CASE_INSENSITIVE_MASK ) ;

   nvps = compiler.compile( "((" + attribute + ")" + "(" + optWS
     + "=" + optWS + "(" + dquoted
     + "|" + squoted + "|" + plain
     + "))?)" ); 

 
The different failure scenarios you mentioned should fail consistently
(for the same input), correct?
In this case, for the same input, the NPE occurs only occasionally. Here the
input is an HTML file read over HTTP. Do you think the NPE can occur when
the HTML is not available for some reason (network issue, etc.)?
 
Thanks,
Balaji Prabhakaran 
  _____  

From: Kevin Markey [mailto:kmarkey@silvercreeksystems.com] 
Sent: Tuesday, March 31, 2009 11:31 PM
To: ORO Users List; oro-user@jakarta.apache.org;
balaji.prabakaran@listertechnologies.com
Cc: Kevin Markey
Subject: RE: Is this a bug with oro?



One more thing to do for your diagnostics.  Do these so you can identify
where in __setLastMatchResult() you fail.

- Get the source, recompile the jar with debugging information so you get
the line number.
- Turn off any obfuscation.

Also provide the diTag pattern that is used when this fails.  (I don't see
it defined in your snippet.)  That is key. 

Still, I have a hunch...  The regex apparently has 2 groups.  I predict your
pattern allows a match **without** matching the groups.  As a result,
__originalInput is reset to null at the conclusion of __setLastMatchResult()
after matching the 1st group, setting off the NPE on the next iteration of your
WHILE loop, or the __beginGroupOffset or __endGroupOffset or
__endMatchOffsets arrays might be null.  I'm not totally familiar with the
source code, but I've used it for several years, and these are the things
that typically fail.

B.t.w., 2.0.6 and 2.0.8 are not substantially different in these regards.

So, make sure that BOTH groups are required in your regex.

Kevin

-----Original Message-----
From: Balaji [mailto:balaji.prabakaran@listertechnologies.com]
Sent: Tue 3/31/2009 9:09 AM
To: oro-user@jakarta.apache.org
Subject: RE: Is this a bug with oro?

Hi Kevin,

Thanks a lot for your reply. I highly appreciate your help. Here are the
required details.

The version is 2.0.8.

The context is this: I am trying to read an HTML file over HTTP and parse the
values of some hidden attributes in the HTML form.

Here is the code. The exception occurs at the line marked below. It occurs
randomly and is not reproducible at will.
The string passed to contains() is never null, and the result of contains() is
always checked for true before calling getMatch(). Please check whether I am
missing something.

******************class that contains the code that throws the exception************
public class Parser
{

 private static Pattern  diTag;
 private static PatternMatcher primaryMatcher = new Perl5Matcher() ;
 private static PatternCompiler compiler = new Perl5Compiler() ;

 public static void initialize(){
  .
  .
  .
 }
 public Parser( StringBuffer input)
 {
  this.input = input ;
 }
 public Vector parse()
 {
  Vector returnValue=null;
  PatternMatcherInput patternMatcherInput = new PatternMatcherInput(input.toString());
  int previous = 0 ;
  while(primaryMatcher.contains(patternMatcherInput,diTag))
  {
   MatchResult result = primaryMatcher.getMatch();  //exception is thrown here....
   String dataString = input.substring(previous,patternMatcherInput.getMatchBeginOffset());
   String tag = result.group(1);
   String inputS = result.group(2);
   try
   {
    returnValue=processDITag( tag.toUpperCase(),inputS ) ;
    previous = patternMatcherInput.getCurrentOffset() ;
   }
   catch(NotHandledException nh)
   {
    previous = patternMatcherInput.getMatchBeginOffset() ;
   }
  }
  return returnValue;
 }

 public Vector processDITag( String tag, String inputString ) throws NotHandledException
 {
  .
  .
  .
 }
}

 
******************code that calls the method in the above class*******************************
  diHTML = readInputFile(queryParametersBean.getSurveyName()); //reads the data from a html file over http

  if(diHTML.length()==0)
  {
   LogWriter.info(CLASS_NAME,"loadPageEvent(HttpServletRequest req)","The file name is not available" + sHtmlPath);
   sFileName=ConfigBean.getProperty(sSerPathFileName); // replace with exact file name
   sFileName=sFilePath + sFileName;
   queryParametersBean.setSurveyName(sFileName);
   diHTML = readInputFile(queryParametersBean.getSurveyName());
   LogWriter.info(CLASS_NAME,"loadPageEvent(HttpServletRequest req)","The file name from config file" + sFileName);
  }
  if(diHTML.length()==0) {
   LogWriter.info(CLASS_NAME,"loadPageEvent(HttpServletRequest req)","The file name is not in akamai server");
  }
  else {
   if(!( queryParametersBean.getEmail() != null &amp;&amp;
         queryParametersBean.getEmail().length() != 0 &amp;&amp;
         (ProcessorSupport.validateEmailAddress(queryParametersBean.getEmail())==false) &amp;&amp;
         diHTML.length() !=0))
         { 
    LogWriter.info(CLASS_NAME,"loadPageEvent(HttpServletRequest req)","queryParametersBean track page load " + queryParametersBean.getEmail());
    System.out.println("inside load event");
    Parser myParser = new Parser(diHTML, queryParameters) ;
    Vector resultString=myParser.parse();
    Iterator itrelements=resultString.iterator();
    .
    .
    .
        }
    }
****************************************************************************
*********************************

Thanks,
Balaji Prabhakaran  _____ 

From: Kevin Markey [mailto:kmarkey@silvercreeksystems.com]
Sent: Tuesday, March 31, 2009 6:48 PM
To: ORO Users List; oro-user@jakarta.apache.org;
balaji.prabakaran@listertechnologies.com
Subject: RE: Is this a bug with oro?



Some context, the code in which this fails, and the data with which it fails
would help.
The version you are using would also help.

However, inspecting the 2.0.6 code (which is the handiest on the machine I'm
on -- I suspect other versions are similar),
there is only one place in __setLastMatchResult() where you can get an NPE.
__lastMatchResult is non-null.  OpCode is non-null.  However,
__originalInput MIGHT be null.  Hence you can get an NPE where
__originalInput.length is tested.  Check whether the string your code passes
to contains() is null, and always check that contains() returned true before
calling getMatch().

E.g.,

private PatternCompiler m_compiler = new Perl5Compiler();
private PatternMatcher m_matcher = new Perl5Matcher();
private Pattern m_commentRegex = m_compiler.compile ( "#" );

/** Extract comment from string. */
public String findComment ( String s )
{
   if ( s == null ) return null;
   if ( m_matcher.contains ( s, m_commentRegex ) )
   {
      MatchResult result = m_matcher.getMatch();
      String comment = s.substring ( result.endOffset(0) );
      return comment;
   }
   return null;
}

Enjoy.
Kevin Markey

-----Original Message-----
From: Balaji [mailto:balaji.prabakaran@listertechnologies.com]
Sent: Tue 3/31/2009 6:22 AM
To: oro-user@jakarta.apache.org
Subject: Is this a bug with oro?

Hello,

I occasionally get the exception below. The call to getMatch() is causing a
NullPointerException.

Caused by: java.lang.NullPointerException
    at org.apache.oro.text.regex.Perl5Matcher.__setLastMatchResult(Unknown Source)
    at org.apache.oro.text.regex.Perl5Matcher.getMatch(Unknown Source)

Here is what the API documentation says about getMatch():
A MatchResult instance containing the pattern match found by the last call
to any one of the matches() or contains() methods. If no match was found by
the last call, returns null.

I believe this is a bug. Can you please confirm?
If so, is there a fix or a workaround for this bug?

Any help will be greatly appreciated.

Thanks,
Balaji Prabhakaran










</pre>
</div>
</content>
</entry>
<entry>
<title>RE: Is this a bug with oro?</title>
<author><name>&quot;Balaji&quot; &lt;balaji.prabakaran@listertechnologies.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200904.mbox/%3c20090402101728.4F9347248B7@athena.apache.org%3e"/>
<id>urn:uuid:%3c20090402101728-4F9347248B7@athena-apache-org%3e</id>
<updated>2009-04-02T10:14:17Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi Kevin,
 
Apologies for omitting the diTag pattern. Here it is:
 
 private static String start  = "&lt;" ;
 private static String tagNames =
"(form|input\\s+|head|/?select\\s+|option\\s+|textarea\\s+" + 
 "|checkboxgroup\\s+|radiogroup|/?optintrue){1}" ;
 private static String anything = "([^&gt;]*)" ;
 private static String end   = "[/]*&gt;" ;
 private static Pattern  diTag;
 private static String attribute = "[a-zA-Z0-9_\\-]+" ;
 private static String optWS  = "\\s*" ;
 private static String dquoted  = "\"(.*?)\"" ;
 private static String squoted  = "'(.*?)'" ;
 private static String plain  = "([^'\"&gt;\\s]+)" ;
 private static Pattern  nvps ;
 private static PatternMatcher primaryMatcher = new Perl5Matcher() ;
 private static PatternCompiler compiler = new Perl5Compiler() ;

   diTag = compiler.compile( start + tagNames + anything + end ,
     Perl5Compiler.CASE_INSENSITIVE_MASK ) ;

   nvps = compiler.compile( "((" + attribute + ")" + "(" + optWS
     + "=" + optWS + "(" + dquoted
     + "|" + squoted + "|" + plain
     + "))?)" ); 

 
The different failure scenarios you mentioned should fail consistently
(for the same input), correct?
In this case, for the same input, the NPE occurs only occasionally. Here the
input is an HTML file read over HTTP. Do you think the NPE can occur when
the HTML is not available for some reason (network issue, etc.)?
 
Thanks,
Balaji Prabhakaran 
</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Is this a bug with oro?</title>
<author><name>&quot;Daniel F. Savarese&quot; &lt;dfs@savarese.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200903.mbox/%3c200903311924.n2VJOGaj021180@aragorn.savarese.org%3e"/>
<id>urn:uuid:%3c200903311924-n2VJOGaj021180@aragorn-savarese-org%3e</id>
<updated>2009-03-31T19:24:16Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

In message &lt;20090331122526.DAF197248BB@athena.apache.org&gt;, "Balaji" writes:
&gt;I occasionally get the below exception. The call to getMatch is causing a
&gt;NullPointerException.

Can you post a minimal working example that we can compile and run to
reproduce the problem?  Without that, it sounds like the calling code
is violating a precondition.

daniel


</pre>
</div>
</content>
</entry>
<entry>
<title>RE: Is this a bug with oro?</title>
<author><name>&quot;Kevin Markey&quot; &lt;kmarkey@silvercreeksystems.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200903.mbox/%3cACEAD257B132354FAB8210F67F619A9F01D62861@scswhq.headquarters.silvercreeksystems.com%3e"/>
<id>urn:uuid:%3cACEAD257B132354FAB8210F67F619A9F01D62861@scswhq-headquarters-silvercreeksystems-com%3e</id>
<updated>2009-03-31T18:01:17Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
One more thing to do for your diagnostics.  Do these so you can identify where in __setLastMatchResult()
you fail.

- Get the source, recompile the jar with debugging information so you get the line number.
- Turn off any obfuscation.

Also provide the diTag pattern that is used when this fails.  (I don't see it defined in your
snippet.)  That is key.  

Still, I have a hunch...  The regex apparently has 2 groups.  I predict your pattern allows
a match **without** matching the groups.  As a result, __originalInput is reset to null at the
conclusion of __setLastMatchResult() after matching the 1st group, setting off the NPE on the
next iteration of your WHILE loop, or the __beginGroupOffset or __endGroupOffset or __endMatchOffsets
arrays might be null.  I'm not totally familiar with the source code, but I've used it for
several years, and these are the things that typically fail.

B.t.w., 2.0.6 and 2.0.8 are not substantially different in these regards.

So, make sure that BOTH groups are required in your regex.

Kevin
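
To illustrate what "a match without matching the groups" looks like from the
caller's side, here is a small sketch under assumptions. It uses
java.util.regex as a stand-in so it runs without the ORO jar, where
Matcher.group(n) returns null for a group that did not take part in the match;
it says nothing about ORO's internal fields named above. OptionalGroupDemo and
valueOr() are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OptionalGroupDemo {
    // Group 1 is required; group 2 sits inside an optional non-capturing group.
    private static final Pattern NAME_VALUE = Pattern.compile("(\\w+)(?:=(\\w+))?");

    /** Returns the value, or the given fallback when the optional group did not match. */
    public static String valueOr(String input, String fallback) {
        Matcher m = NAME_VALUE.matcher(input);
        if (!m.matches()) {
            return fallback;
        }
        String value = m.group(2);   // null when "=value" is absent
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        System.out.println(valueOr("checked", "(none)"));
        System.out.println(valueOr("type=hidden", "(none)"));
    }
}
```

The overall match succeeds for "checked" even though group 2 holds nothing, so
code that uses a group without a null check fails only on some inputs.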

</pre>
</div>
</content>
</entry>
<entry>
<title>RE: Is this a bug with oro?</title>
<author><name>&quot;Balaji&quot; &lt;balaji.prabakaran@listertechnologies.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200903.mbox/%3c20090331151313.AAED2724890@athena.apache.org%3e"/>
<id>urn:uuid:%3c20090331151313-AAED2724890@athena-apache-org%3e</id>
<updated>2009-03-31T15:09:50Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi Kevin,
 
Thanks a lot for your reply; I highly appreciate your help. Here are the
required details.
 
The version is 2.0.8.
 
The context: I am reading an HTML file over HTTP and parsing the values
of some hidden attributes in the HTML form.
 
Here is the code. The exception occurs at the line marked below. It occurs
randomly and is not reproducible at will.
The string passed to contains() is never null, and the result is always
checked for true before calling getMatch(). Please check whether I am
missing something.
 
******************class that contains the code that throws the
exception************
public class Parser
{
 
 private static Pattern  diTag;
 private static PatternMatcher primaryMatcher = new Perl5Matcher() ;
 private static PatternCompiler compiler = new Perl5Compiler() ;
 
 public static void initialize(){
  .
  .
  .
 }
 public Parser( StringBuffer input)
 {
  this.input = input ;
 }
 public Vector parse()
 {
  Vector returnValue=null;
  PatternMatcherInput patternMatcherInput = new
PatternMatcherInput(input.toString());
  int previous = 0 ;
  while(primaryMatcher.contains(patternMatcherInput,diTag))
  {
   MatchResult result = primaryMatcher.getMatch();  // exception is thrown here
   String dataString =
input.substring(previous,patternMatcherInput.getMatchBeginOffset());
   String tag = result.group(1);
   String inputS = result.group(2);
   try
   {
    returnValue=processDITag( tag.toUpperCase(),inputS ) ;
    previous = patternMatcherInput.getCurrentOffset() ;
   }
   catch(NotHandledException nh)
   {
    previous = patternMatcherInput.getMatchBeginOffset() ;
   }
  }
  return returnValue;
 }
 
 public Vector processDITag( String tag, String inputString ) throws
NotHandledException
 {
  .
  .
  .
 }
}
 
  
******************code that calls the method in the above
class*******************************
  // reads the data from an HTML file over HTTP
  diHTML = readInputFile(queryParametersBean.getSurveyName());

  if(diHTML.length()==0)
  {
   LogWriter.info(CLASS_NAME,"loadPageEvent(HttpServletRequest req)","The file name is not available" + sHtmlPath);
   // replace with exact file name
   sFileName=ConfigBean.getProperty(sSerPathFileName);
   sFileName=sFilePath + sFileName;
   queryParametersBean.setSurveyName(sFileName);
   diHTML = readInputFile(queryParametersBean.getSurveyName());
   LogWriter.info(CLASS_NAME,"loadPageEvent(HttpServletRequest req)","The file name from config file" + sFileName);
  }
  if(diHTML.length()==0) {
   LogWriter.info(CLASS_NAME,"loadPageEvent(HttpServletRequest req)","The file name is not in akamai server");
  }
  else {
   if(!( queryParametersBean.getEmail() != null &amp;&amp;
         queryParametersBean.getEmail().length() != 0 &amp;&amp;
         (ProcessorSupport.validateEmailAddress(queryParametersBean.getEmail())==false) &amp;&amp;
         diHTML.length() !=0))
         {
    LogWriter.info(CLASS_NAME,"loadPageEvent(HttpServletRequest req)","queryParametersBean track page load " + queryParametersBean.getEmail());
    System.out.println("inside load event");
    Parser myParser = new Parser(diHTML, queryParameters) ;
    Vector resultString=myParser.parse();
    Iterator itrelements=resultString.iterator();
    .
    .
    .
        }
    }
****************************************************************************
*********************************
 
Thanks,
Balaji Prabhakaran | Team Lead | Lister Technologies P Ltd
&lt;http://www.listertechnologies.com/&gt;  | AIM: BalajeeSP | direct:
1.352.553.4238 | office: +91.44.4225 2876 | cell: +91.98410.14404




 







</pre>
</div>
</content>
</entry>
<entry>
<title>RE: Is this a bug with oro?</title>
<author><name>&quot;Kevin Markey&quot; &lt;kmarkey@silvercreeksystems.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200903.mbox/%3cACEAD257B132354FAB8210F67F619A9F01D6285E@scswhq.headquarters.silvercreeksystems.com%3e"/>
<id>urn:uuid:%3cACEAD257B132354FAB8210F67F619A9F01D6285E@scswhq-headquarters-silvercreeksystems-com%3e</id>
<updated>2009-03-31T13:17:48Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Some context and code in which this fails and data with which this fails would help.
Also the version you are using would help.

However, inspecting the 2.0.6 code (the version most handy on the machine I'm on; I suspect
other versions are similar), there is only one place in __setLastMatchResult() where you can
get an NPE.  __lastMatchResult is non-null.  OpCode is non-null.  However, __originalInput
MIGHT be null.  Hence you can get an NPE where __originalInput.length is tested.  Check
whether the string you pass to contains() is null, and always check that contains() returned
true before calling getMatch().

E.g.,

private PatternCompiler m_compiler = new Perl5Compiler();
private PatternMatcher m_matcher = new Perl5Matcher();
private Pattern m_commentRegex = m_compiler.compile ( "#" );

/** Extract comment from string. */
public String findComment ( String s )
{
   if ( s == null ) return null;
   if ( m_matcher.contains ( s, m_commentRegex ) )
   {
      MatchResult result = m_matcher.getMatch();
      String comment = s.substring ( result.endOffset(0) );
      return comment;
   }
   return null;
}

Enjoy.
Kevin Markey





</pre>
</div>
</content>
</entry>
<entry>
<title>Is this a bug with oro?</title>
<author><name>&quot;Balaji&quot; &lt;balaji.prabakaran@listertechnologies.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200903.mbox/%3c20090331122526.DAF197248BB@athena.apache.org%3e"/>
<id>urn:uuid:%3c20090331122526-DAF197248BB@athena-apache-org%3e</id>
<updated>2009-03-31T12:22:21Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hello,
 
I occasionally get the exception below. The call to getMatch() is causing a
NullPointerException.
 
Caused by: java.lang.NullPointerException
    at org.apache.oro.text.regex.Perl5Matcher.__setLastMatchResult(Unknown
Source)
    at org.apache.oro.text.regex.Perl5Matcher.getMatch(Unknown Source)
 
Here is what the API documentation says,
A MatchResult instance containing the pattern match found by the last call
to any one of the matches() or contains() methods. If no match was found by
the last call, returns null.
 
I believe this is a bug. Can you please confirm?
If so, is there a fix or a workaround for this bug?
 
Any help will be greatly appreciated.
 
Thanks,
Balaji Prabhakaran


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: regular expression matching question (matching 'bla/bla')</title>
<author><name>&quot;Daniel F. Savarese&quot; &lt;dfs@savarese.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200901.mbox/%3c200901300144.n0U1iHsi027552@aragorn.savarese.org%3e"/>
<id>urn:uuid:%3c200901300144-n0U1iHsi027552@aragorn-savarese-org%3e</id>
<updated>2009-01-30T01:44:17Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

In message &lt;21738979.post@talk.nabble.com&gt;, aldana writes:
&gt;i need to match a pattern like bla/bla (i.e. the bla is repeated after the
&gt;slash).

Using ORO's Perl5 classes, a backreference will suffice: (bla)/\1
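
For anyone who wants to try this out, here is a minimal sketch of a
backreference, where \1 inside the pattern must match the same text that
group 1 captured. It uses java.util.regex instead of ORO so that it runs
without extra jars (that swap is an assumption for illustration; the \1
syntax is the Perl5 one). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class BackrefDemo {
    // \1 refers back to whatever group 1 captured, so the text after the
    // slash must repeat the text before it.
    static final Pattern REPEATED = Pattern.compile("(\\w+)/\\1");

    // True only when the whole input has the form word/word with both
    // words identical.
    static boolean isRepeated(String s) {
        return REPEATED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isRepeated("bla/bla")); // true
        System.out.println(isRepeated("bla/foo")); // false
    }
}
```

Backreferences are an extension beyond strictly regular languages, which is
why the "no memory" argument does not apply to Perl5-style patterns.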






</pre>
</div>
</content>
</entry>
<entry>
<title>regular expression matching question (matching 'bla/bla')</title>
<author><name>aldana &lt;aldana@gmx.de&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200901.mbox/%3c21738979.post@talk.nabble.com%3e"/>
<id>urn:uuid:%3c21738979-post@talk-nabble-com%3e</id>
<updated>2009-01-29T23:59:29Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

Hi,

I need to match a pattern like bla/bla (i.e. the bla is repeated after the
slash).

I guess this is not possible with regular expressions because they are
regular (Type 3 in the Chomsky hierarchy) and "have no memory" of data parsed
before.

-----
manuel aldana
aldana((at))gmx.de
software-engineering blog: http://www.aldana-online.de



</pre>
</div>
</content>
</entry>
<entry>
<title>Re: pattern and infinite loop</title>
<author><name>no spam &lt;mrs.nospam@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200901.mbox/%3cbd818c7b0901191839k686a3268i9c964e2687ff3fa7@mail.gmail.com%3e"/>
<id>urn:uuid:%3cbd818c7b0901191839k686a3268i9c964e2687ff3fa7@mail-gmail-com%3e</id>
<updated>2009-01-20T02:39:11Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
I'm using contains().  Strange; not sure what's going on.

&gt; Are you using contains() or matches()?  If you're using matches(), then
&gt; switch to contains() and it should work.  Here's my sanity check for
&gt; the pattern (to avoid having to write a Java test program):
&gt;
&gt; ~&gt; wget -O - http://www.myspace.com/pain  2&gt; /dev/null | perl -e '@txt =
&gt; &lt;STDIN&gt;; $txt = join("", @txt); $txt =~
&gt; m#&lt;span\s+class="nametext"&gt;[^&lt;]*&lt;/span&gt;&lt;br&gt;[^&lt;]*&lt;font\s[^&gt;]*&gt;&lt;strong&gt;([^&lt;]+)&lt;/strong&gt;&lt;/font&gt;#si;
&gt; print "$1\n";'
&gt;
&gt;  Metal / Industrial
&gt;

Ah yes, I figured that was the issue after I saw your pattern.  The bit I
don't understand, though, is how [^&lt;]* works.  What exactly does that
part of the pattern mean?

&gt; In any case, the key to prevent excessive backtracking is to make the
&gt; pattern as specific as possible.  The original pattern posed problems
&gt; because of the leading .* as well as following .+ pattern elements which
&gt; caused a lot of backtracking.
&gt;
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: pattern and infinite loop</title>
<author><name>&quot;Daniel F. Savarese&quot; &lt;dfs@savarese.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200901.mbox/%3c200901191801.n0JI1XXG027735@aragorn.savarese.org%3e"/>
<id>urn:uuid:%3c200901191801-n0JI1XXG027735@aragorn-savarese-org%3e</id>
<updated>2009-01-19T18:01:33Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

In message &lt;bd818c7b0901190638l7abc160ejf2e509048717c693@mail.gmail.com&gt;, no spam writes:
&gt;Yes that's correct.  This pattern prevented the looping it just didn't match
&gt;for that particular page.  I'll have to digest this pattern a bit more in my
&gt;head :o)

Are you using contains() or matches()?  If you're using matches(), then
switch to contains() and it should work.  Here's my sanity check for
the pattern (to avoid having to write a Java test program):

~&gt; wget -O - http://www.myspace.com/pain  2&gt; /dev/null | perl -e '@txt = &lt;STDIN&gt;;
$txt = join("", @txt); $txt =~ m#&lt;span\s+class="nametext"&gt;[^&lt;]*&lt;/span&gt;&lt;br&gt;[^&lt;]*&lt;font\s[^&gt;]*&gt;&lt;strong&gt;([^&lt;]+)&lt;/strong&gt;&lt;/font&gt;#si;
print "$1\n";'
 Metal / Industrial

In any case, the key to prevent excessive backtracking is to make the
pattern as specific as possible.  The original pattern posed problems
because of the leading .* as well as following .+ pattern elements which
caused a lot of backtracking.
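
As a rough illustration of "as specific as possible", the sketch below uses
a negated character class so the capture can never run past the next
delimiter. It is written against java.util.regex rather than ORO so it runs
standalone (an assumption for illustration), and it uses made-up
square-bracket tags and hypothetical names in place of the real page markup.
Here find() plays the role of contains(): it searches anywhere in the input
instead of requiring the whole input to match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpecificPatternDemo {
    // [^\[]+ can only consume characters up to the next '[', so a failed
    // attempt leaves the engine very little to backtrack over, unlike a
    // leading .* that can swallow the whole page.
    static final Pattern GENRE =
        Pattern.compile("\\[genre\\]([^\\[]+)\\[/genre\\]");

    // Returns the trimmed capture, or null when the pattern is not found.
    static String extractGenre(String page) {
        Matcher m = GENRE.matcher(page);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String page = "[name]Pain[/name][genre] Metal / Industrial [/genre]";
        System.out.println(extractGenre(page)); // Metal / Industrial
    }
}
```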





</pre>
</div>
</content>
</entry>
<entry>
<title>Re: pattern and infinite loop</title>
<author><name>no spam &lt;mrs.nospam@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200901.mbox/%3cbd818c7b0901190638l7abc160ejf2e509048717c693@mail.gmail.com%3e"/>
<id>urn:uuid:%3cbd818c7b0901190638l7abc160ejf2e509048717c693@mail-gmail-com%3e</id>
<updated>2009-01-19T14:38:01Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
&gt;
&gt; Presumably, you're concerned only with the capture group (containing
&gt; the genre), so rewrite the expression along the following lines to
&gt; avoid the ambiguous/excessive backtracking:
&gt;
&gt; p5c.compile("&lt;span\\s+class=\"nametext\"&gt;[^&lt;]*&lt;/span&gt;&lt;br&gt;[^&lt;]*&lt;font[^&gt;]*&gt;"+
&gt;            "&lt;strong&gt;([^&lt;]+)&lt;/strong&gt;&lt;/font&gt;",
&gt;            Perl5Compiler.SINGLELINE_MASK);
&gt;

Yes, that's correct.  This pattern prevented the looping; it just didn't match
for that particular page.  I'll have to digest this pattern a bit more in my
head :o)


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: pattern and infinite loop</title>
<author><name>&quot;Daniel F. Savarese&quot; &lt;dfs@savarese.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200901.mbox/%3c200901190506.n0J56Cug024678@aragorn.savarese.org%3e"/>
<id>urn:uuid:%3c200901190506-n0J56Cug024678@aragorn-savarese-org%3e</id>
<updated>2009-01-19T05:06:12Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

In message &lt;bd818c7b0901182009w5063efbbofd2afc1841e47ca7@mail.gmail.com&gt;, no spam writes:
&gt;I'm using this pattern:
&gt;p5c.compile(".*?&lt;td\\s+.+&lt;span\\s+class=\"nametext\"&gt;"+
&gt;".+?&lt;strong&gt;(.+?)&lt;/strong&gt;&lt;/font&gt;.+?Profile\\s+Views",
&gt;Perl5Compiler.SINGLELINE_MASK);
&gt;
&gt;to try and pull genres out of myspace pages.  However some pages like this
...
&gt;How can I prevent these loops?

Presumably, you're concerned only with the capture group (containing
the genre), so rewrite the expression along the following lines to
avoid the ambiguous/excessive backtracking:

p5c.compile("&lt;span\\s+class=\"nametext\"&gt;[^&lt;]*&lt;/span&gt;&lt;br&gt;[^&lt;]*&lt;font[^&gt;]*&gt;"+
            "&lt;strong&gt;([^&lt;]+)&lt;/strong&gt;&lt;/font&gt;",
            Perl5Compiler.SINGLELINE_MASK);






</pre>
</div>
</content>
</entry>
<entry>
<title>pattern and infinite loop</title>
<author><name>no spam &lt;mrs.nospam@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200901.mbox/%3cbd818c7b0901182009w5063efbbofd2afc1841e47ca7@mail.gmail.com%3e"/>
<id>urn:uuid:%3cbd818c7b0901182009w5063efbbofd2afc1841e47ca7@mail-gmail-com%3e</id>
<updated>2009-01-19T04:09:47Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
I'm using this pattern:
p5c.compile(".*?&lt;td\\s+.+&lt;span\\s+class=\"nametext\"&gt;"+
".+?&lt;strong&gt;(.+?)&lt;/strong&gt;&lt;/font&gt;.+?Profile\\s+Views",
Perl5Compiler.SINGLELINE_MASK);

to try to pull genres out of myspace pages.  However, some pages like this one
result in infinite loops:

http://www.myspace.com/pain

How can I prevent these loops?


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Need help in running oro unit tests</title>
<author><name>SatheeshKumar Mohan &lt;skumar@spikesource.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200805.mbox/%3c4821FA64.2060603@spikesource.com%3e"/>
<id>urn:uuid:%3c4821FA64-2060603@spikesource-com%3e</id>
<updated>2008-05-07T18:52:20Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi Daniel et al,
If I can get a pointer on where to start, I will do this!

Thanks,
skumar

Daniel F. Savarese wrote:
&gt; In message &lt;48205B45.6000601@spikesource.com&gt;, SatheeshKumar Mohan writes:
&gt;   
&gt;&gt; I am trying to run unit tests for jakarta-oro.
&gt;&gt;     
&gt; ..
&gt;   
&gt;&gt; Someone help me how to run tests.
&gt;&gt;     
&gt;
&gt; The original unit tests didn't survive the migration to Jakarta
&gt; way back when.  I'd hoped to use unit test writing as a way of
&gt; getting more developers involved with maintaining the software.
&gt; Didn't happen.
&gt;
&gt; daniel
&gt;
&gt; o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o o-o-o-o-o-o-o-o-o-o-o-o-o-o
&gt;                     Igfip                      o    s a v a r e s e
&gt; The strategic alternative for online games(tm).o   software research
&gt;             http://www.igfip.com/              o http://www.savarese.com/
&gt;
&gt;
&gt;
&gt;   





</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Need help in running oro unit tests</title>
<author><name>&quot;Daniel F. Savarese&quot; &lt;dfs@savarese.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200805.mbox/%3c200805061623.m46GNtD7007753@aragorn.savarese.org%3e"/>
<id>urn:uuid:%3c200805061623-m46GNtD7007753@aragorn-savarese-org%3e</id>
<updated>2008-05-06T16:23:55Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

In message &lt;48205B45.6000601@spikesource.com&gt;, SatheeshKumar Mohan writes:
&gt;I am trying to run unit tests for jakarta-oro.
..
&gt;Someone help me how to run tests.

The original unit tests didn't survive the migration to Jakarta
way back when.  I'd hoped to use unit test writing as a way of
getting more developers involved with maintaining the software.
Didn't happen.

daniel

o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o-o o-o-o-o-o-o-o-o-o-o-o-o-o-o
                    Igfip                      o    s a v a r e s e
The strategic alternative for online games(tm).o   software research
            http://www.igfip.com/              o http://www.savarese.com/





</pre>
</div>
</content>
</entry>
<entry>
<title>Need help in running oro unit tests</title>
<author><name>SatheeshKumar Mohan &lt;skumar@spikesource.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200805.mbox/%3c48205B45.6000601@spikesource.com%3e"/>
<id>urn:uuid:%3c48205B45-6000601@spikesource-com%3e</id>
<updated>2008-05-06T13:21:09Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi,
I am trying to run unit tests for jakarta-oro.

I ran ant tests.
The following message is shown:

[jakarta-oro-2.0.8]sn-patch5:oss[04:09:36] ant tests
Buildfile: build.xml

prepare:

lib:

tests:

BUILD FAILED
C:\cygwin\opt\spikesource\var\tmp\portage\jakarta-oro-2.0.8-r1\work\jakarta-oro-2.0.8\build.xml:119:

srcdir 
"C:\cygwin\opt\spikesource\var\tmp\portage\jakarta-oro-2.0.8-r1\work\jakarta-oro-2.0.8\src\java\tests"

does not exist!

Total time: 0 seconds
[jakarta-oro-2.0.8]sn-patch5:oss[04:09:59]


There is no tests directory under src/java.

I have checked svn also. It's not even there.
http://svn.apache.org/repos/asf/jakarta/oro/trunk/src/java/

Can someone help me run the tests?

/skumar

-- 
/skumar





</pre>
</div>
</content>
</entry>
<entry>
<title>Taking two different statements.</title>
<author><name>Melvin Mah &lt;melvinmah@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200611.mbox/%3c915162.2107.qm@web36804.mail.mud.yahoo.com%3e"/>
<id>urn:uuid:%3c915162-2107-qm@web36804-mail-mud-yahoo-com%3e</id>
<updated>2006-11-20T10:40:12Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
I have a list of ten results that I would like to
extract from the source page to be used in my XML
connector. 

The scenario is this ( in a page of 10 entries) :

Entries 1,2,3,5,8,9,10 have normal statements.
Entries 4,6 and 7 have statements but with keywords
highlighted in bold.

I want to get all the statements in one go, but I'm
not sure what the best regex is for this scenario. I'm
using a Perl5 regex for this.

To get the normal statements, I would use this
regex:

TI: &lt;/td&gt;(.*)\n([ \t]*)&lt;td valign="top"&gt;&lt;a&gt;(.[^&gt;]*)&lt;/a&gt;&lt;/td&gt;

For statements that have bold highlighted words, it
would be:

TI: &lt;/td&gt;(.*)\n([ \t]*)&lt;td valign="top"&gt;&lt;a&gt;([^&lt;]*) &lt;font color="#990000"&gt;(&lt;b&gt;([^&gt;]*)&lt;/b&gt;&lt;/font&gt;([^&gt;]*)&lt;/a&gt;)

I want to get all the statements together, so I am
wondering what the best regexp for this is.

Thanks





 



</pre>
</div>
</content>
</entry>
<entry>
<title>Unmatched / out of sequence entries</title>
<author><name>Melvin Mah &lt;melvinmah@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200608.mbox/%3c20060823033432.26598.qmail@web36810.mail.mud.yahoo.com%3e"/>
<id>urn:uuid:%3c20060823033432-26598-qmail@web36810-mail-mud-yahoo-com%3e</id>
<updated>2006-08-23T03:34:32Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
I've written a regex that lets the search agent
return entries matching those in Google.

The regex is:

&lt;p class=g&gt;&lt;!---m&gt;&lt;a class=1 href="([^"]*)" onmousedown="([^"]*)"&gt;(.*?&lt;/a)

The problem is that some entries are not in the same
sequential order as in Google.

Is there anything I can do to rectify the problem?





</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Removing Commas</title>
<author><name>Melvin Mah &lt;melvinmah@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200608.mbox/%3c20060822013525.31268.qmail@web36811.mail.mud.yahoo.com%3e"/>
<id>urn:uuid:%3c20060822013525-31268-qmail@web36811-mail-mud-yahoo-com%3e</id>
<updated>2006-08-22T01:35:25Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hey Daniel,

I just tried putting the &lt;replace var&gt; thing in the
configuration file (xml.format), and it worked
without having to change the .jsp file for the module.

For some reason, total results for the keyword
'management' were never returned at all, although they were
for the others. Ha! Is that word a reserved word?

melvin






</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Removing Commas</title>
<author><name>&quot;Daniel F. Savarese&quot; &lt;dfs@savarese.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200608.mbox/%3c200608211609.k7LG9pko031597@gandalf.savarese.org%3e"/>
<id>urn:uuid:%3c200608211609-k7LG9pko031597@gandalf-savarese-org%3e</id>
<updated>2006-08-21T16:09:51Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

In message &lt;20060821100414.61356.qmail@web36808.mail.mud.yahoo.com&gt;, Melvin Mah
 writes:
&gt;I managed to get 750,000 but I do not want the commas
&gt;I need to remove those.
&gt;
&gt;Is there any syntax that I need to add /change to the
&gt;current regex?

You're going to have to do a postprocessing substitution pass because
you don't know exactly how many commas are going to be in any given
piece of input.  Otherwise, you could capture the numbers surrounding
the commas and concatenate them.
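
A minimal sketch of the two-pass approach described above, written against
java.util.regex rather than ORO so that it runs on a stock JDK. The class
name, the pattern, and the sample input are assumptions made for illustration
only; with ORO the second pass could presumably be a Util.substitute call
that replaces "," with the empty string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommaStripper {
    // Hypothetical pattern: captures a total such as "750,000" after "of about"
    private static final Pattern TOTAL = Pattern.compile("of about ([0-9,]+)");

    // First pass captures the number, second pass strips the commas.
    static String extractTotal(String input) {
        Matcher m = TOTAL.matcher(input);
        if (!m.find()) {
            return null; // no total present in this input
        }
        // Removes every comma, however many the captured text contains
        return m.group(1).replaceAll(",", "");
    }

    public static void main(String[] args) {
        System.out.println(extractTotal("Results 1 - 10 of about 750,000"));
    }
}
```

Doing the cleanup as a separate pass keeps the capturing regex independent of
how many comma-separated groups the number happens to have.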

daniel

-#-#-#-#-| Sleep and The Traveller |-#-#-#-#-#-#-#- http://www.savarese.org/
In distant lands, I hear the call of my home.     #     s a v a r e s e
Yet my work is not done.  My journey's just begun.-    software research
 -- http://www.sleepandthetraveller.com/          # http://www.savarese.com/





</pre>
</div>
</content>
</entry>
<entry>
<title>Removing Commas</title>
<author><name>Melvin Mah &lt;melvinmah@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200608.mbox/%3c20060821100414.61356.qmail@web36808.mail.mud.yahoo.com%3e"/>
<id>urn:uuid:%3c20060821100414-61356-qmail@web36808-mail-mud-yahoo-com%3e</id>
<updated>2006-08-21T10:04:14Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
I've written a regular expression that returns three
numbers, e.g. (1-10) of 750,000 results
found.

The regex that I wrote works, and it is:

 regexp="&lt;font size=-1&gt;Results &lt;b&gt;([0-9]*)&lt;/b&gt; -
&lt;b&gt;([0-9]*)&amp;&lt;/b&gt; of about &lt;b&gt;([^&lt;]*)&lt;/b&gt;"
parenthesis="3" subgroup="3" /&gt;

I managed to get 750,000 but I do not want the commas
I need to remove those.

Is there any syntax that I need to add /change to the
current regex?

Thank you.




</pre>
</div>
</content>
</entry>
<entry>
<title>RE: to match there is no such substring</title>
<author><name>&quot;Kevin Markey&quot; &lt;kmarkey@silvercreeksystems.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200606.mbox/%3cE31E3AC363BEA843BBFD942E20B4D3F034F74D@scswhq.headquarters.silvercreeksystems.com%3e"/>
<id>urn:uuid:%3cE31E3AC363BEA843BBFD942E20B4D3F034F74D@scswhq-headquarters-silvercreeksystems-com%3e</id>
<updated>2006-06-14T18:05:37Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
The same solution applies.  Give the user a checkbox to choose to NEGATE the result.
Because of the nature of regular languages, it is easier to test a "positive" regex than
to craft a regex for all the "negative" instances.
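
A minimal sketch of negating the result in application code instead of in the
regex. It uses java.util.regex as a stand-in for ORO so that it runs on a
stock JDK; the class name, method name, and sample strings are assumptions
for illustration. With ORO the same idea would be to negate the boolean
returned by the matcher's contains call.

```java
import java.util.regex.Pattern;

public class NegatedMatch {
    // Returns true when the pattern is NOT found anywhere in the input.
    static boolean lacks(Pattern pattern, String input) {
        return !pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        // The user supplies an ordinary "positive" regex for the word.
        Pattern word = Pattern.compile("\\bforbidden\\b");
        String[] lines = { "a forbidden word", "nothing to see", "unforbidden" };
        for (String line : lines) {
            // Keep only the strings that do not contain the word
            if (lacks(word, line)) {
                System.out.println(line);
            }
        }
    }
}
```

The NEGATE checkbox would simply decide whether the application keeps the
strings for which the match succeeds or the ones for which it fails.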

Kevin Markey


-----Original Message-----
From: Ilhami Visne [mailto:ilhami.visne@gmail.com]
Sent: Wed 6/14/2006 10:23 AM
To: ORO Users List
Subject: Re: to match there is no such substring
 
Actually, I wrote a program that has a text field for a regex. Users
write their regexes into this field. So what regex should they write to
do this?

regards

On 6/13/06, Chennamsetti, Raja &lt;Raja.Chennamsetti@marriott.com&gt; wrote:
&gt;
&gt; You can use !Matcher.contains(pattern)
&gt; regards
&gt;
&gt;
&gt; -----Original Message-----
&gt; From: Ilhami Visne [mailto:ilhami.visne@gmail.com]
&gt; Sent: Tue 6/13/2006 5:19 AM
&gt; To: oro-user@jakarta.apache.org
&gt; Subject: to match there is no such substring
&gt;
&gt; hi,
&gt;
&gt; With a regex it is possible to match a string, but I want the opposite: it should
&gt; return true if the regex doesn't match a string.
&gt;
&gt; The problem is: I have some strings and I want only the strings that don't
&gt; contain a specific word.
&gt;
&gt; thanx in advance
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;







</pre>
</div>
</content>
</entry>
<entry>
<title>Re: to match there is no such substring</title>
<author><name>&quot;Ilhami Visne&quot; &lt;ilhami.visne@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200606.mbox/%3cce6b4d120606140923p12ad8ad7vee28b5546c13349c@mail.gmail.com%3e"/>
<id>urn:uuid:%3cce6b4d120606140923p12ad8ad7vee28b5546c13349c@mail-gmail-com%3e</id>
<updated>2006-06-14T16:23:50Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Actually, I wrote a program that has a text field for a regex. Users
write their regexes into this field. So what regex should they write to
do this?

regards

On 6/13/06, Chennamsetti, Raja &lt;Raja.Chennamsetti@marriott.com&gt; wrote:
&gt;
&gt; You can use !Matcher.contains(pattern)
&gt; regards
&gt;
&gt;
&gt; -----Original Message-----
&gt; From: Ilhami Visne [mailto:ilhami.visne@gmail.com]
&gt; Sent: Tue 6/13/2006 5:19 AM
&gt; To: oro-user@jakarta.apache.org
&gt; Subject: to match there is no such substring
&gt;
&gt; hi,
&gt;
&gt; With a regex it is possible to match a string, but I want the opposite: it should
&gt; return true if the regex doesn't match a string.
&gt;
&gt; The problem is: I have some strings and I want only the strings that don't
&gt; contain a specific word.
&gt;
&gt; thanx in advance
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>RE: to match there is no such substring</title>
<author><name>&quot;Chennamsetti, Raja&quot; &lt;Raja.Chennamsetti@marriott.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200606.mbox/%3cC044B8131BB129439B1217F857D97A52059F989C@hdqncexmbx2.mihdq.marrcorp.marriott.com%3e"/>
<id>urn:uuid:%3cC044B8131BB129439B1217F857D97A52059F989C@hdqncexmbx2-mihdq-marrcorp-marriott-com%3e</id>
<updated>2006-06-13T14:07:00Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
You can use !Matcher.contains(pattern)
regards


-----Original Message-----
From: Ilhami Visne [mailto:ilhami.visne@gmail.com]
Sent: Tue 6/13/2006 5:19 AM
To: oro-user@jakarta.apache.org
Subject: to match there is no such substring
 
hi,

With a regex it is possible to match a string, but I want the opposite: it should
return true if the regex doesn't match a string.

The problem is: I have some strings and I want only the strings that don't
contain a specific word.

thanx in advance




</pre>
</div>
</content>
</entry>
<entry>
<title>to match there is no such substring</title>
<author><name>&quot;Ilhami Visne&quot; &lt;ilhami.visne@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200606.mbox/%3cce6b4d120606130219w31de4309y8dcbf718b0a51124@mail.gmail.com%3e"/>
<id>urn:uuid:%3cce6b4d120606130219w31de4309y8dcbf718b0a51124@mail-gmail-com%3e</id>
<updated>2006-06-13T09:19:08Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
hi,

With a regex it is possible to match a string, but I want the opposite: it should
return true if the regex doesn't match a string.

The problem is: I have some strings and I want only the strings that don't
contain a specific word.

thanx in advance


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Odd Regex behavior in oro 2.0.8 lib</title>
<author><name>&quot;Daniel F. Savarese&quot; &lt;dfs@savarese.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200606.mbox/%3c200606101720.k5AHKjie010328@gandalf.savarese.org%3e"/>
<id>urn:uuid:%3c200606101720-k5AHKjie010328@gandalf-savarese-org%3e</id>
<updated>2006-06-10T17:20:45Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

In message &lt;20060609192418.52531.qmail@web30603.mail.mud.yahoo.com&gt;, CJ Jouhal 
writes:
&gt;Pattern m_forbiddenTagsWithContentPattern =
&gt;s_perlCompiler.compile(
&gt;				
&gt;"&lt;(script|object|applet|style|noscript)[^&gt;]*&gt;[\\s\\S]*?&lt;/\1[^&gt;]*&gt;",
&gt;					Perl5Compiler.CASE_INSENSITIVE_MASK
&gt;						| Perl5Compiler.READ_ONLY_MASK)
&gt;;

A simple typo.  The "\1" needs to be "\\1".
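
In Java source, "\1" is an octal escape that the compiler turns into the
single character U+0001 before any regex engine sees it, whereas "\\1"
yields a backslash followed by 1, which the engine reads as a backreference.
A small sketch of the difference, using java.util.regex as a stand-in for ORO
(the class name and the shortened pattern are illustrative assumptions):

```java
import java.util.regex.Pattern;

public class BackrefEscape {
    // True when the regex is found somewhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String wrong = "(a|b)x\1";   // compiler stores the character U+0001
        String right = "(a|b)x\\1";  // regex engine sees \1, a backreference

        // The last character of the mistyped pattern is code point 1
        System.out.println((int) wrong.charAt(wrong.length() - 1));

        // Only the correctly escaped pattern matches a repeated group
        System.out.println(found(wrong, "axa"));
        System.out.println(found(right, "axa"));
    }
}
```

The mistyped pattern still compiles, because U+0001 is a legal literal
character in a regex, which is why the failure shows up only as a silent
non-match.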

daniel

-#-#-#-#-| Sleep and The Traveller |-#-#-#-#-#-#-#- http://www.savarese.org/
In distant lands, I hear the call of my home.     #     s a v a r e s e
Yet my work is not done.  My journey's just begun.-    software research
 -- http://www.sleepandthetraveller.com/          # http://www.savarese.com/





</pre>
</div>
</content>
</entry>
<entry>
<title>Odd Regex behavior in oro 2.0.8 lib</title>
<author><name>CJ Jouhal &lt;cheekycj@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200606.mbox/%3c20060609192418.52531.qmail@web30603.mail.mud.yahoo.com%3e"/>
<id>urn:uuid:%3c20060609192418-52531-qmail@web30603-mail-mud-yahoo-com%3e</id>
<updated>2006-06-09T19:24:18Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi,
  I am seeing some odd regex behavior.

Using the demo applet:
http://jakarta.apache.org/oro/demo.html

I try the following pattern:
&lt;(script|object|applet|style|noscript)[^&gt;]*&gt;[\s\S]*?&lt;/\1[^&gt;]*&gt;
or an alternate version (with the single-line
flag):
&lt;(script|object|applet|style|noscript)[^&gt;]*&gt;.*?&lt;/\1[^&gt;]*&gt;

With the following test input:
   &lt;td height="35" colspan="2" align="center"
class="style1"&gt;
    
&lt;script type="text/javascript"&gt;
	function spawn(fileName,width,height) {
window.open(fileName,'new','toolbar=0,location=0,directories=0,status=0,menubar=0,scrollbars=0,width='+width+',height='+height+',resizable=0');
}
&lt;/script&gt;
&lt;style type="text/css"&gt;
	.Copyright { font-size: 10px; font-family: Verdana,
Arial; color: #FFF; padding:2px; margin:0px;
vertical-align:1px; line-height:11px; }
	.Copyright A { color: #FFF; }
&lt;/style&gt;
&lt;span class="Copyright"&gt;&amp;copy; 2006 &lt;a
href="http://www.domain.com/" target="_blank"&gt;Vantage
Media Corporation&lt;/a&gt; - &lt;a
href="JavaScript:spawn('http://www.domain.com/privacy.html','770','501');"&gt;Privacy
Statement&lt;/a&gt; - &lt;a
href="JavaScript:spawn('http://www.domain.com/feedback/?data=aHR0cDovL2NvbGxlZ2UudXMuY29tL2NlYy9mdXR1cmVkZWdyZWUvZGVzaWduLnBocA','460','520');"&gt;Send
Us Feedback&lt;/a&gt;&lt;/span&gt;    &lt;/td&gt;
    &lt;td valign="top"&gt;&amp;nbsp;&lt;/td&gt;
  &lt;/tr&gt;
&lt;/table&gt;
=================================================

And the first pattern matches twice (second pattern
obviously doesn't match in the applet since the applet
doesn't have the single line flag applied)

But the following code:
			
Perl5Compiler s_perlCompiler = new Perl5Compiler();
m_matcher = new Perl5Matcher();
m_matcher.setMultiline(false);

Pattern m_forbiddenTagsWithContentPattern =
    s_perlCompiler.compile(
        "&lt;(script|object|applet|style|noscript)[^&gt;]*&gt;[\\s\\S]*?&lt;/\1[^&gt;]*&gt;",
        Perl5Compiler.CASE_INSENSITIVE_MASK
            | Perl5Compiler.READ_ONLY_MASK);

// remove content and tags that include script/applet/object etc
StringSubstitution substitution1 = new StringSubstitution(SPACE);
filteredStr =
    Util.substitute(m_matcher,
                    m_forbiddenTagsWithContentPattern,
                    substitution1,
                    text,
                    Util.SUBSTITUTE_ALL);
// text is set as the above sample text.

The substitution does nothing.  I even tried:
PatternMatcherInput input = new PatternMatcherInput(text);
while (m_matcher.contains(input, pattern)) {
    System.out.println("In manual strip method - Found match btw:"
        + input.getMatchBeginOffset() + "," + input.getMatchEndOffset() + ":"
        + input.substring(input.getMatchBeginOffset(),
                          input.getMatchEndOffset()));
}

And the above logs nothing.

I tried compiling the pattern with the
SINGLE_LINE_MASK but that made no difference.

Any ideas/help would be appreciated.

TIA,
CJ





</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Need Help</title>
<author><name>Melvin Zamora &lt;mijzcx@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200605.mbox/%3c20060501130325.36251.qmail@web54007.mail.yahoo.com%3e"/>
<id>urn:uuid:%3c20060501130325-36251-qmail@web54007-mail-yahoo-com%3e</id>
<updated>2006-05-01T13:03:25Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
my follow up:

I would like to extend my target strings with a few more:

-1,04621601/01/0640.00       
-1,04621701/01/0640.00       
-1,04621901/01/0680.00       
-1,04623401/01/06100.00    
-3,04622601/01/0650.00

I need help constructing a regular expression to extract this:

40.00 from -1,04621601/01/06
40.00 from -1,04621701/01/06
80.00 from -1,04621901/01/06
100.00 from -1,04623401/01/06
50.00 from -3,04622601/01/06

again thanks in advance.
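
One possible approach, sketched under the assumption that every record ends
with a date of the form dd/dd/dd followed immediately by the amount with two
decimal places. The class and method names are hypothetical, and
java.util.regex is used instead of ORO so the sketch runs on a stock JDK:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AmountExtractor {
    // Two-digit date fields separated by slashes, then the captured amount
    private static final Pattern RECORD =
        Pattern.compile("[0-9]{2}/[0-9]{2}/[0-9]{2}([0-9]+\\.[0-9]{2})");

    // Returns the amount that follows the date, or null if none is found.
    static String amount(String line) {
        Matcher m = RECORD.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "-1,04621601/01/0640.00",
            "-1,04621701/01/0640.00",
            "-1,04621901/01/0680.00",
            "-1,04623401/01/06100.00",
            "-3,04622601/01/0650.00"
        };
        for (String line : lines) {
            System.out.println(amount(line));
        }
    }
}
```

Anchoring on the slashes of the date is what separates the two-digit year
from the amount digits that follow it without any delimiter.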


Melvin Zamora &lt;mijzcx@yahoo.com&gt; wrote: Hi,

I have this problem on how to regex this one.

-1,04621601/01/0640.00       
-1,04621701/01/0640.00       
-1,04621901/01/0680.00       

I need to extract only the 
40.00 from -1,04621601/01/06
40.00 from -1,04621701/01/06
80.00 from -1,04621901/01/06

please help me on my school assignment for regex.
my gratitude included.

melvin r. zamora

  
---------------------------------
Yahoo! Messenger with Voice. Make PC-to-Phone Calls to the US (and 30+ countries) for 2¢/min
or less.

		

</pre>
</div>
</content>
</entry>
<entry>
<title>Need Help</title>
<author><name>Melvin Zamora &lt;mijzcx@yahoo.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200605.mbox/%3c20060501124848.30887.qmail@web54007.mail.yahoo.com%3e"/>
<id>urn:uuid:%3c20060501124848-30887-qmail@web54007-mail-yahoo-com%3e</id>
<updated>2006-05-01T12:48:48Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi,

I have this problem on how to regex this one.

-1,04621601/01/0640.00       
-1,04621701/01/0640.00       
-1,04621901/01/0680.00       

I need to extract only the 
40.00 from -1,04621601/01/06
40.00 from -1,04621701/01/06
80.00 from -1,04621901/01/06

please help me on my school assignment for regex.
my gratitude included.

melvin r. zamora

		

</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Perl5Util performance</title>
<author><name>&quot;Duke Tantiprasut&quot; &lt;duketantiprasut@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200604.mbox/%3c824028f0604271249n5ebcf5e3n37d37effb1c45ff3@mail.gmail.com%3e"/>
<id>urn:uuid:%3c824028f0604271249n5ebcf5e3n37d37effb1c45ff3@mail-gmail-com%3e</id>
<updated>2006-04-27T19:49:22Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Any news on when the new release will be available with the changes?

Thanks.

Duke

On 4/1/06, Daniel F. Savarese &lt;dfs@savarese.org&gt; wrote:
&gt;
&gt;
&gt; In message &lt;824028f0603310939q25e01339kc4a0a75e3c31d66e@mail.gmail.com&gt;,
&gt; "Duke
&gt; Tantiprasut" writes:
&gt; &gt;I think I'm going to stick it out with oro/perl5util. I prefer to provide
&gt; &gt;the flexibility and perl5 familiarity than a little extra speed at this
&gt; &gt;stage. Do you know when you'll get the chance to look at the changes to make
&gt; &gt;it more multi-thread friendly?
&gt;
&gt; I made the change on the trunk this morning.  You'll have to check it
&gt; out with svn and compile it.  I don't know when we'll be cutting a new
&gt; release.  Everything related to ORO is done based on user demand.  The
&gt; change could always be backported to produce a 2.0.9 release because
&gt; the trunk has changes in it that aren't appropriate for 2.0.9 and
&gt; may still change (the engine wrapper interfaces and the implementation of
&gt; a wrapper for java.util.regex).  However, the trunk is stable (i.e., no
&gt; more bugs than 2.0.8), so it's safe to use as you would 2.0.8 even though
&gt; the new stuff may change.  Just read the CHANGES file for a list of
&gt; additions.
&gt;
&gt; &gt;With Perl5Util, doesn't that generate the patterns, which are then cached and
&gt; &gt;used with the Perl5Matcher? i.e. am I correct in assuming that the penalty is
&gt; &gt;only during the initial pattern generation and not during subsequent matching?
&gt;
&gt; Yes, that is correct.  The patterns are generated only the first time
&gt; they are used (or if they subsequently get kicked out of the cache).
&gt; I don't know how bad of a performance hit the synchronized method calls
&gt; are these days, but it would have helped in 1.0.2, 1.1, and probably 1.2
&gt; to have avoided synchronizing the methods.  But Perl5Util was a
&gt; user-requested class (AOL actually asked for it) and the whole idea at
&gt; the time was that if you wanted performance, you should use a separate
&gt; matcher in separate threads.  In general, my preference is to push thread
&gt; concerns out of libraries and into applications as much as possible, but
&gt; given the nature of Perl5Util, it does seem kind of weird to me now that
&gt; it uses synchronized methods everywhere and doesn't just use a separate
&gt; matcher for each thread.  On the other hand, if it were to do that, then
&gt; it would be better for Perl5Util to be unsynchronized, leaving it to the
&gt; application to create thread-local Perl5Util instances.  But the request
&gt; at the time was to be able to use a single class instance to perform
&gt; matches in multiple threads.  Less RAM back in those days.
&gt;
&gt; daniel
&gt;
&gt;
&gt;
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Perl5Util performance</title>
<author><name>&quot;Duke Tantiprasut&quot; &lt;duketantiprasut@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200604.mbox/%3c824028f0604011532n1ac7a248g5e4cc66d884354cd@mail.gmail.com%3e"/>
<id>urn:uuid:%3c824028f0604011532n1ac7a248g5e4cc66d884354cd@mail-gmail-com%3e</id>
<updated>2006-04-01T23:32:19Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Thanks. I'll have a look monday/tuesday next week and let you know if I run
into any hiccups.

Duke

On 4/1/06, Daniel F. Savarese &lt;dfs@savarese.org&gt; wrote:
&gt;
&gt;
&gt; In message &lt;824028f0603310939q25e01339kc4a0a75e3c31d66e@mail.gmail.com&gt;,
&gt; "Duke
&gt; Tantiprasut" writes:
&gt; &gt;I think I'm going to stick it out with oro/perl5util. I prefer to provide
&gt; &gt;the flexibility and perl5 familiarity than a little extra speed at this
&gt; &gt;stage. Do you know when you'll get the chance to look at the changes to make
&gt; &gt;it more multi-thread friendly?
&gt;
&gt; I made the change on the trunk this morning.  You'll have to check it
&gt; out with svn and compile it.  I don't know when we'll be cutting a new
&gt; release.  Everything related to ORO is done based on user demand.  The
&gt; change could always be backported to produce a 2.0.9 release because
&gt; the trunk has changes in it that aren't appropriate for 2.0.9 and
&gt; may still change (the engine wrapper interfaces and the implementation of
&gt; a wrapper for java.util.regex).  However, the trunk is stable (i.e., no
&gt; more bugs than 2.0.8), so it's safe to use as you would 2.0.8 even though
&gt; the new stuff may change.  Just read the CHANGES file for a list of
&gt; additions.
&gt;
&gt; &gt;With Perl5Util, doesn't that generate the patterns, which are then cached and
&gt; &gt;used with the Perl5Matcher? i.e. am I correct in assuming that the penalty is
&gt; &gt;only during the initial pattern generation and not during subsequent matching?
&gt;
&gt; Yes, that is correct.  The patterns are generated only the first time
&gt; they are used (or if they subsequently get kicked out of the cache).
&gt; I don't know how bad of a performance hit the synchronized method calls
&gt; are these days, but it would have helped in 1.0.2, 1.1, and probably 1.2
&gt; to have avoided synchronizing the methods.  But Perl5Util was a
&gt; user-requested class (AOL actually asked for it) and the whole idea at
&gt; the time was that if you wanted performance, you should use a separate
&gt; matcher in separate threads.  In general, my preference is to push thread
&gt; concerns out of libraries and into applications as much as possible, but
&gt; given the nature of Perl5Util, it does seem kind of weird to me now that
&gt; it uses synchronized methods everywhere and doesn't just use a separate
&gt; matcher for each thread.  On the other hand, if it were to do that, then
&gt; it would be better for Perl5Util to be unsynchronized, leaving it to the
&gt; application to create thread-local Perl5Util instances.  But the request
&gt; at the time was to be able to use a single class instance to perform
&gt; matches in multiple threads.  Less RAM back in those days.
&gt;
&gt; daniel
&gt;
&gt;
&gt;
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Perl5Util performance</title>
<author><name>&quot;Daniel F. Savarese&quot; &lt;dfs@savarese.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200604.mbox/%3c200604011747.k31HlohS028183@gandalf.savarese.org%3e"/>
<id>urn:uuid:%3c200604011747-k31HlohS028183@gandalf-savarese-org%3e</id>
<updated>2006-04-01T17:47:50Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

In message &lt;824028f0603310939q25e01339kc4a0a75e3c31d66e@mail.gmail.com&gt;, "Duke 
Tantiprasut" writes:
&gt;I think I'm going to stick it out with oro/perl5util. I prefer to provide
&gt;the flexibility and perl5 familiarity than a little extra speed at this
&gt;stage. Do you know when you'll get the chance to look at the changes to make
&gt;it more multi-thread friendly?

I made the change on the trunk this morning.  You'll have to check it
out with svn and compile it.  I don't know when we'll be cutting a new
release.  Everything related to ORO is done based on user demand.  The
change could always be backported to produce a 2.0.9 release because
the trunk has changes in it that aren't appropriate for 2.0.9 and 
may still change (the engine wrapper interfaces and the implementation of
a wrapper for java.util.regex).  However, the trunk is stable (i.e., no
more bugs than 2.0.8), so it's safe to use as you would 2.0.8 even though
the new stuff may change.  Just read the CHANGES file for a list of
additions.

&gt;With Perl5Util, doesn't that generate the patterns, which are then cached and
&gt;used with the Perl5Matcher? i.e. am I correct in assuming that the penalty is
&gt;only during the initial pattern generation and not during subsequent matching?

Yes, that is correct.  The patterns are generated only the first time
they are used (or if they subsequently get kicked out of the cache).
I don't know how bad of a performance hit the synchronized method calls
are these days, but it would have helped in 1.0.2, 1.1, and probably 1.2
to have avoided synchronizing the methods.  But Perl5Util was a
user-requested class (AOL actually asked for it) and the whole idea at
the time was that if you wanted performance, you should use a separate
matcher in separate threads.  In general, my preference is to push thread
concerns out of libraries and into applications as much as possible, but
given the nature of Perl5Util, it does seem kind of weird to me now that
it uses synchronized methods everywhere and doesn't just use a separate
matcher for each thread.  On the other hand, if it were to do that, then
it would be better for Perl5Util to be unsynchronized, leaving it to the
application to create thread-local Perl5Util instances.  But the request
at the time was to be able to use a single class instance to perform
matches in multiple threads.  Less RAM back in those days.
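
A minimal sketch of the "separate matcher in separate threads" idea, with the
threading handled by the application rather than the library. It uses
java.util.regex as a stand-in for ORO so that it runs on a stock JDK; the
Worker class, the pattern, and the inputs are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherPerThread {
    // An immutable compiled pattern can be shared between threads.
    static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Each worker owns its matcher, so no synchronization is needed.
    static class Worker implements Runnable {
        private final Matcher matcher = DIGITS.matcher("");
        private final String[] inputs;
        int hits;

        Worker(String[] inputs) {
            this.inputs = inputs;
        }

        public void run() {
            for (String input : inputs) {
                // reset() points this thread's matcher at the next input
                if (matcher.reset(input).find()) {
                    hits++;
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Worker a = new Worker(new String[] { "abc 123", "no digits" });
        Worker b = new Worker(new String[] { "42", "7 of 9", "none" });
        Thread ta = new Thread(a);
        Thread tb = new Thread(b);
        ta.start();
        tb.start();
        ta.join(); // join() makes each worker's hit count visible here
        tb.join();
        System.out.println(a.hits + " " + b.hits);
    }
}
```

The trade-off is the one mentioned above: one matcher per thread costs some
memory but avoids contention on synchronized methods.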

daniel





</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Perl5Util performance</title>
<author><name>&quot;Duke Tantiprasut&quot; &lt;duketantiprasut@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200603.mbox/%3c824028f0603310939q25e01339kc4a0a75e3c31d66e@mail.gmail.com%3e"/>
<id>urn:uuid:%3c824028f0603310939q25e01339kc4a0a75e3c31d66e@mail-gmail-com%3e</id>
<updated>2006-03-31T17:39:32Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Hi Daniel,

I think I'm going to stick it out with oro/perl5util. I prefer to provide
the flexibility and perl5 familiarity than a little extra speed at this
stage. Do you know when you'll get the chance to look at the changes to make
it more multi-thread friendly?

With Perl5Util, doesn't that generate the patterns, which are then cached and
used with the Perl5Matcher? i.e. am I correct in assuming that the penalty is
only during the initial pattern generation and not during subsequent matching?

Thanks

Duke

On 3/30/06, Duke Tantiprasut &lt;duketantiprasut@gmail.com&gt; wrote:
&gt;
&gt; Thanks Daniel.
&gt;
&gt; Sounds like I should be moving to java.util.regex. I do like the
&gt; convenience of the pattern caching but I guess it's easy enough to set that
&gt; up myself for java.util.regex.
&gt;
&gt; Duke
&gt;
&gt;
&gt; On 3/29/06, Daniel F. Savarese &lt;dfs@savarese.org&gt; wrote:
&gt; &gt;
&gt; &gt;
&gt; &gt; In message &lt;824028f0603291030j5607dfd9g2e8208ff51f60320@mail.gmail.com&gt;,
&gt; &gt; "Duke
&gt; &gt; Tantiprasut" writes:
&gt; &gt; &gt;I'm curious why there is such a significant jump from the Perl5Matcher
&gt; &gt; &gt;compared to the java.util.regex?
&gt; &gt;
&gt; &gt; A hefty chunk of that time comes from converting strings to char[] before
&gt; &gt; matching.  I've tuned that benchmark before and trimmed 25% of the time
&gt; &gt; just by using PatternMatcherInput instead of String.  It's not exactly
&gt; &gt; a rigorous benchmark anyway.  Measurements I've made in the past show
&gt; &gt; that the performance of the packages depends heavily on the input and
&gt; &gt; how the regular expressions are written.  Two equivalent regular
&gt; &gt; expressions can have very different performance characteristics.
&gt; &gt; That said, ORO is behind the times on performance, having been designed
&gt; &gt; originally to get the most out of JDK 1.0.2.
&gt; &gt;
&gt; &gt; A question that bears revisiting is if Perl5Matcher needs to bother
&gt; &gt; converting to char[] anymore.  In JDK 1.0.2 and 1.1 days it was a big
&gt; &gt; performance win, but unless you're working with your input as
&gt; &gt; char[] from the start, I bet these days it would be faster to not make
&gt; &gt; the conversion and work directly with String (or CharSequence) if we're
&gt; &gt; willing to abandon JDK 1.2/1.3 compatibility.  But now that there's
&gt; &gt; a java.util.regex, the primary reason to use ORO appears to be if you're
&gt; &gt; still on 1.2/1.3...
&gt; &gt;
&gt; &gt; In response to the email Subject, Perl5Util is a convenience class and
&gt; &gt; will always be slower than using Perl5Matcher directly because Perl5Util
&gt; &gt; has to parse the native Perl-style representation of expressions :(
&gt; &gt;
&gt; &gt; daniel
&gt; &gt;
&gt; &gt;
&gt; &gt;
&gt; &gt;
&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Perl5Util performance</title>
<author><name>&quot;Duke Tantiprasut&quot; &lt;duketantiprasut@gmail.com&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200603.mbox/%3c824028f0603300909s3a95b4ach743b98655946b736@mail.gmail.com%3e"/>
<id>urn:uuid:%3c824028f0603300909s3a95b4ach743b98655946b736@mail-gmail-com%3e</id>
<updated>2006-03-30T17:09:44Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>
Thanks Daniel.

Sounds like I should be moving to java.util.regex. I do like the convenience
of the pattern caching but I guess it's easy enough to set that up myself
for java.util.regex.
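
A minimal sketch of such a cache, assuming an unbounded HashMap with no
eviction is acceptable. RegexCache and its methods are hypothetical names,
not ORO or JDK classes, and the raw Map type is used only so the code also
compiles on a pre-generics JDK:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of ORO or the JDK.
class RegexCache {
    // Maps the regex source String to its compiled Pattern (raw type on purpose).
    private final Map cache = new HashMap();

    // Returns the compiled Pattern for regex, compiling it only on first use.
    public synchronized Pattern get(String regex) {
        Pattern p = (Pattern) cache.get(regex);
        if (p == null) {
            p = Pattern.compile(regex);
            cache.put(regex, p);
        }
        return p;
    }

    // Convenience: true if regex matches anywhere in input.
    public boolean contains(String regex, CharSequence input) {
        return get(regex).matcher(input).find();
    }

    // Number of distinct expressions compiled so far.
    public synchronized int size() {
        return cache.size();
    }
}
```

A compiled Pattern is immutable and can be shared between threads; each
Matcher obtained from it should stay local to the caller.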

Duke

On 3/29/06, Daniel F. Savarese &lt;dfs@savarese.org&gt; wrote:
&gt;
&gt;
&gt; In message &lt;824028f0603291030j5607dfd9g2e8208ff51f60320@mail.gmail.com&gt;,
&gt; "Duke Tantiprasut" writes:
&gt; &gt;I'm curious why there is such a significant jump from the Perl5Matcher
&gt; &gt;compared to the java.util.regex?
&gt;
&gt; A hefty chunk of that time comes from converting strings to char[] before
&gt; matching.  I've tuned that benchmark before and trimmed 25% of the time
&gt; just by using PatternMatcherInput instead of String.  It's not exactly
&gt; a rigorous benchmark anyway.  Measurements I've made in the past show
&gt; that the performance of the packages depends heavily on the input and
&gt; how the regular expressions are written.  Two equivalent regular
&gt; expressions can have very different performance characteristics.
&gt; That said, ORO is behind the times on performance, having been designed
&gt; originally to get the most out of JDK 1.0.2.
&gt;
&gt; A question that bears revisiting is if Perl5Matcher needs to bother
&gt; converting to char[] anymore.  In JDK 1.0.2 and 1.1 days it was a big
&gt; performance win, but unless you're working with your input as
&gt; char[] from the start, I bet these days it would be faster to not make
&gt; the conversion and work directly with String (or CharSequence) if we're
&gt; willing to abandon JDK 1.2/1.3 compatibility.  But now that there's
&gt; a java.util.regex, the primary reason to use ORO appears to be if you're
&gt; still on 1.2/1.3...
&gt;
&gt; In response to the email Subject, Perl5Util is a convenience class and
&gt; will always be slower than using Perl5Matcher directly because Perl5Util
&gt; has to parse the native Perl-style representation of expressions :(
&gt;
&gt; daniel


</pre>
</div>
</content>
</entry>
<entry>
<title>Re: Perl5Util performance</title>
<author><name>&quot;Daniel F. Savarese&quot; &lt;dfs@savarese.org&gt;</name></author>
<link rel="alternate" href="http://mail-archives.apache.org/mod_mbox/jakarta-oro-user/200603.mbox/%3c200603300613.k2U6DfKx017716@gandalf.savarese.org%3e"/>
<id>urn:uuid:%3c200603300613-k2U6DfKx017716@gandalf-savarese-org%3e</id>
<updated>2006-03-30T06:13:40Z</updated>
<content type="xhtml">
<div xmlns="http://www.w3.org/1999/xhtml">
<pre>

In message &lt;824028f0603291030j5607dfd9g2e8208ff51f60320@mail.gmail.com&gt;,
"Duke Tantiprasut" writes:
&gt;I'm curious why there is such a significant jump from the Perl5Matcher
&gt;compared to the java.util.regex?

A hefty chunk of that time comes from converting strings to char[] before
matching.  I've tuned that benchmark before and trimmed 25% of the time
just by using PatternMatcherInput instead of String.  It's not exactly
a rigorous benchmark anyway.  Measurements I've made in the past show
that the performance of the packages depends heavily on the input and
how the regular expressions are written.  Two equivalent regular
expressions can have very different performance characteristics.
That said, ORO is behind the times on performance, having been designed
originally to get the most out of JDK 1.0.2.

A question that bears revisiting is if Perl5Matcher needs to bother
converting to char[] anymore.  In JDK 1.0.2 and 1.1 days it was a big
performance win, but unless you're working with your input as
char[] from the start, I bet these days it would be faster to not make
the conversion and work directly with String (or CharSequence) if we're
willing to abandon JDK 1.2/1.3 compatibility.  But now that there's
a java.util.regex, the primary reason to use ORO appears to be if you're
still on 1.2/1.3...
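
As a rough illustration of matching without a conversion step on the
java.util.regex side: Pattern.matcher accepts any CharSequence, so a char[]
can be wrapped with CharBuffer.wrap instead of being copied into a String.
The class and method names below are made up for the example, and no
performance claim is implied:

```java
import java.nio.CharBuffer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class; only the java.util.regex and java.nio calls are real API.
class CharSequenceMatch {
    // Counts non-overlapping matches of p in any CharSequence
    // (String, StringBuffer, CharBuffer, ...).
    static int countMatches(Pattern p, CharSequence input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Wraps the array as a CharSequence view; no String copy is made.
    static int countMatches(Pattern p, char[] input) {
        return countMatches(p, CharBuffer.wrap(input));
    }
}
```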

In response to the email Subject, Perl5Util is a convenience class and
will always be slower than using Perl5Matcher directly because Perl5Util
has to parse the native Perl-style representation of expressions :(

daniel



</pre>
</div>
</content>
</entry>
</feed>
