jakarta-regexp-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 35834] - file size limitation when specify RE.MATCH_SINGLELINE | RE.MATCH_CASEINDEPENDENT on RE constructor.
Date Tue, 16 Aug 2005 16:16:06 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=35834>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=35834





------- Additional Comments From nancy.l.farnsworth@aphis.usda.gov  2005-08-16 18:15 -------
die:
The program simply stops executing.
I do not receive any errors.  I do not
see any exceptions in the log.

comments:
The code executes successfully when I do
not set the flag to RE.MATCH_SINGLELINE.
Unfortunately, but as would be expected,
I do not get a match when the
content is continued onto the next line.
However, when I set the flag to RE.MATCH_SINGLELINE,
the program simply stops executing partway through
the file.  If the file is short, it completes successfully.
However, if the file is longer, execution stops.  I receive
no errors or thrown exceptions.

notes:
I have since changed the pattern to read as follows:
(<A[^>]*>)|(<APPLET[^>]*>)|(<AREA[^>]*>)|       etc
It seems to work.  
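
A possible explanation for the difference, sketched with java.util.regex
as a stand-in (this is not Jakarta Regexp itself, and the class and method
names are hypothetical): when "." is allowed to match newlines, a greedy
(.)* runs to the end of the input and then backtracks to the last ">",
whereas [^>]* stops at the first ">" and so stays inside a single tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagPatternSketch {
    // Returns the first match of regex in input, or null if there is none.
    // DOTALL lets "." match line terminators; it stands in here for
    // RE.MATCH_SINGLELINE (an assumption about equivalent behaviour).
    static String firstMatch(String regex, String input) {
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // A tag continued onto the next line, followed by more markup.
        String html = "<a href=\"x.html\"\n target=\"_blank\">X</a>\n<p>more</p>";

        // Greedy "any character": the match runs to the last '>' in the input.
        System.out.println(firstMatch("<A(.)*>", html));

        // Negated class: the match ends at the first '>' after "<a".
        System.out.println(firstMatch("<A[^>]*>", html));
    }
}
```

With the first pattern, the amount of text each match attempt has to scan
and back out of grows with the file size, which would be consistent with
short files succeeding and longer ones failing.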

thoughts:
Even if the old pattern and code are stupid, it
seems I should still get some type of error.
I would think that there would be some type
of exception that I could trap or at least see
in the log.
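
One way a failure can slip past every catch block is when it is a
java.lang.Error rather than an Exception. The sketch below assumes,
without confirmation from this report, that the matcher recurses deeply
enough to raise StackOverflowError; recurse() is a hypothetical stand-in
for such a matcher, not Jakarta Regexp code.

```java
public class SilentFailureSketch {
    // Hypothetical stand-in for a matcher that recurses once per character.
    static int recurse(int depth) {
        return recurse(depth + 1) + 1;
    }

    static String run() {
        try {
            recurse(0);
            return "completed";
        } catch (Exception e) {
            // Never reached here: StackOverflowError is not an Exception.
            return "exception: " + e;
        } catch (StackOverflowError e) {
            // Errors must be caught explicitly (or via Throwable) to be logged.
            return "error: StackOverflowError";
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Adding a temporary catch (Throwable t) around the match loop and logging it
would show whether something of this kind is being thrown.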

problem pattern:
(<A(.)*>)|(<APPLET(.)*>)|(<AREA(.)*>)|       etc

code:
//Search html file for pattern.			
try
	{
	//Construct an RE object
	int flags = RE.MATCH_CASEINDEPENDENT|RE.MATCH_SINGLELINE; 		
	
	RE re = new RE(strPattern,flags);

	//Use the object to match to the input.
	Reader r = new FileReader(strFilePathAndName);
	CharacterIterator in = new ReaderCharacterIterator(r);
	int end=0;

	while(re.match(in,end))
		{
		//Reset starting point in input file
		end = re.getParenEnd(0);
				
		//Retrieve Tag
		String strFoundTag = re.getParen(0);
		logger.debug("Found Tag:"+strFoundTag);

		//Process tag appropriately.
		//Retrieve urls from tag and add to array of urls.
		Iterator iterator = alResourceElementsList.iterator();
		while (iterator.hasNext())
			{
			//Search for each possible element of the tag.
			String strElement = (String)iterator.next();
			int iBeginUrl = strFoundTag.trim().toUpperCase().indexOf(strElement);
			//If an element is found, retrieve the url from the element.
			char cEndChar = '"';
			if (iBeginUrl >= 0)  //Element found
				{
				int iEndUrl = strFoundTag.trim().toUpperCase().indexOf(
					cEndChar,iBeginUrl+strElement.length()+2);
				if( ! ((iBeginUrl+2) <= iEndUrl) )
					{
					logger.error("Cannot retrieve url from element in scanHtmlFileForResourceTags.");
					logger.error("FilePathAndName: "+strFilePathAndName);
					logger.error("Tag: "+ strFoundTag);
					logger.error("Element: "+ strElement);
					return 1;
					}
				String strTempUrl = strFoundTag.substring(
					iBeginUrl+strElement.length()+2,iEndUrl);

				//TO DO: Test for CODEBASE/CODE for APPLET!	
					
				String strUrl;
				if (strElement.trim().equalsIgnoreCase("CODEBASE"))
					{
					//Do not code until determined that this code is necessary.
					strUrl = strTempUrl;
					logger.error("APPLET tag contains element CODEBASE.");
					logger.error("Program does not contain code to process CODEBASE.");
					logger.error("Base url to resolve relative url is current directory of html file.");
					logger.error("The corresponding database entry is incorrect.");

					}//EndProcessCodeBase
				else
					{
					strUrl = strTempUrl;
					}//EndProcessAllOtherTags
							
				logger.debug("Url:"+strUrl);
						
				//Save each url that does not start with "#"
				//(Tags A and FRAME can start w/# - see doc for details)
				if (strUrl != null)
					{
					if (strUrl.startsWith("#"))	
						break;
					}//EndUrlNotNull

				//Save url.
				alUrl.add(strUrl);
				//Save tag name only, not entire tag.
				String strSpace = " ";
				String strTag = strFoundTag.substring(1,strFoundTag.indexOf(strSpace));
				alLinkTagType.add(strTag);

				}///EndElementFound
					
			}//EndIterateThroughElements

				}//EndWhileMatchesInHtmlFile
				
				
			}//EndTryFindMatchesInHtml
		catch(RESyntaxException e)
			{
			logger.error("Regular Expression syntax exception.");
			logger.error("File Path and Name:"+strFilePathAndName);
			return 1;
			}		 
		catch(FileNotFoundException e)
			{
			logger.error("FileNotFoundException on scanHtmlFileForResourceTags");
			logger.error("File Not Found:"+strFilePathAndName);
			return 1;
			}
		logger.info("End of Routine");
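
The url extraction in the loop above relies on the element name being
followed directly by =" so that the value starts name.length()+2 characters
after the name. A minimal standalone sketch of that arithmetic follows; the
class and method names are hypothetical, and it assumes double-quoted
attribute values as the code above does.

```java
public class AttributeValueSketch {
    // Case-insensitive indexOf, avoiding toUpperCase() on the whole tag.
    static int indexOfIgnoreCase(String s, String needle) {
        for (int i = 0; i + needle.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }

    // Returns the double-quoted value following NAME=" in tag, or null
    // if the attribute or its closing quote is missing.
    static String quotedValue(String tag, String name) {
        int begin = indexOfIgnoreCase(tag, name + "=\"");
        if (begin < 0) {
            return null;
        }
        int valueStart = begin + name.length() + 2; // skip NAME="
        int valueEnd = tag.indexOf('"', valueStart);
        if (valueEnd < 0) {
            return null;
        }
        return tag.substring(valueStart, valueEnd);
    }

    public static void main(String[] args) {
        System.out.println(quotedValue("<a href=\"/pages/faq.html\">", "HREF"));
        System.out.println(quotedValue("<a name=\"top\">", "HREF"));
    }
}
```

Returning null for a missing attribute or closing quote replaces the
index comparison used above to detect a malformed tag.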


input file:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
 
<html>
<head>
<title>Index of Pages in AWStats Web Site</title>
<!--- link rel="STYLESHEET" type="text/css" href="../styles/w3c_oldstyle.css"---
>
<link rel="STYLESHEET" type="text/css" href="../styles/default.css">
 
</head>
 
<body>
<div id="pageContent">
<h1>Documentation for AWStats Pilot Deployment in APHIS</h1>
<h2>Current Notices</h2>
<p>This weekend, 1-3 July, testing will be preformed to determine is AWStats 
can process logs
   for periods within a month which were missed in a previous run. The 
application
   will also be tested to determine if logs for a previous month can be run.  In
   other words, can logs for March be processed if the April logs have already 
be run.</p>
<h2>Web Server Reports</h2>
<ul class="menuLinks" title="Dynamic WWW Web Server Reports for Current Month">
  <ul class="menuLinkItem">
    <li class="source"><a href="/awstats/awstats.pl?config=www.aphis.usda.gov"
     target="_blank">AWStats Report for Web Server WWW for Current Month</a> 
(Opens new window)</li>
    <li class="desc">This link connects the viewer to a collection of AWStats 
reports
    for the APHIS Internet web server (www.aphis.usda.gov) for the current 
month.  As
    these report are created dynamically, be aware that the report can take up 
to 10
    seconds to appear at high-demand times.</li>
    <li class="format">(html)</li>
  </ul>
  <ul class="menuLinks" title="List of Static WWW Web Server Reports for 
Current Month">
    <ul class="menuLinkItem">
      <li class="source"><a href="./AWStats_report_index.html">List of AWStats

Reports for Web Server WWW for Current Month<
/a></li>
      <li class="desc">This link connects the viewer to a list of static 
AWStats reports
        for the APHIS Internet web server (www.aphis.usda.gov) for the current 
month.  These reports are
        regenerated every day at 3:00 AM.</li>
      <li class="format">(html)</li>
    </ul>
  </ul>
</ul>
 
<h2>AWStats Internal On-Line Resources</h2>
<ul class="menuLinks" title="AWStats Internal On-Line Resources">
  <ul class="menuLinkItem">
    <li class="source"><a 
href="/pages/how_to_run_web_analytic_reports_using_AWStats.html">
    How to Run Web Analytic Reports Using AWStats</a></li>
    <li class="desc">This document briefly describes how to run standard
          reports using the AWStats web server log analysis tool as a CGI 
application
          from a browser</li>
    <li class="format">(html)</li>
  </ul>
  <ul class="menuLinkItem">
    <li class="source"><a href="/pages/example_AWStats_reports.html">
    Examples of Creating On-Line reports with AWStats</a></li>
    <li class="desc">This document provides several examples of running AWstats 
reports
      as a CGI script using a browser.</li>
    <li class="format">(html)</li>
  </ul>
  <ul class="menuLinkItem">
    <li class="source"><a href="/docs/AWStats_pilot_options.doc">
    AWStats Options for Pilot Deployment</a></li>
    <li class="desc">This document provides a brief overview of the
    business case for deploying AWStats web analytics application in
    a pilot mode. Also details configuration for AWStats during
    pilot mode.</li>
    <li class="format">(MS-Word)</li>
  </ul>
  <ul class="menuLinkItem">
    <li class="source"><a href="/pages/faq.html">
    Frequently Asked Questions</a></li>
    <li class="desc">This document provides answers to questions
    frequently asked by AWStats users.</li>
    <li class="format">(html)</li>
 </ul>
  <ul class="menuLinkItem">
    <li class="source"><a href="/pages/todo.html">
    AWStats Web Site To-Do List</a></li>
    <li class="desc">This document is a list of items that need to be 
accomplished
    to support the AWStats pilot deployment.</li>
    <li class="format">(html)</li>
  </ul>
</ul>
<h2>AWStats External On-Line Resources</h2>
<ul class="menuLinks" title="AWStats External On-Line Resources">
  <ul class="menuLinkItem">
    <li class="source"><a href="http://awstats.sourceforge.net/index.html">
    AWStats Project Page</a></li>
    <li class="desc">Main Sourceforge project web site for AWStats, which bills
    itself as a free powerful and featureful tool that generates advanced web,
    streaming, ftp or mail server statistics, graphically.</li>
    <li class="format">(html)</li>
  </ul>
  <ul class="menuLinkItem">
    <li class="source"><a href="http://sourceforge.net/forum/forum.php?
forum_id=43428">
    AWStats Forum (General)</a></li>
    <li class="desc">AWStats forum for general users hosted by Sourceforge.</li>
    <li class="format">(php)</li>
  </ul>
  <ul class="menuLinkItem">
    <li class="source"><a 
href="http://awstats.sourceforge.net/docs/awstats.pdf">
    AWStats Documentation</a></li>
    <li class="desc">User documentation for AWStats.
    <li class="format">(pdf)</li>
  </ul>
</ul>
</div>
</body>
</html>






-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: regexp-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: regexp-dev-help@jakarta.apache.org

