httpd-cvs mailing list archives

From pgollu...@apache.org
Subject svn commit: r598339 [14/37] - in /httpd/httpd/vendor/pcre/current: ./ doc/ doc/html/ testdata/
Date Mon, 26 Nov 2007 16:50:09 GMT
Modified: httpd/httpd/vendor/pcre/current/doc/html/pcreposix.html
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/doc/html/pcreposix.html?rev=598339&r1=598338&r2=598339&view=diff
==============================================================================
--- httpd/httpd/vendor/pcre/current/doc/html/pcreposix.html (original)
+++ httpd/httpd/vendor/pcre/current/doc/html/pcreposix.html Mon Nov 26 08:49:53 2007
@@ -21,6 +21,7 @@
 <li><a name="TOC6" href="#SEC6">ERROR MESSAGES</a>
 <li><a name="TOC7" href="#SEC7">MEMORY USAGE</a>
 <li><a name="TOC8" href="#SEC8">AUTHOR</a>
+<li><a name="TOC9" href="#SEC9">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">SYNOPSIS OF POSIX API</a><br>
 <P>
@@ -46,8 +47,8 @@
 This set of functions provides a POSIX-style API to the PCRE regular expression
 package. See the
 <a href="pcreapi.html"><b>pcreapi</b></a>
-documentation for a description of PCRE's native API, which contains additional
-functionality.
+documentation for a description of PCRE's native API, which contains much
+additional functionality.
 </P>
 <P>
 The functions described here are just wrapper functions that ultimately call
@@ -59,10 +60,10 @@
 </P>
 <P>
 I have implemented only those option bits that can be reasonably mapped to PCRE
-native options. In addition, the options REG_EXTENDED and REG_NOSUB are defined
-with the value zero. They have no effect, but since programs that are written
-to the POSIX interface often use them, this makes it easier to slot in PCRE as
-a replacement library. Other POSIX options are not even defined.
+native options. In addition, the option REG_EXTENDED is defined with the value
+zero. This has no effect, but since programs that are written to the POSIX
+interface often use it, this makes it easier to slot in PCRE as a replacement
+library. Other POSIX options are not even defined.
 </P>
 <P>
 When PCRE is called via these functions, it is only the API that is POSIX-like
@@ -89,22 +90,43 @@
 internal form. The pattern is a C string terminated by a binary zero, and
 is passed in the argument <i>pattern</i>. The <i>preg</i> argument is a pointer
 to a <b>regex_t</b> structure that is used as a base for storing information
-about the compiled expression.
+about the compiled regular expression.
 </P>
 <P>
 The argument <i>cflags</i> is either zero, or contains one or more of the bits
 defined by the following macros:
 <pre>
+  REG_DOTALL
+</pre>
+The PCRE_DOTALL option is set when the regular expression is passed for
+compilation to the native function. Note that REG_DOTALL is not part of the
+POSIX standard.
+<pre>
   REG_ICASE
 </pre>
-The PCRE_CASELESS option is set when the expression is passed for compilation
-to the native function.
+The PCRE_CASELESS option is set when the regular expression is passed for
+compilation to the native function.
 <pre>
   REG_NEWLINE
 </pre>
-The PCRE_MULTILINE option is set when the expression is passed for compilation
-to the native function. Note that this does <i>not</i> mimic the defined POSIX
-behaviour for REG_NEWLINE (see the following section).
+The PCRE_MULTILINE option is set when the regular expression is passed for
+compilation to the native function. Note that this does <i>not</i> mimic the
+defined POSIX behaviour for REG_NEWLINE (see the following section).
+<pre>
+  REG_NOSUB
+</pre>
+The PCRE_NO_AUTO_CAPTURE option is set when the regular expression is passed
+for compilation to the native function. In addition, when a pattern that is
+compiled with this flag is passed to <b>regexec()</b> for matching, the
+<i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no captured strings
+are returned.
+<pre>
+  REG_UTF8
+</pre>
+The PCRE_UTF8 option is set when the regular expression is passed for
+compilation to the native function. This causes the pattern itself and all data
+strings used for matching it to be treated as UTF-8 strings. Note that REG_UTF8
+is not part of the POSIX standard.
 </P>
 <P>
 In the absence of these flags, no options are passed to the native function.
@@ -172,15 +194,20 @@
 function.
 </P>
 <P>
-The portion of the string that was matched, and also any captured substrings,
-are returned via the <i>pmatch</i> argument, which points to an array of
-<i>nmatch</i> structures of type <i>regmatch_t</i>, containing the members
-<i>rm_so</i> and <i>rm_eo</i>. These contain the offset to the first character of
-each substring and the offset to the first character after the end of each
-substring, respectively. The 0th element of the vector relates to the entire
-portion of <i>string</i> that was matched; subsequent elements relate to the
-capturing subpatterns of the regular expression. Unused entries in the array
-have both structure members set to -1.
+If the pattern was compiled with the REG_NOSUB flag, no data about any matched
+strings is returned. The <i>nmatch</i> and <i>pmatch</i> arguments of
+<b>regexec()</b> are ignored.
+</P>
+<P>
+Otherwise, the portion of the string that was matched, and also any captured
+substrings, are returned via the <i>pmatch</i> argument, which points to an
+array of <i>nmatch</i> structures of type <i>regmatch_t</i>, containing the
+members <i>rm_so</i> and <i>rm_eo</i>. These contain the offset to the first
+character of each substring and the offset to the first character after the end
+of each substring, respectively. The 0th element of the vector relates to the
+entire portion of <i>string</i> that was matched; subsequent elements relate to
+the capturing subpatterns of the regular expression. Unused entries in the
+array have both structure members set to -1.
 </P>
 <P>
 A successful match yields a zero return; various error codes are defined in the
@@ -203,16 +230,19 @@
 </P>
 <br><a name="SEC8" href="#TOC1">AUTHOR</a><br>
 <P>
-Philip Hazel &#60;ph10@cam.ac.uk&#62;
+Philip Hazel
+<br>
+University Computing Service
 <br>
-University Computing Service,
+Cambridge CB2 3QH, England.
 <br>
-Cambridge CB2 3QG, England.
 </P>
+<br><a name="SEC9" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 07 September 2004
+Last updated: 06 March 2007
+<br>
+Copyright &copy; 1997-2007 University of Cambridge.
 <br>
-Copyright &copy; 1997-2004 University of Cambridge.
 <p>
 Return to the <a href="index.html">PCRE index page</a>.
 </p>
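
The regcomp()/regexec() behaviour described in the pcreposix changes above (offsets returned in rm_so/rm_eo, unused entries set to -1, and nmatch/pmatch ignored under REG_NOSUB) can be illustrated with a short C sketch. It is written against the system <regex.h> so that it builds without PCRE, and the helper name match_offsets() is hypothetical. To use PCRE's wrapper instead, you would include "pcreposix.h" and link against the pcreposix and pcre libraries; the "(a)|(b)" pattern in the usage note is also valid PCRE syntax, since pcreposix defines REG_EXTENDED as zero.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: compile `pattern` with `cflags`, match it against
 * `subject`, and copy up to `n` regmatch_t entries into `out`.
 * Returns 0 on a match, REG_NOMATCH if there is none, or the nonzero
 * regcomp() error code if the pattern does not compile. */
int match_offsets(const char *pattern, const char *subject, int cflags,
                  regmatch_t *out, size_t n)
{
    regex_t re;
    int rc = regcomp(&re, pattern, cflags);
    if (rc != 0)
        return rc;

    if (cflags & REG_NOSUB) {
        /* No substring data is returned, so nmatch/pmatch are not needed. */
        rc = regexec(&re, subject, 0, NULL, 0);
    } else {
        /* out[0] covers the whole match; out[i] covers capturing group i.
         * rm_so is the offset of the first character, rm_eo the offset of
         * the first character after the end; unused entries are -1. */
        rc = regexec(&re, subject, n, out, 0);
    }

    regfree(&re);   /* release the compiled pattern in every case */
    return rc;
}
```

For example, matching "(a)|(b)" with REG_EXTENDED against "xb" should leave out[0] and out[2] both spanning offsets 1 to 2, while out[1] is -1/-1 because the first group did not take part in the match.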

Modified: httpd/httpd/vendor/pcre/current/doc/html/pcreprecompile.html
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/doc/html/pcreprecompile.html?rev=598339&r1=598338&r2=598339&view=diff
==============================================================================
--- httpd/httpd/vendor/pcre/current/doc/html/pcreprecompile.html (original)
+++ httpd/httpd/vendor/pcre/current/doc/html/pcreprecompile.html Mon Nov 26 08:49:53 2007
@@ -17,6 +17,8 @@
 <li><a name="TOC2" href="#SEC2">SAVING A COMPILED PATTERN</a>
 <li><a name="TOC3" href="#SEC3">RE-USING A PRECOMPILED PATTERN</a>
 <li><a name="TOC4" href="#SEC4">COMPATIBILITY WITH DIFFERENT PCRE RELEASES</a>
+<li><a name="TOC5" href="#SEC5">AUTHOR</a>
+<li><a name="TOC6" href="#SEC6">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">SAVING AND RE-USING PRECOMPILED PCRE PATTERNS</a><br>
 <P>
@@ -32,7 +34,9 @@
 If you save compiled patterns to a file, you can copy them to a different host
 and run them there. This works even if the new host has the opposite endianness
 to the one on which the patterns were compiled. There may be a small
-performance penalty, but it should be insignificant.
+performance penalty, but it should be insignificant. However, compiling regular
+expressions with one version of PCRE for use with a different version is not
+guaranteed to work and may cause crashes.
 </P>
 <br><a name="SEC2" href="#TOC1">SAVING A COMPILED PATTERN</a><br>
 <P>
@@ -88,16 +92,17 @@
 <br><a name="SEC3" href="#TOC1">RE-USING A PRECOMPILED PATTERN</a><br>
 <P>
 Re-using a precompiled pattern is straightforward. Having reloaded it into main
-memory, you pass its pointer to <b>pcre_exec()</b> in the usual way. This should
-work even on another host, and even if that host has the opposite endianness to
-the one where the pattern was compiled.
+memory, you pass its pointer to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> in
+the usual way. This should work even on another host, and even if that host has
+the opposite endianness to the one where the pattern was compiled.
 </P>
 <P>
 However, if you passed a pointer to custom character tables when the pattern
 was compiled (the <i>tableptr</i> argument of <b>pcre_compile()</b>), you must
-now pass a similar pointer to <b>pcre_exec()</b>, because the value saved with
-the compiled pattern will obviously be nonsense. A field in a
-<b>pcre_extra()</b> block is used to pass this data, as described in the
+now pass a similar pointer to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>,
+because the value saved with the compiled pattern will obviously be nonsense. A
+field in a <b>pcre_extra()</b> block is used to pass this data, as described in
+the
 <a href="pcreapi.html#extradata">section on matching a pattern</a>
 in the
 <a href="pcreapi.html"><b>pcreapi</b></a>
@@ -114,20 +119,30 @@
 <b>pcre_extra</b> data block and set the <i>study_data</i> field to point to the
 reloaded study data. You must also set the PCRE_EXTRA_STUDY_DATA bit in the
 <i>flags</i> field to indicate that study data is present. Then pass the
-<b>pcre_extra</b> block to <b>pcre_exec()</b> in the usual way.
+<b>pcre_extra</b> block to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> in the
+usual way.
 </P>
 <br><a name="SEC4" href="#TOC1">COMPATIBILITY WITH DIFFERENT PCRE RELEASES</a><br>
 <P>
-The layout of the control block that is at the start of the data that makes up
-a compiled pattern was changed for release 5.0. If you have any saved patterns
-that were compiled with previous releases (not a facility that was previously
-advertised), you will have to recompile them for release 5.0. However, from now
-on, it should be possible to make changes in a compabible manner.
+In general, it is safest to recompile all saved patterns when you update to a
+new PCRE release, though not all updates actually require this. Recompiling is
+definitely needed for release 7.2.
 </P>
+<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
 <P>
-Last updated: 10 September 2004
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><a name="SEC6" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 13 June 2007
+<br>
+Copyright &copy; 1997-2007 University of Cambridge.
 <br>
-Copyright &copy; 1997-2004 University of Cambridge.
 <p>
 Return to the <a href="index.html">PCRE index page</a>.
 </p>

Modified: httpd/httpd/vendor/pcre/current/doc/html/pcresample.html
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/doc/html/pcresample.html?rev=598339&r1=598338&r2=598339&view=diff
==============================================================================
--- httpd/httpd/vendor/pcre/current/doc/html/pcresample.html (original)
+++ httpd/httpd/vendor/pcre/current/doc/html/pcresample.html Mon Nov 26 08:49:53 2007
@@ -33,9 +33,10 @@
 an empty string. Comments in the code explain what is going on.
 </P>
 <P>
-If PCRE is installed in the standard include and library directories for your
-system, you should be able to compile the demonstration program using this
-command:
+The demonstration program is automatically built if you use "./configure;make"
+to build PCRE. Otherwise, if PCRE is installed in the standard include and
+library directories for your system, you should be able to compile the
+demonstration program using this command:
 <pre>
   gcc -o pcredemo pcredemo.c -lpcre
 </pre>
@@ -72,10 +73,25 @@
 </pre>
 (for example) to the compile command to get round this problem.
 </P>
+<br><b>
+AUTHOR
+</b><br>
 <P>
-Last updated: 09 September 2004
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><b>
+REVISION
+</b><br>
+<P>
+Last updated: 13 June 2007
+<br>
+Copyright &copy; 1997-2007 University of Cambridge.
 <br>
-Copyright &copy; 1997-2004 University of Cambridge.
 <p>
 Return to the <a href="index.html">PCRE index page</a>.
 </p>

Added: httpd/httpd/vendor/pcre/current/doc/html/pcrestack.html
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/doc/html/pcrestack.html?rev=598339&view=auto
==============================================================================
--- httpd/httpd/vendor/pcre/current/doc/html/pcrestack.html (added)
+++ httpd/httpd/vendor/pcre/current/doc/html/pcrestack.html Mon Nov 26 08:49:53 2007
@@ -0,0 +1,154 @@
+<html>
+<head>
+<title>pcrestack specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcrestack man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<br><b>
+PCRE DISCUSSION OF STACK USAGE
+</b><br>
+<P>
+When you call <b>pcre_exec()</b>, it makes use of an internal function called
+<b>match()</b>. This calls itself recursively at branch points in the pattern,
+in order to remember the state of the match so that it can back up and try a
+different alternative if the first one fails. As matching proceeds deeper and
+deeper into the tree of possibilities, the recursion depth increases.
+</P>
+<P>
+Not all calls of <b>match()</b> increase the recursion depth; for an item such
+as a* it may be called several times at the same level, after matching
+different numbers of a's. Furthermore, in a number of cases where the result of
+the recursive call would immediately be passed back as the result of the
+current call (a "tail recursion"), the function is just restarted instead.
+</P>
+<P>
+The <b>pcre_dfa_exec()</b> function operates in an entirely different way, and
+hardly uses recursion at all. The limit on its complexity is the amount of
+workspace it is given. The comments that follow do NOT apply to
+<b>pcre_dfa_exec()</b>; they are relevant only for <b>pcre_exec()</b>.
+</P>
+<P>
+You can set limits on the number of times that <b>match()</b> is called, both in
+total and recursively. If the limit is exceeded, an error occurs. For details,
+see the
+<a href="pcreapi.html#extradata">section on extra data for <b>pcre_exec()</b></a>
+in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+documentation.
+</P>
+<P>
+Each time that <b>match()</b> is actually called recursively, it uses memory
+from the process stack. For certain kinds of pattern and data, very large
+amounts of stack may be needed, despite the recognition of "tail recursion".
+You can often reduce the amount of recursion, and therefore the amount of stack
+used, by modifying the pattern that is being matched. Consider, for example,
+this pattern:
+<pre>
+  ([^&#60;]|&#60;(?!inet))+
+</pre>
+It matches from wherever it starts until it encounters "&#60;inet" or the end of
+the data, and is the kind of pattern that might be used when processing an XML
+file. Each iteration of the outer parentheses matches either one character that
+is not "&#60;" or a "&#60;" that is not followed by "inet". However, each time a
+parenthesis is processed, a recursion occurs, so this formulation uses a stack
+frame for each matched character. For a long string, a lot of stack is
+required. Consider now this rewritten pattern, which matches exactly the same
+strings:
+<pre>
+  ([^&#60;]++|&#60;(?!inet))+
+</pre>
+This uses very much less stack, because runs of characters that do not contain
+"&#60;" are "swallowed" in one item inside the parentheses. Recursion happens only
+when a "&#60;" character that is not followed by "inet" is encountered (and we
+assume this is relatively rare). A possessive quantifier is used to stop any
+backtracking into the runs of non-"&#60;" characters, but that is not related to
+stack usage.
+</P>
+<P>
+This example shows that one way of avoiding stack problems when matching long
+subject strings is to write repeated parenthesized subpatterns to match more
+than one character whenever possible.
+</P>
+<P>
+In environments where stack memory is constrained, you might want to compile
+PCRE to use heap memory instead of stack for remembering back-up points. This
+makes it run a lot more slowly, however. Details of how to do this are given in
+the
+<a href="pcrebuild.html"><b>pcrebuild</b></a>
+documentation. When built in this way, instead of using the stack, PCRE obtains
+and frees memory by calling the functions that are pointed to by the
+<b>pcre_stack_malloc</b> and <b>pcre_stack_free</b> variables. By default, these
+point to <b>malloc()</b> and <b>free()</b>, but you can replace the pointers to
+cause PCRE to use your own functions. Since the block sizes are always the
+same, and are always freed in reverse order, it may be possible to implement
+customized memory handlers that are more efficient than the standard functions.
+</P>
+<P>
+In Unix-like environments, there is not often a problem with the stack unless
+very long strings are involved, though the default limit on stack size varies
+from system to system. Values from 8Mb to 64Mb are common. You can find your
+default limit by running the command:
+<pre>
+  ulimit -s
+</pre>
+Unfortunately, the effect of running out of stack is often SIGSEGV, though
+sometimes a more explicit error message is given. You can normally increase the
+limit on stack size by code such as this:
+<pre>
+  struct rlimit rlim;
+  getrlimit(RLIMIT_STACK, &rlim);
+  rlim.rlim_cur = 100*1024*1024;
+  setrlimit(RLIMIT_STACK, &rlim);
+</pre>
+This reads the current limits (soft and hard) using <b>getrlimit()</b>, then
+attempts to increase the soft limit to 100Mb using <b>setrlimit()</b>. You must
+do this before calling <b>pcre_exec()</b>.
+</P>
+<P>
+PCRE has an internal counter that can be used to limit the depth of recursion,
+and thus cause <b>pcre_exec()</b> to give an error code before it runs out of
+stack. By default, the limit is very large, and unlikely ever to operate. It
+can be changed when PCRE is built, and it can also be set when
+<b>pcre_exec()</b> is called. For details of these interfaces, see the
+<a href="pcrebuild.html"><b>pcrebuild</b></a>
+and
+<a href="pcreapi.html"><b>pcreapi</b></a>
+documentation.
+</P>
+<P>
+As a very rough rule of thumb, you should reckon on about 500 bytes per
+recursion. Thus, if you want to limit your stack usage to 8Mb, you
+should set the limit at 16000 recursions. A 64Mb stack, on the other hand, can
+support around 128000 recursions. The <b>pcretest</b> test program has a command
+line option (<b>-S</b>) that can be used to increase the size of its stack.
+</P>
+<br><b>
+AUTHOR
+</b><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><b>
+REVISION
+</b><br>
+<P>
+Last updated: 05 June 2007
+<br>
+Copyright &copy; 1997-2007 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
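
The getrlimit()/setrlimit() fragment in pcrestack and its rule of thumb of roughly 500 bytes per recursion can be combined into a slightly more defensive C sketch. It assumes a POSIX system providing <sys/resource.h>; raise_stack_limit() and recursion_budget() are hypothetical helper names, and 500 bytes is only the rough figure quoted in the text, not a measured frame size. Unlike the original fragment, it checks return values, never lowers an existing limit, and never asks for more than the hard limit, which an unprivileged process cannot exceed.

```c
#include <sys/resource.h>

/* Hypothetical helper: raise the soft RLIMIT_STACK to at least `wanted`
 * bytes, capped at the hard limit. Never lowers an existing limit.
 * Call it before pcre_exec(). Returns 0 on success, -1 on failure. */
int raise_stack_limit(rlim_t wanted)
{
    struct rlimit rlim;
    if (getrlimit(RLIMIT_STACK, &rlim) != 0)
        return -1;

    /* An unprivileged process may not raise the soft limit above the hard one. */
    if (rlim.rlim_max != RLIM_INFINITY && wanted > rlim.rlim_max)
        wanted = rlim.rlim_max;

    /* Already unlimited or large enough: nothing to do. */
    if (rlim.rlim_cur == RLIM_INFINITY || rlim.rlim_cur >= wanted)
        return 0;

    rlim.rlim_cur = wanted;
    return setrlimit(RLIMIT_STACK, &rlim) == 0 ? 0 : -1;
}

/* Rough number of match() recursions that fit in `stack_bytes`, assuming
 * about `bytes_per_frame` bytes each (the text suggests ~500). */
unsigned long recursion_budget(unsigned long stack_bytes,
                               unsigned long bytes_per_frame)
{
    return bytes_per_frame ? stack_bytes / bytes_per_frame : 0;
}
```

With decimal megabytes, recursion_budget(8000000, 500) gives the 16000 quoted above. The resulting figure could be passed to pcre_exec() through the match_limit_recursion field of a pcre_extra block, with PCRE_EXTRA_MATCH_LIMIT_RECURSION set in its flags. On Linux the soft limit typically bounds growth of the main thread's stack; threads created with pthread_create() have their own fixed-size stacks, so raising it may not help code that calls pcre_exec() from such threads.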

Added: httpd/httpd/vendor/pcre/current/doc/html/pcresyntax.html
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/doc/html/pcresyntax.html?rev=598339&view=auto
==============================================================================
--- httpd/httpd/vendor/pcre/current/doc/html/pcresyntax.html (added)
+++ httpd/httpd/vendor/pcre/current/doc/html/pcresyntax.html Mon Nov 26 08:49:53 2007
@@ -0,0 +1,449 @@
+<html>
+<head>
+<title>pcresyntax specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcresyntax man page</h1>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
+<p>
+This page is part of the PCRE HTML documentation. It was generated automatically
+from the original man page. If there is any nonsense in it, please consult the
+man page, in case the conversion went wrong.
+<br>
+<ul>
+<li><a name="TOC1" href="#SEC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a>
+<li><a name="TOC2" href="#SEC2">QUOTING</a>
+<li><a name="TOC3" href="#SEC3">CHARACTERS</a>
+<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
+<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a>
+<li><a name="TOC6" href="#SEC6">SCRIPT NAMES FOR \p AND \P</a>
+<li><a name="TOC7" href="#SEC7">CHARACTER CLASSES</a>
+<li><a name="TOC8" href="#SEC8">QUANTIFIERS</a>
+<li><a name="TOC9" href="#SEC9">ANCHORS AND SIMPLE ASSERTIONS</a>
+<li><a name="TOC10" href="#SEC10">MATCH POINT RESET</a>
+<li><a name="TOC11" href="#SEC11">ALTERNATION</a>
+<li><a name="TOC12" href="#SEC12">CAPTURING</a>
+<li><a name="TOC13" href="#SEC13">ATOMIC GROUPS</a>
+<li><a name="TOC14" href="#SEC14">COMMENT</a>
+<li><a name="TOC15" href="#SEC15">OPTION SETTING</a>
+<li><a name="TOC16" href="#SEC16">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
+<li><a name="TOC17" href="#SEC17">BACKREFERENCES</a>
+<li><a name="TOC18" href="#SEC18">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
+<li><a name="TOC19" href="#SEC19">CONDITIONAL PATTERNS</a>
+<li><a name="TOC20" href="#SEC20">BACKTRACKING CONTROL</a>
+<li><a name="TOC21" href="#SEC21">NEWLINE CONVENTIONS</a>
+<li><a name="TOC22" href="#SEC22">WHAT \R MATCHES</a>
+<li><a name="TOC23" href="#SEC23">CALLOUTS</a>
+<li><a name="TOC24" href="#SEC24">SEE ALSO</a>
+<li><a name="TOC25" href="#SEC25">AUTHOR</a>
+<li><a name="TOC26" href="#SEC26">REVISION</a>
+</ul>
+<br><a name="SEC1" href="#TOC1">PCRE REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
+<P>
+The full syntax and semantics of the regular expressions that are supported by
+PCRE are described in the
+<a href="pcrepattern.html"><b>pcrepattern</b></a>
+documentation. This document contains just a quick-reference summary of the
+syntax.
+</P>
+<br><a name="SEC2" href="#TOC1">QUOTING</a><br>
+<P>
+<pre>
+  \x         where x is non-alphanumeric is a literal x
+  \Q...\E    treat enclosed characters as literal
+</PRE>
+</P>
+<br><a name="SEC3" href="#TOC1">CHARACTERS</a><br>
+<P>
+<pre>
+  \a         alarm, that is, the BEL character (hex 07)
+  \cx        "control-x", where x is any character
+  \e         escape (hex 1B)
+  \f         formfeed (hex 0C)
+  \n         newline (hex 0A)
+  \r         carriage return (hex 0D)
+  \t         tab (hex 09)
+  \ddd       character with octal code ddd, or backreference
+  \xhh       character with hex code hh
+  \x{hhh..}  character with hex code hhh..
+</PRE>
+</P>
+<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
+<P>
+<pre>
+  .          any character except newline;
+               in dotall mode, any character whatsoever
+  \C         one byte, even in UTF-8 mode (best avoided)
+  \d         a decimal digit
+  \D         a character that is not a decimal digit
+  \h         a horizontal whitespace character
+  \H         a character that is not a horizontal whitespace character
+  \p{<i>xx</i>}     a character with the <i>xx</i> property
+  \P{<i>xx</i>}     a character without the <i>xx</i> property
+  \R         a newline sequence
+  \s         a whitespace character
+  \S         a character that is not a whitespace character
+  \v         a vertical whitespace character
+  \V         a character that is not a vertical whitespace character
+  \w         a "word" character
+  \W         a "non-word" character
+  \X         an extended Unicode sequence
+</pre>
+In PCRE, \d, \D, \s, \S, \w, and \W recognize only ASCII characters.
+</P>
+<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTY CODES FOR \p and \P</a><br>
+<P>
+<pre>
+  C          Other
+  Cc         Control
+  Cf         Format
+  Cn         Unassigned
+  Co         Private use
+  Cs         Surrogate
+
+  L          Letter
+  Ll         Lower case letter
+  Lm         Modifier letter
+  Lo         Other letter
+  Lt         Title case letter
+  Lu         Upper case letter
+  L&         Ll, Lu, or Lt
+
+  M          Mark
+  Mc         Spacing mark
+  Me         Enclosing mark
+  Mn         Non-spacing mark
+
+  N          Number
+  Nd         Decimal number
+  Nl         Letter number
+  No         Other number
+
+  P          Punctuation
+  Pc         Connector punctuation
+  Pd         Dash punctuation
+  Pe         Close punctuation
+  Pf         Final punctuation
+  Pi         Initial punctuation
+  Po         Other punctuation
+  Ps         Open punctuation
+
+  S          Symbol
+  Sc         Currency symbol
+  Sk         Modifier symbol
+  Sm         Mathematical symbol
+  So         Other symbol
+
+  Z          Separator
+  Zl         Line separator
+  Zp         Paragraph separator
+  Zs         Space separator
+</PRE>
+</P>
+<br><a name="SEC6" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
+<P>
+Arabic,
+Armenian,
+Balinese,
+Bengali,
+Bopomofo,
+Braille,
+Buginese,
+Buhid,
+Canadian_Aboriginal,
+Cherokee,
+Common,
+Coptic,
+Cuneiform,
+Cypriot,
+Cyrillic,
+Deseret,
+Devanagari,
+Ethiopic,
+Georgian,
+Glagolitic,
+Gothic,
+Greek,
+Gujarati,
+Gurmukhi,
+Han,
+Hangul,
+Hanunoo,
+Hebrew,
+Hiragana,
+Inherited,
+Kannada,
+Katakana,
+Kharoshthi,
+Khmer,
+Lao,
+Latin,
+Limbu,
+Linear_B,
+Malayalam,
+Mongolian,
+Myanmar,
+New_Tai_Lue,
+Nko,
+Ogham,
+Old_Italic,
+Old_Persian,
+Oriya,
+Osmanya,
+Phags_Pa,
+Phoenician,
+Runic,
+Shavian,
+Sinhala,
+Syloti_Nagri,
+Syriac,
+Tagalog,
+Tagbanwa,
+Tai_Le,
+Tamil,
+Telugu,
+Thaana,
+Thai,
+Tibetan,
+Tifinagh,
+Ugaritic,
+Yi.
+</P>
+<br><a name="SEC7" href="#TOC1">CHARACTER CLASSES</a><br>
+<P>
+<pre>
+  [...]       positive character class
+  [^...]      negative character class
+  [x-y]       range (can be used for hex characters)
+  [[:xxx:]]   positive POSIX named set
+  [[:^xxx:]]  negative POSIX named set
+
+  alnum       alphanumeric
+  alpha       alphabetic
+  ascii       0-127
+  blank       space or tab
+  cntrl       control character
+  digit       decimal digit
+  graph       printing, excluding space
+  lower       lower case letter
+  print       printing, including space
+  punct       printing, excluding alphanumeric
+  space       whitespace
+  upper       upper case letter
+  word        same as \w
+  xdigit      hexadecimal digit
+</pre>
+In PCRE, POSIX character set names recognize only ASCII characters. You can use
+\Q...\E inside a character class.
+</P>
+<br><a name="SEC8" href="#TOC1">QUANTIFIERS</a><br>
+<P>
+<pre>
+  ?           0 or 1, greedy
+  ?+          0 or 1, possessive
+  ??          0 or 1, lazy
+  *           0 or more, greedy
+  *+          0 or more, possessive
+  *?          0 or more, lazy
+  +           1 or more, greedy
+  ++          1 or more, possessive
+  +?          1 or more, lazy
+  {n}         exactly n
+  {n,m}       at least n, no more than m, greedy
+  {n,m}+      at least n, no more than m, possessive
+  {n,m}?      at least n, no more than m, lazy
+  {n,}        n or more, greedy
+  {n,}+       n or more, possessive
+  {n,}?       n or more, lazy
+</PRE>
+</P>
+<br><a name="SEC9" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
+<P>
+<pre>
+  \b          word boundary
+  \B          not a word boundary
+  ^           start of subject
+               also after internal newline in multiline mode
+  \A          start of subject
+  $           end of subject
+               also before newline at end of subject
+               also before internal newline in multiline mode
+  \Z          end of subject
+               also before newline at end of subject
+  \z          end of subject
+  \G          first matching position in subject
+</PRE>
+</P>
+<br><a name="SEC10" href="#TOC1">MATCH POINT RESET</a><br>
+<P>
+<pre>
+  \K          reset start of match
+</PRE>
+</P>
+<br><a name="SEC11" href="#TOC1">ALTERNATION</a><br>
+<P>
+<pre>
+  expr|expr|expr...
+</PRE>
+</P>
+<br><a name="SEC12" href="#TOC1">CAPTURING</a><br>
+<P>
+<pre>
+  (...)          capturing group
+  (?&#60;name&#62;...)   named capturing group (Perl)
+  (?'name'...)   named capturing group (Perl)
+  (?P&#60;name&#62;...)  named capturing group (Python)
+  (?:...)        non-capturing group
+  (?|...)        non-capturing group; reset group numbers for
+                  capturing groups in each alternative
+</PRE>
+</P>
+<br><a name="SEC13" href="#TOC1">ATOMIC GROUPS</a><br>
+<P>
+<pre>
+  (?&#62;...)        atomic, non-capturing group
+</PRE>
+</P>
+<br><a name="SEC14" href="#TOC1">COMMENT</a><br>
+<P>
+<pre>
+  (?#....)       comment (not nestable)
+</PRE>
+</P>
+<br><a name="SEC15" href="#TOC1">OPTION SETTING</a><br>
+<P>
+<pre>
+  (?i)           caseless
+  (?J)           allow duplicate names
+  (?m)           multiline
+  (?s)           single line (dotall)
+  (?U)           default ungreedy (lazy)
+  (?x)           extended (ignore white space)
+  (?-...)        unset option(s)
+</PRE>
+</P>
+<br><a name="SEC16" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
+<P>
+<pre>
+  (?=...)        positive look ahead
+  (?!...)        negative look ahead
+  (?&#60;=...)       positive look behind
+  (?&#60;!...)       negative look behind
+</pre>
+Each top-level branch of a look behind must be of a fixed length.
+</P>
+<br><a name="SEC17" href="#TOC1">BACKREFERENCES</a><br>
+<P>
+<pre>
+  \n             reference by number (can be ambiguous)
+  \gn            reference by number
+  \g{n}          reference by number
+  \g{-n}         relative reference by number
+  \k&#60;name&#62;       reference by name (Perl)
+  \k'name'       reference by name (Perl)
+  \g{name}       reference by name (Perl)
+  \k{name}       reference by name (.NET)
+  (?P=name)      reference by name (Python)
+</PRE>
+</P>
+<br><a name="SEC18" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
+<P>
+<pre>
+  (?R)           recurse whole pattern
+  (?n)           call subpattern by absolute number
+  (?+n)          call subpattern by relative number
+  (?-n)          call subpattern by relative number
+  (?&name)       call subpattern by name (Perl)
+  (?P&#62;name)      call subpattern by name (Python)
+</PRE>
+</P>
+<br><a name="SEC19" href="#TOC1">CONDITIONAL PATTERNS</a><br>
+<P>
+<pre>
+  (?(condition)yes-pattern)
+  (?(condition)yes-pattern|no-pattern)
+
+  (?(n)...       absolute reference condition
+  (?(+n)...      relative reference condition
+  (?(-n)...      relative reference condition
+  (?(&#60;name&#62;)...  named reference condition (Perl)
+  (?('name')...  named reference condition (Perl)
+  (?(name)...    named reference condition (PCRE)
+  (?(R)...       overall recursion condition
+  (?(Rn)...      specific group recursion condition
+  (?(R&name)...  specific recursion condition
+  (?(DEFINE)...  define subpattern for reference
+  (?(assert)...  assertion condition
+</PRE>
+</P>
+<br><a name="SEC20" href="#TOC1">BACKTRACKING CONTROL</a><br>
+<P>
+The following act as soon as they are reached:
+<pre>
+  (*ACCEPT)      force successful match
+  (*FAIL)        force backtrack; synonym (*F)
+</pre>
+The following act only when a subsequent match failure causes a backtrack to
+reach them. They all force a match failure, but they differ in what happens
+afterwards. Those that advance the start-of-match point do so only if the
+pattern is not anchored.
+<pre>
+  (*COMMIT)      overall failure, no advance of starting point
+  (*PRUNE)       advance to next starting character
+  (*SKIP)        advance start to current matching position
+  (*THEN)        local failure, backtrack to next alternation
+</PRE>
+</P>
+<br><a name="SEC21" href="#TOC1">NEWLINE CONVENTIONS</a><br>
+<P>
+These are recognized only at the very start of the pattern or after a
+(*BSR_...) option.
+<pre>
+  (*CR)
+  (*LF)
+  (*CRLF)
+  (*ANYCRLF)
+  (*ANY)
+</PRE>
+</P>
+<br><a name="SEC22" href="#TOC1">WHAT \R MATCHES</a><br>
+<P>
+These are recognized only at the very start of the pattern or after a
+(*...) option that sets the newline convention.
+<pre>
+  (*BSR_ANYCRLF)
+  (*BSR_UNICODE)
+</PRE>
+</P>
+<br><a name="SEC23" href="#TOC1">CALLOUTS</a><br>
+<P>
+<pre>
+  (?C)      callout
+  (?Cn)     callout with data n
+</PRE>
+</P>
+<br><a name="SEC24" href="#TOC1">SEE ALSO</a><br>
+<P>
+<b>pcrepattern</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3),
+<b>pcrematching</b>(3), <b>pcre</b>(3).
+</P>
+<br><a name="SEC25" href="#TOC1">AUTHOR</a><br>
+<P>
+Philip Hazel
+<br>
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
+<br>
+</P>
+<br><a name="SEC26" href="#TOC1">REVISION</a><br>
+<P>
+Last updated: 21 September 2007
+<br>
+Copyright &copy; 1997-2007 University of Cambridge.
+<br>
+<p>
+Return to the <a href="index.html">PCRE index page</a>.
+</p>
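For readers comparing the (*CR), (*LF), (*CRLF), (*ANYCRLF) and (*ANY) settings listed under NEWLINE CONVENTIONS above, here is a minimal C sketch of which byte sequences each one treats as a newline. It is not PCRE code. The enum and the function name newline_length are hypothetical, and the sketch assumes byte (non-UTF-8) mode, where (*ANY) accepts CR, LF, CRLF, VT, FF and NEL (0x85). In UTF-8 mode LS and PS (U+2028, U+2029) would also count, and they are not handled here.

```c
#include <stddef.h>

/* Hypothetical names mirroring the (*CR), (*LF), (*CRLF), (*ANYCRLF) and
   (*ANY) pattern-start options; these are not identifiers from pcre.h. */
enum newline_convention { NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY };

/* Return the length in bytes (0, 1 or 2) of the newline sequence starting at
   s[i] under convention nl, or 0 if no newline starts there.
   Byte mode only: LS/PS (U+2028/U+2029) for UTF-8 mode are not recognized. */
size_t newline_length(const unsigned char *s, size_t len, size_t i,
                      enum newline_convention nl)
{
    if (i >= len)
        return 0;

    unsigned char c = s[i];
    /* A CR immediately followed by LF forms the two-byte CRLF sequence. */
    int crlf = (c == '\r' && i + 1 < len && s[i + 1] == '\n');

    switch (nl) {
    case NL_CR:
        return c == '\r';
    case NL_LF:
        return c == '\n';
    case NL_CRLF:
        return crlf ? 2 : 0;
    case NL_ANYCRLF:
        return crlf ? 2 : (size_t)(c == '\r' || c == '\n');
    case NL_ANY:
        if (crlf)
            return 2;
        /* LF, VT, FF, CR and NEL (0x85) are single-byte newlines. */
        return c == '\n' || c == 0x0b || c == 0x0c || c == '\r' || c == 0x85;
    }
    return 0;
}
```

Under the same byte-mode assumption, the NL_ANYCRLF and NL_ANY sets are also what \R matches with (*BSR_ANYCRLF) and (*BSR_UNICODE) respectively.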

Modified: httpd/httpd/vendor/pcre/current/doc/html/pcretest.html
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/doc/html/pcretest.html?rev=598339&r1=598338&r2=598339&view=diff
==============================================================================
--- httpd/httpd/vendor/pcre/current/doc/html/pcretest.html (original)
+++ httpd/httpd/vendor/pcre/current/doc/html/pcretest.html Mon Nov 26 08:49:53 2007
@@ -18,17 +18,22 @@
 <li><a name="TOC3" href="#SEC3">DESCRIPTION</a>
 <li><a name="TOC4" href="#SEC4">PATTERN MODIFIERS</a>
 <li><a name="TOC5" href="#SEC5">DATA LINES</a>
-<li><a name="TOC6" href="#SEC6">OUTPUT FROM PCRETEST</a>
-<li><a name="TOC7" href="#SEC7">CALLOUTS</a>
-<li><a name="TOC8" href="#SEC8">SAVING AND RELOADING COMPILED PATTERNS</a>
-<li><a name="TOC9" href="#SEC9">AUTHOR</a>
+<li><a name="TOC6" href="#SEC6">THE ALTERNATIVE MATCHING FUNCTION</a>
+<li><a name="TOC7" href="#SEC7">DEFAULT OUTPUT FROM PCRETEST</a>
+<li><a name="TOC8" href="#SEC8">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a>
+<li><a name="TOC9" href="#SEC9">RESTARTING AFTER A PARTIAL MATCH</a>
+<li><a name="TOC10" href="#SEC10">CALLOUTS</a>
+<li><a name="TOC11" href="#SEC11">NON-PRINTING CHARACTERS</a>
+<li><a name="TOC12" href="#SEC12">SAVING AND RELOADING COMPILED PATTERNS</a>
+<li><a name="TOC13" href="#SEC13">SEE ALSO</a>
+<li><a name="TOC14" href="#SEC14">AUTHOR</a>
+<li><a name="TOC15" href="#SEC15">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
 <P>
-<b>pcretest [-C] [-d] [-i] [-m] [-o osize] [-p] [-t] [source]</b>
-<b>[destination]</b>
-</P>
-<P>
+<b>pcretest [options] [source] [destination]</b>
+<br>
+<br>
 <b>pcretest</b> was written as a test program for the PCRE regular expression
 library itself, but it can also be used for experimenting with regular
 expressions. This document describes the features of the test program; for
@@ -41,18 +46,34 @@
 </P>
 <br><a name="SEC2" href="#TOC1">OPTIONS</a><br>
 <P>
+<b>-b</b>
+Behave as if each regex has the <b>/B</b> (show bytecode) modifier; the internal
+form is output after compilation.
+</P>
+<P>
 <b>-C</b>
 Output the version number of the PCRE library, and all available information
 about the optional features that are included, and then exit.
 </P>
 <P>
 <b>-d</b>
-Behave as if each regex had the <b>/D</b> (debug) modifier; the internal
-form is output after compilation.
+Behave as if each regex has the <b>/D</b> (debug) modifier; the internal
+form and information about the compiled pattern are output after compilation;
+<b>-d</b> is equivalent to <b>-b -i</b>.
+</P>
+<P>
+<b>-dfa</b>
+Behave as if each data line contains the \D escape sequence; this causes the
+alternative matching function, <b>pcre_dfa_exec()</b>, to be used instead of the
+standard <b>pcre_exec()</b> function (more detail is given below).
+</P>
+<P>
+<b>-help</b>
+Output a brief summary of these options and then exit.
 </P>
 <P>
 <b>-i</b>
-Behave as if each regex had the <b>/I</b> modifier; information about the
+Behave as if each regex has the <b>/I</b> modifier; information about the
 compiled pattern is given after compilation.
 </P>
 <P>
@@ -64,21 +85,41 @@
 <P>
 <b>-o</b> <i>osize</i>
 Set the number of elements in the output vector that is used when calling
-<b>pcre_exec()</b> to be <i>osize</i>. The default value is 45, which is enough
-for 14 capturing subexpressions. The vector size can be changed for individual
-matching calls by including \O in the data line (see below).
+<b>pcre_exec()</b> or <b>pcre_dfa_exec()</b> to be <i>osize</i>. The default value
+is 45, which is enough for 14 capturing subexpressions for <b>pcre_exec()</b> or
+22 different matches for <b>pcre_dfa_exec()</b>. The vector size can be
+changed for individual matching calls by including \O in the data line (see
+below).
 </P>
 <P>
 <b>-p</b>
-Behave as if each regex has <b>/P</b> modifier; the POSIX wrapper API is used
-to call PCRE. None of the other options has any effect when <b>-p</b> is set.
+Behave as if each regex has the <b>/P</b> modifier; the POSIX wrapper API is
+used to call PCRE. None of the other options has any effect when <b>-p</b> is
+set.
+</P>
+<P>
+<b>-q</b>
+Do not output the version number of <b>pcretest</b> at the start of execution.
+</P>
+<P>
+<b>-S</b> <i>size</i>
+On Unix-like systems, set the size of the runtime stack to <i>size</i>
+megabytes.
 </P>
 <P>
 <b>-t</b>
 Run each compile, study, and match many times with a timer, and output
 resulting time per compile or match (in milliseconds). Do not set <b>-m</b> with
 <b>-t</b>, because you will then get the size output a zillion times, and the
-timing will be distorted.
+timing will be distorted. You can control the number of iterations that are
+used for timing by following <b>-t</b> with a number (as a separate item on the
+command line). For example, "-t 1000" would iterate 1000 times. The default is
+to iterate 500000 times.
+</P>
+<P>
+<b>-tm</b>
+This is like <b>-t</b> except that it times only the matching phase, not the
+compile or study phases.
 </P>
 <br><a name="SEC3" href="#TOC1">DESCRIPTION</a><br>
 <P>
@@ -95,14 +136,15 @@
 </P>
 <P>
 Each data line is matched separately and independently. If you want to do
-multiple-line matches, you have to use the \n escape sequence in a single line
-of input to encode the newline characters. The maximum length of data line is
-30,000 characters.
+multi-line matches, you have to use the \n escape sequence (or \r or \r\n,
+etc., depending on the newline setting) in a single line of input to encode the
+newline sequences. There is no limit on the length of data lines; the input
+buffer is automatically extended if it is too small.
 </P>
 <P>
 An empty line signals the end of the data lines, at which point a new regular
 expression is read. The regular expressions are given enclosed in any
-non-alphanumeric delimiters other than backslash, for example
+non-alphanumeric delimiters other than backslash, for example:
 <pre>
   /(a|bc)x+yz/
 </pre>
@@ -149,13 +191,36 @@
 The following table shows additional modifiers for setting PCRE options that do
 not correspond to anything in Perl:
 <pre>
-  <b>/A</b>    PCRE_ANCHORED
-  <b>/C</b>    PCRE_AUTO_CALLOUT
-  <b>/E</b>    PCRE_DOLLAR_ENDONLY
-  <b>/N</b>    PCRE_NO_AUTO_CAPTURE
-  <b>/U</b>    PCRE_UNGREEDY
-  <b>/X</b>    PCRE_EXTRA
+  <b>/A</b>              PCRE_ANCHORED
+  <b>/C</b>              PCRE_AUTO_CALLOUT
+  <b>/E</b>              PCRE_DOLLAR_ENDONLY
+  <b>/f</b>              PCRE_FIRSTLINE
+  <b>/J</b>              PCRE_DUPNAMES
+  <b>/N</b>              PCRE_NO_AUTO_CAPTURE
+  <b>/U</b>              PCRE_UNGREEDY
+  <b>/X</b>              PCRE_EXTRA
+  <b>/&#60;cr&#62;</b>           PCRE_NEWLINE_CR
+  <b>/&#60;lf&#62;</b>           PCRE_NEWLINE_LF
+  <b>/&#60;crlf&#62;</b>         PCRE_NEWLINE_CRLF
+  <b>/&#60;anycrlf&#62;</b>      PCRE_NEWLINE_ANYCRLF
+  <b>/&#60;any&#62;</b>          PCRE_NEWLINE_ANY
+  <b>/&#60;bsr_anycrlf&#62;</b>  PCRE_BSR_ANYCRLF
+  <b>/&#60;bsr_unicode&#62;</b>  PCRE_BSR_UNICODE
+</pre>
+Those specifying line ending sequences are literal strings as shown, but the
+letters can be in either case. This example sets multiline matching with CRLF
+as the line ending sequence:
+<pre>
+  /^abc/m&#60;crlf&#62;
 </pre>
+Details of the meanings of these PCRE options are given in the
+<a href="pcreapi.html"><b>pcreapi</b></a>
+documentation.
+</P>
+<br><b>
+Finding all matches in a string
+</b><br>
+<P>
 Searching for all possible matches within each subject string can be requested
 by the <b>/g</b> or <b>/G</b> modifier. After finding a match, PCRE is called
 again to search the remainder of the subject string. The difference between
@@ -173,6 +238,9 @@
 match is retried. This imitates the way Perl handles such cases when using the
 <b>/g</b> modifier or the <b>split()</b> function.
 </P>
+<br><b>
+Other modifiers
+</b><br>
 <P>
 There are yet more modifiers for controlling the way <b>pcretest</b>
 operates.
@@ -184,6 +252,14 @@
 multiple copies of the same substring.
 </P>
 <P>
+The <b>/B</b> modifier is a debugging feature. It requests that <b>pcretest</b>
+output a representation of the compiled byte code after compilation. Normally
+this information contains length and offset values; however, if <b>/Z</b> is
+also present, this data is replaced by spaces. This is a special feature for
+use in the automatic test scripts; it ensures that the same output is generated
+for different internal link sizes.
+</P>
+<P>
 The <b>/L</b> modifier must be followed directly by the name of a locale, for
 example,
 <pre>
@@ -202,10 +278,8 @@
 pattern. If the pattern is studied, the results of that are also output.
 </P>
 <P>
-The <b>/D</b> modifier is a PCRE debugging feature, which also assumes <b>/I</b>.
-It causes the internal form of compiled regular expressions to be output after
-compilation. If the pattern was studied, the information returned is also
-output.
+The <b>/D</b> modifier is a PCRE debugging feature, and is equivalent to
+<b>/BI</b>, that is, both the <b>/B</b> and the <b>/I</b> modifiers.
 </P>
 <P>
 The <b>/F</b> modifier causes <b>pcretest</b> to flip the byte order of the
@@ -253,19 +327,20 @@
 expressions, you probably don't need any of these. The following escapes are
 recognized:
 <pre>
-  \a         alarm (= BEL)
-  \b         backspace
-  \e         escape
-  \f         formfeed
-  \n         newline
-  \r         carriage return
-  \t         tab
-  \v         vertical tab
+  \a         alarm (BEL, \x07)
+  \b         backspace (\x08)
+  \e         escape (\x1b)
+  \f         formfeed (\x0c)
+  \n         newline (\x0a)
+  \qdd       set the PCRE_MATCH_LIMIT limit to dd (any number of digits)
+  \r         carriage return (\x0d)
+  \t         tab (\x09)
+  \v         vertical tab (\x0b)
   \nnn       octal character (up to 3 octal digits)
   \xhh       hexadecimal character (up to 2 hex digits)
   \x{hh...}  hexadecimal character, any number of digits in UTF-8 mode
-  \A         pass the PCRE_ANCHORED option to <b>pcre_exec()</b>
-  \B         pass the PCRE_NOTBOL option to <b>pcre_exec()</b>
+  \A         pass the PCRE_ANCHORED option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
+  \B         pass the PCRE_NOTBOL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
   \Cdd       call pcre_copy_substring() for substring dd after a successful match (number less than 32)
   \Cname     call pcre_copy_named_substring() for substring "name" after a successful match (name termin-
                ated by next non alphanumeric character)
@@ -274,33 +349,50 @@
   \C!n       return 1 instead of 0 when callout number n is reached
   \C!n!m     return 1 instead of 0 when callout number n is reached for the mth time
   \C*n       pass the number n (may be negative) as callout data; this is used as the callout return value
+  \D         use the <b>pcre_dfa_exec()</b> match function
+  \F         only shortest match for <b>pcre_dfa_exec()</b>
   \Gdd       call pcre_get_substring() for substring dd after a successful match (number less than 32)
   \Gname     call pcre_get_named_substring() for substring "name" after a successful match (name termin-
                ated by next non-alphanumeric character)
   \L         call pcre_get_substringlist() after a successful match
-  \M         discover the minimum MATCH_LIMIT setting
-  \N         pass the PCRE_NOTEMPTY option to <b>pcre_exec()</b>
+  \M         discover the minimum MATCH_LIMIT and MATCH_LIMIT_RECURSION settings
+  \N         pass the PCRE_NOTEMPTY option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
   \Odd       set the size of the output vector passed to <b>pcre_exec()</b> to dd (any number of digits)
-  \P         pass the PCRE_PARTIAL option to <b>pcre_exec()</b>
+  \P         pass the PCRE_PARTIAL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
+  \Qdd       set the PCRE_MATCH_LIMIT_RECURSION limit to dd (any number of digits)
+  \R         pass the PCRE_DFA_RESTART option to <b>pcre_dfa_exec()</b>
   \S         output details of memory get/free calls during matching
-  \Z         pass the PCRE_NOTEOL option to <b>pcre_exec()</b>
-  \?         pass the PCRE_NO_UTF8_CHECK option to <b>pcre_exec()</b>
+  \Z         pass the PCRE_NOTEOL option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
+  \?         pass the PCRE_NO_UTF8_CHECK option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
   \&#62;dd       start the match at offset dd (any number of digits);
-               this sets the <i>startoffset</i> argument for <b>pcre_exec()</b>
-</pre>
-A backslash followed by anything else just escapes the anything else. If the
-very last character is a backslash, it is ignored. This gives a way of passing
-an empty line as data, since a real empty line terminates the data input.
+               this sets the <i>startoffset</i> argument for <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
+  \&#60;cr&#62;      pass the PCRE_NEWLINE_CR option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
+  \&#60;lf&#62;      pass the PCRE_NEWLINE_LF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
+  \&#60;crlf&#62;    pass the PCRE_NEWLINE_CRLF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
+  \&#60;anycrlf&#62; pass the PCRE_NEWLINE_ANYCRLF option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
+  \&#60;any&#62;     pass the PCRE_NEWLINE_ANY option to <b>pcre_exec()</b> or <b>pcre_dfa_exec()</b>
+</pre>
+The escapes that specify line ending sequences are literal strings, exactly as
+shown. No more than one newline setting should be present in any data line.
+</P>
+<P>
+A backslash followed by anything else just escapes the anything else. If
+the very last character is a backslash, it is ignored. This gives a way of
+passing an empty line as data, since a real empty line terminates the data
+input.
 </P>
 <P>
 If \M is present, <b>pcretest</b> calls <b>pcre_exec()</b> several times, with
-different values in the <i>match_limit</i> field of the <b>pcre_extra</b> data
-structure, until it finds the minimum number that is needed for
-<b>pcre_exec()</b> to complete. This number is a measure of the amount of
-recursion and backtracking that takes place, and checking it out can be
-instructive. For most simple matches, the number is quite small, but for
-patterns with very large numbers of matching possibilities, it can become large
-very quickly with increasing length of subject string.
+different values in the <i>match_limit</i> and <i>match_limit_recursion</i>
+fields of the <b>pcre_extra</b> data structure, until it finds the minimum
+numbers for each parameter that allow <b>pcre_exec()</b> to complete. The
+<i>match_limit</i> number is a measure of the amount of backtracking that takes
+place, and checking it out can be instructive. For most simple matches, the
+number is quite small, but for patterns with very large numbers of matching
+possibilities, it can become large very quickly with increasing length of
+subject string. The <i>match_limit_recursion</i> number is a measure of how much
+stack (or, if PCRE is compiled with NO_RECURSE, how much heap) memory is needed
+to complete the match attempt.
 </P>
 <P>
 When \O is used, the value specified may be higher or lower than the size set
@@ -309,26 +401,51 @@
 </P>
 <P>
 If the <b>/P</b> modifier was present on the pattern, causing the POSIX wrapper
-API to be used, only \B and \Z have any effect, causing REG_NOTBOL and
-REG_NOTEOL to be passed to <b>regexec()</b> respectively.
+API to be used, the only option-setting sequences that have any effect are \B
+and \Z, causing REG_NOTBOL and REG_NOTEOL, respectively, to be passed to
+<b>regexec()</b>.
 </P>
 <P>
 The use of \x{hh...} to represent UTF-8 characters is not dependent on the use
 of the <b>/8</b> modifier on the pattern. It is recognized always. There may be
 any number of hexadecimal digits inside the braces. The result is from one to
-six bytes, encoded according to the UTF-8 rules.
+six bytes, encoded according to the original UTF-8 rules of RFC 2279. This
+allows for values in the range 0 to 0x7FFFFFFF. Note that not all of those are
+valid Unicode code points, or indeed valid UTF-8 characters according to the
+later rules in RFC 3629.
+</P>
+<br><a name="SEC6" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
+<P>
+By default, <b>pcretest</b> uses the standard PCRE matching function,
+<b>pcre_exec()</b>, to match each data line. From release 6.0, PCRE supports an
+alternative matching function, <b>pcre_dfa_exec()</b>, which operates in a
+different way, and has some restrictions. The differences between the two
+functions are described in the
+<a href="pcrematching.html"><b>pcrematching</b></a>
+documentation.
+</P>
+<P>
+If a data line contains the \D escape sequence, or if the command line
+contains the <b>-dfa</b> option, the alternative matching function is called.
+This function finds all possible matches at a given point. If, however, the \F
+escape sequence is present in the data line, it stops after the first match is
+found. This is always the shortest possible match.
+</P>
+<br><a name="SEC7" href="#TOC1">DEFAULT OUTPUT FROM PCRETEST</a><br>
+<P>
+This section describes the output when the normal matching function,
+<b>pcre_exec()</b>, is being used.
 </P>
-<br><a name="SEC6" href="#TOC1">OUTPUT FROM PCRETEST</a><br>
 <P>
 When a match succeeds, pcretest outputs the list of captured substrings that
 <b>pcre_exec()</b> returns, starting with number 0 for the string that matched
 the whole pattern. Otherwise, it outputs "No match" or "Partial match"
 when <b>pcre_exec()</b> returns PCRE_ERROR_NOMATCH or PCRE_ERROR_PARTIAL,
 respectively, and otherwise the PCRE negative error number. Here is an example
-of an interactive pcretest run.
+of an interactive <b>pcretest</b> run.
 <pre>
   $ pcretest
-  PCRE version 5.00 07-Sep-2004
+  PCRE version 7.0 30-Nov-2006
 
     re&#62; /^abc(\d+)/
   data&#62; abc123
@@ -339,9 +456,9 @@
 </pre>
 If the strings contain any non-printing characters, they are output as \0x
 escapes, or as \x{...} escapes if the <b>/8</b> modifier was present on the
-pattern. If the pattern has the <b>/+</b> modifier, the output for substring 0
-is followed by the the rest of the subject string, identified by "0+" like
-this:
+pattern. See below for the definition of non-printing characters. If the
+pattern has the <b>/+</b> modifier, the output for substring 0 is followed by
+the rest of the subject string, identified by "0+", like this:
 <pre>
     re&#62; /cat/+
   data&#62; cataract
@@ -371,16 +488,67 @@
 parentheses after each string for <b>\C</b> and <b>\G</b>.
 </P>
 <P>
-Note that while patterns can be continued over several lines (a plain "&#62;"
+Note that whereas patterns can be continued over several lines (a plain "&#62;"
 prompt is used for continuations), data lines may not. However, newlines can be
-included in data by means of the \n escape.
+included in data by means of the \n escape (or \r, \r\n, etc., depending on
+the newline sequence setting).
 </P>
-<br><a name="SEC7" href="#TOC1">CALLOUTS</a><br>
+<br><a name="SEC8" href="#TOC1">OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION</a><br>
+<P>
+When the alternative matching function, <b>pcre_dfa_exec()</b>, is used (by
+means of the \D escape sequence or the <b>-dfa</b> command line option), the
+output consists of a list of all the matches that start at the first point in
+the subject where there is at least one match. For example:
+<pre>
+    re&#62; /(tang|tangerine|tan)/
+  data&#62; yellow tangerine\D
+   0: tangerine
+   1: tang
+   2: tan
+</pre>
+(Using the normal matching function on this data finds only "tang".) The
+longest matching string is always given first (and numbered zero).
+</P>
+<P>
+If <b>/g</b> is present on the pattern, the search for further matches resumes
+at the end of the longest match. For example:
+<pre>
+    re&#62; /(tang|tangerine|tan)/g
+  data&#62; yellow tangerine and tangy sultana\D
+   0: tangerine
+   1: tang
+   2: tan
+   0: tang
+   1: tan
+   0: tan
+</pre>
+Since the alternative matching function does not support substring capture, the escape
+sequences that are concerned with captured substrings are not relevant.
+</P>
+<br><a name="SEC9" href="#TOC1">RESTARTING AFTER A PARTIAL MATCH</a><br>
+<P>
+When the alternative matching function has given the PCRE_ERROR_PARTIAL return,
+indicating that the subject partially matched the pattern, you can restart the
+match with additional subject data by means of the \R escape sequence. For
+example:
+<pre>
+    re&#62; /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
+  data&#62; 23ja\P\D
+  Partial match: 23ja
+  data&#62; n05\R\D
+   0: n05
+</pre>
+For further information about partial matching, see the
+<a href="pcrepartial.html"><b>pcrepartial</b></a>
+documentation.
+</P>
+<br><a name="SEC10" href="#TOC1">CALLOUTS</a><br>
 <P>
 If the pattern contains any callout requests, <b>pcretest</b>'s callout function
-is called during matching. By default, it displays the callout number, the
-start and current positions in the text at the callout time, and the next
-pattern item to be tested. For example, the output
+is called during matching. This works with both matching functions. By default,
+the called function displays the callout number, the start and current
+positions in the text at the callout time, and the next pattern item to be
+tested. For example, the output
 <pre>
   ---&#62;pqrabcdef
     0    ^  ^     \d
@@ -406,7 +574,7 @@
    0: E*
 </pre>
 The callout function in <b>pcretest</b> returns zero (carry on matching) by
-default, but you can use an \C item in a data line (as described above) to
+default, but you can use a \C item in a data line (as described above) to
 change this.
 </P>
 <P>
@@ -416,7 +584,19 @@
 <a href="pcrecallout.html"><b>pcrecallout</b></a>
 documentation.
 </P>
-<br><a name="SEC8" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br>
+<br><a name="SEC11" href="#TOC1">NON-PRINTING CHARACTERS</a><br>
+<P>
+When <b>pcretest</b> is outputting text in the compiled version of a pattern,
+bytes other than 32-126 are always treated as non-printing characters and are
+therefore shown as hex escapes.
+</P>
+<P>
+When <b>pcretest</b> is outputting text that is a matched part of a subject
+string, it behaves in the same way, unless a different locale has been set for
+the pattern (using the <b>/L</b> modifier). In this case, the <b>isprint()</b>
+function is used to distinguish printing and non-printing characters.
+</P>
+<br><a name="SEC12" href="#TOC1">SAVING AND RELOADING COMPILED PATTERNS</a><br>
 <P>
 The facilities described in this section are not available when the POSIX
 interface to PCRE is being used, that is, when the <b>/P</b> pattern modifier is
@@ -478,18 +658,26 @@
 Finally, if you attempt to load a file that is not in the correct format, the
 result is undefined.
 </P>
-<br><a name="SEC9" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC13" href="#TOC1">SEE ALSO</a><br>
+<P>
+<b>pcre</b>(3), <b>pcreapi</b>(3), <b>pcrecallout</b>(3), <b>pcrematching</b>(3),
+<b>pcrepartial</b>(3), <b>pcrepattern</b>(3), <b>pcreprecompile</b>(3).
+</P>
+<br><a name="SEC14" href="#TOC1">AUTHOR</a><br>
 <P>
-Philip Hazel &#60;ph10@cam.ac.uk&#62;
+Philip Hazel
 <br>
-University Computing Service,
+University Computing Service
+<br>
+Cambridge CB2 3QH, England.
 <br>
-Cambridge CB2 3QG, England.
 </P>
+<br><a name="SEC15" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 10 September 2004
+Last updated: 11 September 2007
+<br>
+Copyright &copy; 1997-2007 University of Cambridge.
 <br>
-Copyright &copy; 1997-2004 University of Cambridge.
 <p>
 Return to the <a href="index.html">PCRE index page</a>.
 </p>
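The pcretest changes above say that \x{hh...} in a data line is encoded by the original UTF-8 rules of RFC 2279, giving one to six bytes for values from 0 to 0x7FFFFFFF. As a rough illustration, here is a minimal C sketch of that encoding. It is not taken from pcretest, and the function name utf8_encode_rfc2279 is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode v (0 to 0x7FFFFFFF) with the original RFC 2279 UTF-8 rules, which
   allow sequences of up to six bytes.  Writes to buf (room for 6 bytes) and
   returns the number of bytes written, or 0 if v is out of range.
   Surrogates and values above 0x10FFFF are deliberately NOT rejected, since
   RFC 2279 allowed them (RFC 3629 later forbade them). */
size_t utf8_encode_rfc2279(uint32_t v, unsigned char buf[6])
{
    /* Exclusive upper bound on the values that fit in 1..6 bytes. */
    static const uint32_t limits[6] = {
        0x80, 0x800, 0x10000, 0x200000, 0x4000000, 0x80000000u
    };
    /* Lead-byte marker bits for sequences of 1..6 bytes. */
    static const unsigned char lead[6] = { 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
    size_t n, i;

    for (n = 0; n < 6 && v >= limits[n]; n++)
        ;
    if (n == 6)
        return 0;            /* v > 0x7FFFFFFF */
    n++;                     /* n is now the sequence length */

    /* Fill continuation bytes from the end, six bits at a time. */
    for (i = n - 1; i > 0; i--) {
        buf[i] = (unsigned char)(0x80 | (v & 0x3F));
        v >>= 6;
    }
    buf[0] = (unsigned char)(lead[n - 1] | v);
    return n;
}
```

For example, 0x20AC encodes as E2 82 AC, and 0x7FFFFFFF as the six bytes FD BF BF BF BF BF; a strict RFC 3629 encoder would reject the latter.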

Added: httpd/httpd/vendor/pcre/current/doc/index.html.src
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/doc/index.html.src?rev=598339&view=auto
==============================================================================
--- httpd/httpd/vendor/pcre/current/doc/index.html.src (added)
+++ httpd/httpd/vendor/pcre/current/doc/index.html.src Mon Nov 26 08:49:53 2007
@@ -0,0 +1,140 @@
+<html>
+<!-- This is a manually maintained file that is the root of the HTML version of 
+     the PCRE documentation. When the HTML documents are built from the man 
+     page versions, the entire doc/html directory is emptied, this file is then 
+     copied into doc/html/index.html, and the remaining files therein are 
+     created by the 132html script.
+-->      
+<head>
+<title>PCRE specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>Perl-compatible Regular Expressions (PCRE)</h1>
+<p>
+The HTML documentation for PCRE comprises the following pages:
+</p>
+
+<table>
+<tr><td><a href="pcre.html">pcre</a></td>
+    <td>&nbsp;&nbsp;Introductory page</td></tr>
+
+<tr><td><a href="pcre-config.html">pcre-config</a></td>
+    <td>&nbsp;&nbsp;Information about the installation configuration</td></tr>
+
+<tr><td><a href="pcreapi.html">pcreapi</a></td>
+    <td>&nbsp;&nbsp;PCRE's native API</td></tr>
+
+<tr><td><a href="pcrebuild.html">pcrebuild</a></td>
+    <td>&nbsp;&nbsp;Options for building PCRE</td></tr>
+
+<tr><td><a href="pcrecallout.html">pcrecallout</a></td>
+    <td>&nbsp;&nbsp;The <i>callout</i> facility</td></tr>
+
+<tr><td><a href="pcrecompat.html">pcrecompat</a></td>
+    <td>&nbsp;&nbsp;Compatibility with Perl</td></tr>
+
+<tr><td><a href="pcrecpp.html">pcrecpp</a></td>
+    <td>&nbsp;&nbsp;The C++ wrapper for the PCRE library</td></tr>
+
+<tr><td><a href="pcregrep.html">pcregrep</a></td>
+    <td>&nbsp;&nbsp;The <b>pcregrep</b> command</td></tr>
+
+<tr><td><a href="pcrematching.html">pcrematching</a></td>
+    <td>&nbsp;&nbsp;Discussion of the two matching algorithms</td></tr>
+
+<tr><td><a href="pcrepartial.html">pcrepartial</a></td>
+    <td>&nbsp;&nbsp;Using PCRE for partial matching</td></tr>
+
+<tr><td><a href="pcrepattern.html">pcrepattern</a></td>
+    <td>&nbsp;&nbsp;Specification of the regular expressions supported by PCRE</td></tr>
+
+<tr><td><a href="pcreperform.html">pcreperform</a></td>
+    <td>&nbsp;&nbsp;Some comments on performance</td></tr>
+
+<tr><td><a href="pcreposix.html">pcreposix</a></td>
+    <td>&nbsp;&nbsp;The POSIX API to the PCRE library</td></tr>
+
+<tr><td><a href="pcreprecompile.html">pcreprecompile</a></td>
+    <td>&nbsp;&nbsp;How to save and re-use compiled patterns</td></tr>
+
+<tr><td><a href="pcresample.html">pcresample</a></td>
+    <td>&nbsp;&nbsp;Description of the sample program</td></tr>
+
+<tr><td><a href="pcrestack.html">pcrestack</a></td>
+    <td>&nbsp;&nbsp;Discussion of PCRE's stack usage</td></tr>
+
+<tr><td><a href="pcresyntax.html">pcresyntax</a></td>
+    <td>&nbsp;&nbsp;Syntax quick-reference summary</td></tr>
+
+<tr><td><a href="pcretest.html">pcretest</a></td>
+    <td>&nbsp;&nbsp;The <b>pcretest</b> command for testing PCRE</td></tr>
+</table>
+
+<p>
+There are also individual pages that summarize the interface for each function 
+in the library:
+</p>
+
+<table>    
+
+<tr><td><a href="pcre_compile.html">pcre_compile</a></td>
+    <td>&nbsp;&nbsp;Compile a regular expression</td></tr>
+
+<tr><td><a href="pcre_compile2.html">pcre_compile2</a></td>
+    <td>&nbsp;&nbsp;Compile a regular expression (alternate interface)</td></tr>
+
+<tr><td><a href="pcre_config.html">pcre_config</a></td>
+    <td>&nbsp;&nbsp;Show build-time configuration options</td></tr>
+
+<tr><td><a href="pcre_copy_named_substring.html">pcre_copy_named_substring</a></td>
+    <td>&nbsp;&nbsp;Extract named substring into given buffer</td></tr>
+
+<tr><td><a href="pcre_copy_substring.html">pcre_copy_substring</a></td>
+    <td>&nbsp;&nbsp;Extract numbered substring into given buffer</td></tr>
+
+<tr><td><a href="pcre_dfa_exec.html">pcre_dfa_exec</a></td>
+    <td>&nbsp;&nbsp;Match a compiled pattern to a subject string
+    (DFA algorithm; <i>not</i> Perl compatible)</td></tr>
+
+<tr><td><a href="pcre_exec.html">pcre_exec</a></td>
+    <td>&nbsp;&nbsp;Match a compiled pattern to a subject string
+    (Perl compatible)</td></tr>
+
+<tr><td><a href="pcre_free_substring.html">pcre_free_substring</a></td>
+    <td>&nbsp;&nbsp;Free extracted substring</td></tr>
+
+<tr><td><a href="pcre_free_substring_list.html">pcre_free_substring_list</a></td>
+    <td>&nbsp;&nbsp;Free list of extracted substrings</td></tr>
+
+<tr><td><a href="pcre_fullinfo.html">pcre_fullinfo</a></td>
+    <td>&nbsp;&nbsp;Extract information about a pattern</td></tr>
+
+<tr><td><a href="pcre_get_named_substring.html">pcre_get_named_substring</a></td>
+    <td>&nbsp;&nbsp;Extract named substring into new memory</td></tr>
+
+<tr><td><a href="pcre_get_stringnumber.html">pcre_get_stringnumber</a></td>
+    <td>&nbsp;&nbsp;Convert captured string name to number</td></tr>
+
+<tr><td><a href="pcre_get_substring.html">pcre_get_substring</a></td>
+    <td>&nbsp;&nbsp;Extract numbered substring into new memory</td></tr>
+
+<tr><td><a href="pcre_get_substring_list.html">pcre_get_substring_list</a></td>
+    <td>&nbsp;&nbsp;Extract all substrings into new memory</td></tr>
+
+<tr><td><a href="pcre_info.html">pcre_info</a></td>
+    <td>&nbsp;&nbsp;Obsolete information extraction function</td></tr>
+
+<tr><td><a href="pcre_maketables.html">pcre_maketables</a></td>
+    <td>&nbsp;&nbsp;Build character tables in current locale</td></tr>
+    
+<tr><td><a href="pcre_refcount.html">pcre_refcount</a></td>
+    <td>&nbsp;&nbsp;Maintain reference count in compiled pattern</td></tr>
+
+<tr><td><a href="pcre_study.html">pcre_study</a></td>
+    <td>&nbsp;&nbsp;Study a compiled pattern</td></tr>
+
+<tr><td><a href="pcre_version.html">pcre_version</a></td>
+    <td>&nbsp;&nbsp;Return PCRE version and release date</td></tr>
+</table>
+
+</html>
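The index above distinguishes pcre_exec() (Perl compatible) from pcre_dfa_exec() (DFA algorithm), and the pcretest changes illustrate the difference with /(tang|tangerine|tan)/. The C sketch below mimics only the reporting behaviour described there, for a "pattern" that is a plain list of literal alternatives. It is a toy, not a regex engine and not how either PCRE function works internally; the names matches_at, first_match and all_matches are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Does literal alternative alt occur at subject + p? */
static int matches_at(const char *subject, size_t p, const char *alt)
{
    return strncmp(subject + p, alt, strlen(alt)) == 0;
}

/* Perl-style reporting (like pcre_exec() for a simple alternation): at the
   leftmost position where anything matches, the first alternative in pattern
   order wins.  Returns the start offset or -1; stores the index in *alt. */
long first_match(const char *subject, const char *const alts[], size_t nalts,
                 size_t *alt)
{
    size_t len = strlen(subject);
    for (size_t p = 0; p < len; p++)
        for (size_t a = 0; a < nalts; a++)
            if (matches_at(subject, p, alts[a])) {
                *alt = a;
                return (long)p;
            }
    return -1;
}

/* DFA-style reporting (like pcre_dfa_exec()): starting the search at offset
   from, find the leftmost position where anything matches and report every
   match there, longest first.  lens[] must hold nalts entries.  Returns the
   number of matches (0 if none) and stores their start offset in *start. */
size_t all_matches(const char *subject, size_t from, const char *const alts[],
                   size_t nalts, size_t lens[], size_t *start)
{
    size_t len = strlen(subject);
    for (size_t p = from; p < len; p++) {
        size_t n = 0;
        for (size_t a = 0; a < nalts; a++) {
            if (!matches_at(subject, p, alts[a]))
                continue;
            /* Insertion step that keeps lens[] sorted longest first. */
            size_t l = strlen(alts[a]), i = n++;
            while (i > 0 && lens[i - 1] < l) {
                lens[i] = lens[i - 1];
                i--;
            }
            lens[i] = l;
        }
        if (n > 0) {
            *start = p;
            return n;
        }
    }
    return 0;
}
```

To imitate /g with the DFA-style function, resume at start + lens[0], the end of the longest match. On "yellow tangerine and tangy sultana" this reports tangerine, tang, tan; then tang, tan; then tan, as in the pcretest example, while first_match() on "yellow tangerine" picks only "tang".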

Added: httpd/httpd/vendor/pcre/current/doc/pcre-config.1
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/doc/pcre-config.1?rev=598339&view=auto
==============================================================================
--- httpd/httpd/vendor/pcre/current/doc/pcre-config.1 (added)
+++ httpd/httpd/vendor/pcre/current/doc/pcre-config.1 Mon Nov 26 08:49:53 2007
@@ -0,0 +1,73 @@
+.TH PCRE-CONFIG 1
+.SH NAME
+pcre-config - program to return PCRE configuration
+.SH SYNOPSIS
+.rs
+.sp
+.B pcre-config  [--prefix] [--exec-prefix] [--version] [--libs]
+.ti +5n
+.B              [--libs-posix] [--cflags] [--cflags-posix]
+.
+.
+.SH DESCRIPTION
+.rs
+.sp
+\fBpcre-config\fP returns the configuration of the installed PCRE
+libraries and the options required to compile a program to use them.
+.
+.
+.SH OPTIONS
+.rs
+.TP 10
+\fB--prefix\fP
+Writes the directory prefix used in the PCRE installation for architecture
+independent files (\fI/usr\fP on many systems, \fI/usr/local\fP on some
+systems) to the standard output.
+.TP 10
+\fB--exec-prefix\fP
+Writes the directory prefix used in the PCRE installation for architecture
+dependent files (normally the same as \fB--prefix\fP) to the standard output.
+.TP 10
+\fB--version\fP
+Writes the version number of the installed PCRE libraries to the standard
+output.
+.TP 10
+\fB--libs\fP
+Writes to the standard output the command line options required to link
+with PCRE (\fB-lpcre\fP on many systems).
+.TP 10
+\fB--libs-posix\fP
+Writes to the standard output the command line options required to link with
+the PCRE posix emulation library (\fB-lpcreposix\fP \fB-lpcre\fP on many
+systems).
+.TP 10
+\fB--cflags\fP
+Writes to the standard output the command line options required to compile
+files that use PCRE (this may include some \fB-I\fP options, but is blank on
+many systems).
+.TP 10
+\fB--cflags-posix\fP
+Writes to the standard output the command line options required to compile
+files that use the PCRE posix emulation library (this may include some \fB-I\fP
+options, but is blank on many systems).
+.
+.
+.SH "SEE ALSO"
+.rs
+.sp
+\fBpcre(3)\fP
+.
+.
+.SH AUTHOR
+.rs
+.sp
+This manual page was originally written by Mark Baker for the Debian GNU/Linux
+system. It has been slightly revised as a generic PCRE man page.
+.
+.
+.SH REVISION
+.rs
+.sp
+.nf
+Last updated: 18 April 2007
+.fi

Added: httpd/httpd/vendor/pcre/current/doc/pcre-config.txt
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/doc/pcre-config.txt?rev=598339&view=auto
==============================================================================
--- httpd/httpd/vendor/pcre/current/doc/pcre-config.txt (added)
+++ httpd/httpd/vendor/pcre/current/doc/pcre-config.txt Mon Nov 26 08:49:53 2007
@@ -0,0 +1,67 @@
+PCRE-CONFIG(1)                                                  PCRE-CONFIG(1)
+
+
+
+NAME
+       pcre-config - program to return PCRE configuration
+
+SYNOPSIS
+
+       pcre-config [--prefix] [--exec-prefix] [--version] [--libs]
+            [--libs-posix] [--cflags] [--cflags-posix]
+
+
+DESCRIPTION
+
+       pcre-config  returns  the configuration of the installed PCRE libraries
+       and the options required to compile a program to use them.
+
+
+OPTIONS
+
+       --prefix  Writes the directory prefix used in the PCRE installation for
+                 architecture   independent   files  (/usr  on  many  systems,
+                 /usr/local on some systems) to the standard output.
+
+       --exec-prefix
+                 Writes the directory prefix used in the PCRE installation for
+                 architecture  dependent files (normally the same as --prefix)
+                 to the standard output.
+
+       --version Writes the version number of the installed PCRE libraries  to
+                 the standard output.
+
+       --libs    Writes  to  the  standard  output  the  command  line options
+                 required to link with PCRE (-lpcre on many systems).
+
+       --libs-posix
+                 Writes to  the  standard  output  the  command  line  options
+                 required  to  link  with  the  PCRE  posix  emulation library
+                 (-lpcreposix -lpcre on many systems).
+
+       --cflags  Writes to  the  standard  output  the  command  line  options
+                 required  to  compile  files  that use PCRE (this may include
+                 some -I options, but is blank on many systems).
+
+       --cflags-posix
+                 Writes to  the  standard  output  the  command  line  options
+                 required  to  compile files that use the PCRE posix emulation
+                 library (this may include some -I options, but  is  blank  on
+                 many systems).
+
+
+SEE ALSO
+
+       pcre(3)
+
+
+AUTHOR
+
+       This  manual  page  was originally written by Mark Baker for the Debian
+       GNU/Linux system. It has been slightly revised as a  generic  PCRE  man
+       page.
+
+
+REVISION
+
+       Last updated: 18 April 2007

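The pcre-config options above are normally used on a compiler command line. The C file below is a minimal sketch, not taken from the PCRE distribution. It uses only the POSIX regex API, so it builds as written against a system <regex.h>. To build it against PCRE's POSIX emulation library instead, the assumption is that you swap the include for PCRE's <pcreposix.h> and let pcre-config supply the flags, with pcre-config on PATH. The function name posix_match and the file names in the comment are hypothetical.

```c
/* Build against PCRE's POSIX emulation library instead of the system regex
   (assumes pcre-config is on PATH and the #include below is changed to
   <pcreposix.h>; prog.o is a hypothetical object containing main):
       cc $(pcre-config --cflags-posix) -c match.c
       cc -o prog prog.o match.o $(pcre-config --libs-posix)            */
#include <stddef.h>
#include <regex.h>

/* Hypothetical helper: returns 1 if subject matches pattern, 0 if it does
   not, and -1 if the pattern fails to compile or matching reports an error. */
int posix_match(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    /* REG_EXTENDED selects ERE syntax in a system <regex.h>; PCRE's wrapper
       defines it as zero, so passing it there has no effect. */
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    rc = regexec(&re, subject, 0, NULL, 0);   /* no capture offsets needed */
    regfree(&re);

    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}
```

The pattern in a call such as posix_match("^[0-9]+$", "12345") is read the same way by PCRE and by POSIX ERE. That keeps the sketch's behaviour the same whichever library it is linked with.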
Modified: httpd/httpd/vendor/pcre/current/doc/pcre.3
URL: http://svn.apache.org/viewvc/httpd/httpd/vendor/pcre/current/doc/pcre.3?rev=598339&r1=598338&r2=598339&view=diff
==============================================================================
--- httpd/httpd/vendor/pcre/current/doc/pcre.3 (original)
+++ httpd/httpd/vendor/pcre/current/doc/pcre.3 Mon Nov 26 08:49:53 2007
@@ -6,15 +6,33 @@
 .sp
 The PCRE library is a set of functions that implement regular expression
 pattern matching using the same syntax and semantics as Perl, with just a few
-differences. The current implementation of PCRE (release 5.x) corresponds
-approximately with Perl 5.8, including support for UTF-8 encoded strings and
-Unicode general category properties. However, this support has to be explicitly
-enabled; it is not the default.
+differences. (Certain features that appeared in Python and PCRE before they
+appeared in Perl are also available using the Python syntax.)
+.P
+The current implementation of PCRE (release 7.x) corresponds approximately with
+Perl 5.10, including support for UTF-8 encoded strings and Unicode general
+category properties. However, UTF-8 and Unicode support has to be explicitly
+enabled; it is not the default. The Unicode tables correspond to Unicode
+release 5.0.0.
+.P
+In addition to the Perl-compatible matching function, PCRE contains an
+alternative matching function that matches the same compiled patterns in a
+different way. In certain circumstances, the alternative function has some
+advantages. For a discussion of the two matching algorithms, see the
+.\" HREF
+\fBpcrematching\fP
+.\"
+page.
 .P
 PCRE is written in C and released as a C library. A number of people have
-written wrappers and interfaces of various kinds. A C++ class is included in
-these contributions, which can be found in the \fIContrib\fR directory at the
-primary FTP site, which is:
+written wrappers and interfaces of various kinds. In particular, Google Inc.
+have provided a comprehensive C++ wrapper. This is now included as part of the
+PCRE distribution. The
+.\" HREF
+\fBpcrecpp\fP
+.\"
+page has details of this interface. Other people's contributions can be found
+in the \fIContrib\fR directory at the primary FTP site, which is:
 .sp
 .\" HTML <a href="ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre">
 .\" </a>
@@ -29,7 +47,11 @@
 .\" HREF
 \fBpcrecompat\fR
 .\"
-pages.
+pages. There is a syntax summary in the
+.\" HREF
+\fBpcresyntax\fR
+.\"
+page.
 .P
 Some features of PCRE can be included, excluded, or changed when the library is
 built. The
@@ -43,6 +65,14 @@
 .\"
 page. Documentation about building PCRE for various operating systems can be
 found in the \fBREADME\fP file in the source distribution.
+.P
+The library contains a number of undocumented internal functions and data
+tables that are used by more than one of the exported external functions, but
+which are not intended for use by external callers. Their names all begin with
+"_pcre_", which hopefully will not provoke any name clashes. In some
+environments, it is possible to control which external symbols are exported
+when a shared library is built, and in these cases the undocumented symbols are
+not exported.
 .
 .
 .SH "USER DOCUMENTATION"
@@ -55,23 +85,28 @@
 follows:
 .sp
   pcre              this document
-  pcreapi           details of PCRE's native API
+  pcre-config       show PCRE installation configuration information
+  pcreapi           details of PCRE's native C API
   pcrebuild         options for building PCRE
   pcrecallout       details of the callout feature
   pcrecompat        discussion of Perl compatibility
+  pcrecpp           details of the C++ wrapper
   pcregrep          description of the \fBpcregrep\fP command
+  pcrematching      discussion of the two matching algorithms
   pcrepartial       details of the partial matching facility
 .\" JOIN
   pcrepattern       syntax and semantics of supported
                       regular expressions
+  pcresyntax        quick syntax reference
   pcreperform       discussion of performance issues
-  pcreposix         the POSIX-compatible API
+  pcreposix         the POSIX-compatible C API
   pcreprecompile    details of saving and re-using precompiled patterns
   pcresample        discussion of the sample program
+  pcrestack         discussion of stack usage
   pcretest          description of the \fBpcretest\fP testing command
 .sp
 In addition, in the "man" and HTML formats, there is a short page for each
-library function, listing its arguments and results.
+C library function, listing its arguments and results.
 .
 .
 .SH LIMITATIONS
@@ -89,20 +124,27 @@
 \fBpcrebuild\fP
 .\"
 documentation for details). In these cases the limit is substantially larger.
-However, the speed of execution will be slower.
+However, the speed of execution is slower.
 .P
 All values in repeating quantifiers must be less than 65536.
-The maximum number of capturing subpatterns is 65535.
 .P
-There is no limit to the number of non-capturing subpatterns, but the maximum
-depth of nesting of all kinds of parenthesized subpattern, including capturing
-subpatterns, assertions, and other types of subpattern, is 200.
+There is no limit to the number of parenthesized subpatterns, but there can be
+no more than 65535 capturing subpatterns.
+.P
+The maximum length of name for a named subpattern is 32 characters, and the
+maximum number of named subpatterns is 10000.
 .P
 The maximum length of a subject string is the largest positive number that an
-integer variable can hold. However, PCRE uses recursion to handle subpatterns
-and indefinite repetition. This means that the available stack space may limit
-the size of a subject string that can be processed by certain patterns.
-.sp
+integer variable can hold. However, when using the traditional matching
+function, PCRE uses recursion to handle subpatterns and indefinite repetition.
+This means that the available stack space may limit the size of a subject
+string that can be processed by certain patterns. For a discussion of stack
+issues, see the
+.\" HREF
+\fBpcrestack\fP
+.\"
+documentation.
+.
 .\" HTML <a name="utf8support"></a>
 .
 .
@@ -125,51 +167,84 @@
 .P
 If you compile PCRE with UTF-8 support, but do not use it at run time, the
 library will be a bit bigger, but the additional run time overhead is limited
-to testing the PCRE_UTF8 flag in several places, so should not be very large.
+to testing the PCRE_UTF8 flag occasionally, so should not be very big.
 .P
 If PCRE is built with Unicode character property support (which implies UTF-8
 support), the escape sequences \ep{..}, \eP{..}, and \eX are supported.
 The available properties that can be tested are limited to the general
 category properties such as Lu for an upper case letter or Nd for a decimal
-number. A full list is given in the
+number, the Unicode script names such as Arabic or Han, and the derived
+properties Any and L&. A full list is given in the
 .\" HREF
 \fBpcrepattern\fP
 .\"
-documentation. The PCRE library is increased in size by about 90K when Unicode
-property support is included.
-.P
-The following comments apply when PCRE is running in UTF-8 mode:
-.P
-1. When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects
-are checked for validity on entry to the relevant functions. If an invalid
-UTF-8 string is passed, an error return is given. In some situations, you may
-already know that your strings are valid, and therefore want to skip these
-checks in order to improve performance. If you set the PCRE_NO_UTF8_CHECK flag
-at compile time or at run time, PCRE assumes that the pattern or subject it
-is given (respectively) contains only valid UTF-8 codes. In this case, it does
-not diagnose an invalid UTF-8 string. If you pass an invalid UTF-8 string to
-PCRE when PCRE_NO_UTF8_CHECK is set, the results are undefined. Your program
-may crash.
-.P
-2. In a pattern, the escape sequence \ex{...}, where the contents of the braces
-is a string of hexadecimal digits, is interpreted as a UTF-8 character whose
-code number is the given hexadecimal number, for example: \ex{1234}. If a
-non-hexadecimal digit appears between the braces, the item is not recognized.
-This escape sequence can be used either as a literal, or within a character
-class.
+documentation. Only the short names for properties are supported. For example,
+\ep{L} matches a letter. Its Perl synonym, \ep{Letter}, is not supported.
+Furthermore, in Perl, many properties may optionally be prefixed by "Is", for
+compatibility with Perl 5.6. PCRE does not support this.
+.
+.\" HTML <a name="utf8strings"></a>
+.
+.SS "Validity of UTF-8 strings"
+.rs
+.sp
+When you set the PCRE_UTF8 flag, the strings passed as patterns and subjects
+are (by default) checked for validity on entry to the relevant functions. From
+release 7.3 of PCRE, the check is according to the rules of RFC 3629, which are
+themselves derived from the Unicode specification. Earlier releases of PCRE
+followed the rules of RFC 2279, which allows the full range of 31-bit values (0
+to 0x7FFFFFFF). The current check allows only values in the range U+0 to
+U+10FFFF, excluding U+D800 to U+DFFF.
+.P
+The excluded code points are the Unicode surrogate areas. Of the "Low Surrogate
+Area", the Unicode Standard says: "The Low Surrogate Area does not contain any
+character assignments, consequently no character code charts or namelists are
+provided for this area. Surrogates are reserved for use with UTF-16 and then
+must be used in pairs." The code points that are encoded by UTF-16 pairs are
+available as independent code points in the UTF-8 encoding. (In other words,
+the whole surrogate thing is a fudge for UTF-16 which unfortunately messes up
+UTF-8.)
+.P
+If an invalid UTF-8 string is passed to PCRE, an error return
+(PCRE_ERROR_BADUTF8) is given. In some situations, you may already know that
+your strings are valid, and therefore want to skip these checks in order to
+improve performance. If you set the PCRE_NO_UTF8_CHECK flag at compile time or
+at run time, PCRE assumes that the pattern or subject it is given
+(respectively) contains only valid UTF-8 codes. In this case, it does not
+diagnose an invalid UTF-8 string.
+.P
+If you pass an invalid UTF-8 string when PCRE_NO_UTF8_CHECK is set, what
+happens depends on why the string is invalid. If the string conforms to the
+"old" definition of UTF-8 (RFC 2279), it is processed as a string of characters
+in the range 0 to 0x7FFFFFFF. In other words, apart from the initial validity
+test, PCRE (when in UTF-8 mode) handles strings according to the more liberal
+rules of RFC 2279. However, if the string does not even conform to RFC 2279,
+the result is undefined. Your program may crash.
+.P
+If you want to process strings of values in the full range 0 to 0x7FFFFFFF,
+encoded in a UTF-8-like manner as per the old RFC, you can set
+PCRE_NO_UTF8_CHECK to bypass the more restrictive test. However, in this
+situation, you will have to apply your own validity check.
+.
+.SS "General comments about UTF-8 mode"
+.rs
+.sp
+1. An unbraced hexadecimal escape sequence (such as \exb3) matches a two-byte
+UTF-8 character if the value is greater than 127.
 .P
-3. The original hexadecimal escape sequence, \exhh, matches a two-byte UTF-8
-character if the value is greater than 127.
+2. Octal numbers up to \e777 are recognized, and match two-byte UTF-8
+characters for values greater than \e177.
 .P
-4. Repeat quantifiers apply to complete UTF-8 characters, not to individual
+3. Repeat quantifiers apply to complete UTF-8 characters, not to individual
 bytes, for example: \ex{100}{3}.
 .P
-5. The dot metacharacter matches one UTF-8 character instead of a single byte.
+4. The dot metacharacter matches one UTF-8 character instead of a single byte.
 .P
-6. The escape sequence \eC can be used to match a single byte in UTF-8 mode,
-but its use can lead to some strange effects.
+5. The escape sequence \eC can be used to match a single byte in UTF-8 mode,
+but its use can lead to some strange effects. This facility is not available in
+the alternative matching function, \fBpcre_dfa_exec()\fP.
 .P
-7. The character escapes \eb, \eB, \ed, \eD, \es, \eS, \ew, and \eW correctly
+6. The character escapes \eb, \eB, \ed, \eD, \es, \eS, \ew, and \eW correctly
 test characters of any code value, but the characters that PCRE recognizes as
 digits, spaces, or word characters remain the same set as before, all with
 values less than 256. This remains true even when PCRE includes Unicode
@@ -177,28 +252,41 @@
 cases. If you really want to test for a wider sense of, say, "digit", you
 must use Unicode property tests such as \ep{Nd}.
 .P
-8. Similarly, characters that match the POSIX named character classes are all
+7. Similarly, characters that match the POSIX named character classes are all
 low-valued characters.
 .P
+8. However, the Perl 5.10 horizontal and vertical whitespace matching escapes
+(\eh, \eH, \ev, and \eV) do match all the appropriate Unicode characters.
+.P
 9. Case-insensitive matching applies only to characters whose values are less
 than 128, unless PCRE is built with Unicode property support. Even when Unicode
 property support is available, PCRE still uses its own character tables when
 checking the case of low-valued characters, so as not to degrade performance.
 The Unicode property information is used only for characters with higher
-values.
+values. Even with this support, PCRE supports
+case-insensitive matching only when there is a one-to-one mapping between a
+letter's cases. There are a small number of many-to-one mappings in Unicode;
+these are not supported by PCRE.
+.
 .
 .SH AUTHOR
 .rs
 .sp
-Philip Hazel <ph10@cam.ac.uk>
-.br
-University Computing Service,
-.br
-Cambridge CB2 3QG, England.
-.br
-Phone: +44 1223 334714
-.sp
-.in 0
-Last updated: 09 September 2004
-.br
-Copyright (c) 1997-2004 University of Cambridge.
+.nf
+Philip Hazel
+University Computing Service
+Cambridge CB2 3QH, England.
+.fi
+.P
+Putting an actual email address here seems to have been a spam magnet, so I've
+taken it away. If you want to email me, use my two initials, followed by the
+two digits 10, at the domain cam.ac.uk.
+.
+.
+.SH REVISION
+.rs
+.sp
+.nf
+Last updated: 09 August 2007
+Copyright (c) 1997-2007 University of Cambridge.
+.fi

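The UTF-8 validity rules described in the pcre.3 changes above can be sketched as a standalone C function. This is an illustration under stated assumptions, not PCRE's own checker, and the name utf8_check and its liberal flag are hypothetical. With liberal set to zero, it applies the RFC 3629 limits (U+0 to U+10FFFF, excluding U+D800 to U+DFFF). With liberal set, it accepts the RFC 2279 range up to 0x7FFFFFFF, including 5- and 6-byte sequences. That is the kind of "own validity check" an application would need before passing PCRE_NO_UTF8_CHECK. Rejecting overlong encodings in both modes is a choice made here and is not stated in the text above.

```c
#include <stddef.h>

/* Hypothetical helper (not part of PCRE): returns 1 if the len bytes at s
   form valid UTF-8, 0 otherwise. liberal == 0: RFC 3629 rules.
   liberal != 0: RFC 2279 range 0..0x7FFFFFFF. Overlong forms are always
   rejected. */
int utf8_check(const unsigned char *s, size_t len, int liberal)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp;     /* decoded code point */
        unsigned long min;    /* smallest value that needs this many bytes */
        size_t extra;         /* number of continuation bytes expected */

        if (c < 0x80) { i++; continue; }          /* ASCII byte */
        if ((c & 0xE0) == 0xC0)      { extra = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
        else if (liberal && (c & 0xFC) == 0xF8) { extra = 4; cp = c & 0x03; min = 0x200000; }
        else if (liberal && (c & 0xFE) == 0xFC) { extra = 5; cp = c & 0x01; min = 0x4000000; }
        else return 0;  /* stray continuation byte, 0xFE/0xFF, or 5/6-byte lead in strict mode */

        if (len - i <= extra) return 0;           /* sequence is truncated */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80) return 0;    /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min) return 0;                   /* overlong encoding */
        if (!liberal && (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)))
            return 0;                             /* outside the RFC 3629 range */
        i += extra + 1;
    }
    return 1;
}
```

For example, the bytes ED A0 80 (the surrogate U+D800) are rejected in strict mode but accepted with liberal set. This mirrors the difference between the RFC 3629 and RFC 2279 checks described above.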

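The byte counts in the "General comments about UTF-8 mode" notes follow from the standard UTF-8 encoding. For example, any value above 127, such as \xb3 or octal \777, takes two bytes. The sketch below shows that encoding as a hypothetical helper named utf8_encode. It is not a PCRE function. It also refuses surrogates and values above U+10FFFF, in line with the RFC 3629 rules.

```c
#include <stddef.h>

/* Hypothetical helper (not part of PCRE): writes the UTF-8 encoding of
   code point cp into buf, which must hold at least 4 bytes, and returns
   the number of bytes written. Returns 0 for surrogates (U+D800..U+DFFF)
   and for values above U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char *buf)
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                                  /* surrogate: not encodable */
    if (cp < 0x80) {                               /* 1 byte: 0xxxxxxx */
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                              /* 2 bytes: 110xxxxx 10xxxxxx */
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                            /* 3 bytes */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                          /* 4 bytes */
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                                      /* beyond U+10FFFF */
}
```

With this encoding, \x{100} becomes the two bytes C4 80. In UTF-8 mode, \x{100}{3} therefore repeats that whole two-byte character three times rather than repeating the final byte.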
