apr-dev mailing list archives

From Jacques Amar <jal...@amar.com>
Subject Re: PCRE modules in APR?
Date Tue, 30 Dec 2008 18:07:48 GMT
Thanks for the reply.

Wes Garland wrote:
> 1. There is no APR equivalent for free, as it is neither needed nor
> desired. Simply allocate your memory from a pool, and destroy the
> pool when it is no longer needed. I would suggest making a subpool on
> RE create and bury it in an opaque pointer describing your RE, if
> you're actually going to go whole-hog on this. Me? I use the OS
> regexec/regcomp (search only) and register an apr_pool_cleanup
> handler to avoid leaking memory.
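Wes's approach (OS regcomp/regexec plus a pool cleanup so regfree is never forgotten) might look roughly like the sketch below. To stay self-contained it does not link APR: the hypothetical `cleanup_list` type stands in for a pool, and `cleanup_list_register` / `cleanup_list_run` imitate what `apr_pool_cleanup_register()` and destroying the pool do. With real APR the callback would return `apr_status_t` and be registered on the pool itself.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pool's cleanup list: callbacks run in
 * reverse registration order when the list is "destroyed". */
typedef struct cleanup {
    void (*fn)(void *);
    void *data;
    struct cleanup *next;
} cleanup;

typedef struct {
    cleanup *head;
} cleanup_list;

static int cleanup_list_register(cleanup_list *l, void (*fn)(void *), void *data)
{
    cleanup *c = malloc(sizeof *c);
    if (c == NULL)
        return -1;
    c->fn = fn;
    c->data = data;
    c->next = l->head;          /* push front: last registered runs first */
    l->head = c;
    return 0;
}

static void cleanup_list_run(cleanup_list *l)
{
    while (l->head != NULL) {
        cleanup *c = l->head;
        l->head = c->next;
        c->fn(c->data);
        free(c);
    }
}

/* Cleanup callback: release regcomp()'s internal storage, then the regex_t. */
static void regex_cleanup(void *data)
{
    regex_t *re = data;
    regfree(re);
    free(re);
}

/* Compile a pattern and tie its lifetime to the cleanup list.
 * Returns NULL if allocation or compilation fails. */
static regex_t *compile_owned(cleanup_list *l, const char *pattern, int cflags)
{
    regex_t *re = malloc(sizeof *re);
    if (re == NULL)
        return NULL;
    if (regcomp(re, pattern, cflags) != 0) {
        free(re);               /* contents undefined after failure: skip regfree() */
        return NULL;
    }
    if (cleanup_list_register(l, regex_cleanup, re) != 0) {
        regex_cleanup(re);
        return NULL;
    }
    return re;
}

/* Search only: returns 1 on a match, 0 otherwise. */
static int matches(const regex_t *re, const char *subject)
{
    return regexec(re, subject, 0, NULL, 0) == 0;
}
```

Compiled patterns can then be kept for the life of the "pool", and a single `cleanup_list_run()` at shutdown frees all of them.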
I'm creating a series of pre-compiled/analyzed regular expressions at
server startup and doing a lot of S&R during processing. I do create
a dedicated pool for this; however, I can never destroy it, since the
pre-compiled expressions are stored there and must stay there until
server shutdown. The PCRE documentation also states that the memory
allocation functions should be set once, before first use. I will try
using one pool for compiling the regexes and another for the search
part, and see if that works.
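The two-pool plan has to work around the fact that PCRE's classic `pcre_*` API takes its allocator through the global function pointers `pcre_malloc` and `pcre_free`, which receive no pool argument. A minimal sketch, assuming a single-threaded caller: keep a "current arena" pointer and switch it between a long-lived compile arena and a per-request arena that is reset after each request. The `arena` type and all helper names are hypothetical stand-ins for APR pools (`apr_palloc` / `apr_pool_clear`); PCRE is not linked here, but `hook_malloc` and `hook_free` have the signatures those two pointers expect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bump arena standing in for an APR pool: allocations are
 * carved from one block and released all at once (reset or destroy). */
typedef struct {
    unsigned char *base;
    size_t size;
    size_t used;
} arena;

static int arena_init(arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = size;
    a->used = 0;
    return a->base != NULL ? 0 : -1;
}

static void *arena_alloc(arena *a, size_t n)
{
    /* Round the offset up so every block is aligned for any object type. */
    size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) / align * align;
    if (start > a->size || n > a->size - start)
        return NULL;            /* out of space: a real pool would grow */
    a->used = start + n;
    return a->base + start;
}

static void arena_reset(arena *a)   { a->used = 0; }        /* like apr_pool_clear */
static void arena_destroy(arena *a) { free(a->base); a->base = NULL; }

/* Global "current arena", because pcre_malloc/pcre_free take no context.
 * Not thread-safe: a threaded server would need thread-local state. */
static arena *current_arena;

static void *hook_malloc(size_t n) { return arena_alloc(current_arena, n); }
static void hook_free(void *p)     { (void)p; /* reclaimed when the arena is reset */ }

/* Assumed wiring with the classic PCRE API (not compiled here):
 *   pcre_malloc = hook_malloc;  pcre_free = hook_free;
 *   current_arena = &compile_arena;  before pcre_compile() at startup
 *   current_arena = &request_arena;  around each pcre_exec() call   */
```

Because `hook_free` is a no-op, anything PCRE allocates during a request disappears with one `arena_reset()`, while compiled patterns live in the compile arena until shutdown.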
>
> 2. Personally, I would never roll my own search and replace except 
> under exceptional circumstances. That said, your approach doesn't 
> sound unreasonable, but it's difficult to say what your problem is 
> without profiling the code and looking at memory consumption. Start by 
> consulting the literature, since S&R is a well-understood problem, and maybe
> google some stuff on ropes; they may serve you better than strings.
>
For those interested, I traced the issue to UTF-8 handling: the PCRE_UTF8
flag significantly slows down searches. Not all my regexes need UTF-8
enabled, only those dealing with embedded strings, so I shaved a lot of
time off by being more selective.
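One plausible contributor (an assumption, not something measured in this thread): besides decoding multi-byte characters while matching, `pcre_exec()` on a pattern compiled with `PCRE_UTF8` by default checks that the whole subject is valid UTF-8 on every call; `PCRE_NO_UTF8_CHECK` skips that when the caller already knows the input is valid. The sketch below shows the kind of linear scan involved. It is a simplified validator written for illustration, not PCRE's own code; `utf8_valid` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if s[0..len) is well-formed UTF-8, else 0.
 * Rejects stray continuation bytes, truncated sequences, overlong
 * encodings, UTF-16 surrogates (U+D800..U+DFFF) and values > U+10FFFF. */
static int utf8_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n;            /* continuation bytes expected */
        unsigned long cp;    /* code point being decoded */

        if (c < 0x80) {      /* ASCII: one byte, nothing to decode */
            i++;
            continue;
        }
        if (c >= 0xC2 && c <= 0xDF)      { n = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { n = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { n = 3; cp = c & 0x07; }
        else return 0;       /* 0x80..0xC1 and 0xF5..0xFF never start a character */

        if (len - i <= n)
            return 0;        /* sequence truncated by end of input */
        for (size_t k = 1; k <= n; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;    /* expected a 10xxxxxx continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000))
            return 0;        /* overlong encoding */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return 0;        /* surrogate or beyond the Unicode range */
        i += n + 1;
    }
    return 1;
}
```

A scan like this touches every byte of the subject, so on large documents searched with many UTF-8 patterns it adds up, which is consistent with only enabling UTF-8 mode where it is needed.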

> Here's a paper on ropes which discusses concatenation, which *should* 
> be where you're spending your search and replace time: 
> <http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/vol25/issue12/spe986.pdf>
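The core idea of a rope is that concatenation only creates a new node pointing at its two operands, so it costs O(1) instead of copying both strings; the characters are copied once, when the result is finally flattened. A minimal sketch, not the paper's implementation: nodes are immutable, leaves borrow their text, there is no rebalancing (very deep ropes could exhaust the stack in the recursive copy), and `rope_leaf`, `rope_concat` and `rope_flatten` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct rope {
    const struct rope *left, *right;  /* both NULL for a leaf */
    const char *text;                 /* leaf text (borrowed, not copied) */
    size_t len;                       /* total length of this subtree */
} rope;

static rope *rope_leaf(const char *text)
{
    rope *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->left = r->right = NULL;
    r->text = text;
    r->len = strlen(text);
    return r;
}

/* O(1): no characters are copied, only a new interior node is made. */
static rope *rope_concat(const rope *a, const rope *b)
{
    rope *r = malloc(sizeof *r);
    if (r == NULL)
        return NULL;
    r->left = a;
    r->right = b;
    r->text = NULL;
    r->len = a->len + b->len;
    return r;
}

/* Copy the rope's characters into out; returns the position after them. */
static char *rope_copy(const rope *r, char *out)
{
    if (r->left == NULL) {
        memcpy(out, r->text, r->len);
        return out + r->len;
    }
    out = rope_copy(r->left, out);
    return rope_copy(r->right, out);
}

/* Flatten into a freshly malloc'd NUL-terminated string. */
static char *rope_flatten(const rope *r)
{
    char *s = malloc(r->len + 1);
    if (s == NULL)
        return NULL;
    *rope_copy(r, s) = '\0';
    return s;
}
```

In an APR setting the nodes would more naturally come from a request pool and be released with it, rather than being freed one by one.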

Will read, thanks! But with UTF-8 out of the way,

    output = apr_array_pstrcat(subpool, strip_arr, 0);

works perfectly fine and fast.
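The `apr_array_pstrcat()` call joins all array elements into one string (a separator of 0 means none). How `strip_arr` is actually filled is not shown, so the following is only a sketch of the general "collect pieces, join once" pattern for a literal (non-regex) search and replace: the unchanged stretches and replacement strings are recorded as pieces, then copied into a single allocation. The hypothetical `join_pieces` stands in for `apr_array_pstrcat(pool, arr, 0)` and uses plain `malloc` instead of a subpool; a regex version would take match offsets from `regexec()` or `pcre_exec()` instead of `strstr()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *ptr;
    size_t len;
} piece;

/* Join pieces with one allocation: sum the lengths, then copy once. */
static char *join_pieces(const piece *p, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += p[i].len;
    char *out = malloc(total + 1);
    if (out == NULL)
        return NULL;
    char *w = out;
    for (size_t i = 0; i < count; i++) {
        memcpy(w, p[i].ptr, p[i].len);
        w += p[i].len;
    }
    *w = '\0';
    return out;
}

/* Replace every non-overlapping occurrence of needle (non-empty) with repl.
 * Returns a malloc'd string, or NULL on error or empty needle. */
static char *replace_all(const char *subject, const char *needle, const char *repl)
{
    size_t nlen = strlen(needle), rlen = strlen(repl);
    if (nlen == 0)
        return NULL;

    /* First pass: count matches to size the piece array. */
    size_t matches = 0;
    for (const char *s = subject; (s = strstr(s, needle)) != NULL; s += nlen)
        matches++;

    piece *pieces = malloc((2 * matches + 1) * sizeof *pieces);
    if (pieces == NULL)
        return NULL;

    /* Second pass: record unchanged text and replacements as pieces. */
    size_t n = 0;
    const char *s = subject, *hit;
    while ((hit = strstr(s, needle)) != NULL) {
        pieces[n++] = (piece){ s, (size_t)(hit - s) };   /* unchanged text */
        pieces[n++] = (piece){ repl, rlen };            /* replacement */
        s = hit + nlen;
    }
    pieces[n++] = (piece){ s, strlen(s) };              /* trailing text */

    char *out = join_pieces(pieces, n);
    free(pieces);
    return out;
}
```

The output is copied exactly once, which avoids the quadratic cost of re-concatenating a growing string after every match.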
>
> Note - if your S&R is regexp instead of strcmp, you could also be 
> spending most of your time in the regex state machine. Profile!
>
> Wes
Correct!

I guess I now have to deal with my UTF-8 issues... ugh. I wonder if
UTF-16 would be faster, as most characters are 2 bytes long. I'll also try
memcached to cache the results so I don't have to repeat the same processing
on every request.
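On the UTF-16 idea: UTF-16 is also a variable-length encoding. Code points up to U+FFFF take one 16-bit unit (2 bytes), but anything above that is stored as a surrogate pair (4 bytes), so a UTF-16 matcher still cannot assume fixed-width characters. A small sketch of the encoding rule; `utf16_encode` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value as UTF-16.
 * Returns the number of 16-bit units written (1 or 2), or 0 if cp is
 * a surrogate or above U+10FFFF. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                                   /* surrogates are not characters */
    if (cp <= 0xFFFF) {
        out[0] = (uint16_t)cp;                      /* BMP: one unit, 2 bytes */
        return 1;
    }
    if (cp > 0x10FFFF)
        return 0;
    cp -= 0x10000;                                  /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));       /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));     /* low surrogate */
    return 2;                                       /* two units, 4 bytes */
}
```

For example, U+00E9 encodes as the single unit 0x00E9, while U+1F600 needs the pair 0xD83D 0xDE00.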

Thanks again

Jacques
