lucene-dev mailing list archives

From Dawid Weiss <dawid.we...@cs.put.poznan.pl>
Subject Re: PatternReplaceCharFilter, LUCENE-3820
Date Mon, 27 Feb 2012 07:55:04 GMT
Hi Koji,

> As I thought that buffering entire String tends to take place OOM when doing
> pattern matching,

This is a possibility, of course, if you have super-long documents on
input (buffering the input strings will be a problem; pattern matching
itself shouldn't be). Was it a precautionary measure or did you really
hit OOMs? It would be possible to introduce those block delimiters, of
course, but I still think it doesn't make much sense from a practical
point of view: without them the code is simpler, more effective and
doesn't crash. If somebody parses super-long inputs then I'm sure this
won't be the only source of problems.
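
To make the trade-off concrete, here is a minimal sketch of the
"buffer everything, then match" approach. It is not the LUCENE-3820
patch itself; the class and method names (BufferedPatternReplace,
readFully, replaceAll) are made up for illustration and only
java.util.regex and java.io are assumed. Memory use grows with the
input length because the whole input is held in a String, which is
where an OOM would come from on very long documents.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.regex.Pattern;

// Illustrative only: hypothetical names, not Lucene's PatternReplaceCharFilter.
class BufferedPatternReplace {

    // Reads the entire input into memory. This is the step whose cost
    // grows with document size; the matching itself works on the result.
    static String readFully(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Applies the pattern to the fully buffered input, with no block
    // delimiters, so a match can span any part of the input.
    static String replaceAll(Reader in, Pattern pattern, String replacement)
            throws IOException {
        return pattern.matcher(readFully(in)).replaceAll(replacement);
    }

    public static void main(String[] args) throws IOException {
        String out = replaceAll(new StringReader("aa bb aaa"),
                                Pattern.compile("a+"), "x");
        System.out.println(out); // prints: x bb x
    }
}
```

A block-delimited variant would instead apply the pattern to one chunk
at a time, which bounds memory but misses matches that cross a chunk
boundary.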

I will commit this. Now that it is covered by Robert's randomized
pattern tests (which I'm sure will make all regexp implementations
very excited indeed), we can come back to it once there's realistic
feedback that the lack of boundaries causes a problem.

Dawid
