lucene-java-user mailing list archives

From Robert Muir
Subject Re: Can this regex be done?
Date Thu, 03 Sep 2009 22:05:41 GMT
just a side note, LUCENE-1606 is intended to address exactly the
performance issue that you described.

rather than depending upon constant prefix or enumerating terms, it
can efficiently skip through the term dictionary.

the downside is that this behavior depends upon the ability to compile
a regular expression into a DFA.
some regex features (such as capturing groups/backreferences) cannot be
compiled into a DFA, so it does not support those.
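
to illustrate the skipping idea, here is a minimal sketch in plain Python. it is not the LUCENE-1606 code or any Lucene API: the sorted list TERMS stands in for the term dictionary, the hand-built DFA table (for the regex ab(c|d)) and the names run and matching_terms are assumptions made up for this example. whenever the DFA rejects a term at some character, every term sharing that rejected prefix is skipped with a single binary-search seek.

```python
from bisect import bisect_left

# Hypothetical hand-built DFA for the regex ab(c|d).
# A missing transition means the DFA is dead: no extension can match.
DFA = {
    0: {"a": 1},
    1: {"b": 2},
    2: {"c": 3, "d": 3},
    3: {},
}
ACCEPT = {3}

# Stand-in for a sorted term dictionary (assumed unique and sorted).
TERMS = sorted(["aaa", "aab", "aac", "abc", "abd", "abe", "b", "ba", "bb", "zzz"])


def run(term):
    """Return (accepted, index of the first rejected character or None)."""
    state = 0
    for i, ch in enumerate(term):
        nxt = DFA[state].get(ch)
        if nxt is None:
            return False, i
        state = nxt
    return state in ACCEPT, None


def matching_terms(terms):
    """Return (matching terms, number of terms actually examined)."""
    matches, examined, pos = [], 0, 0
    while pos < len(terms):
        term = terms[pos]
        examined += 1
        accepted, bad = run(term)
        if accepted:
            matches.append(term)
        if bad is None:
            # The whole term was consumed; just move to the next one.
            pos += 1
        else:
            # No term starting with term[:bad + 1] can match, so seek to
            # the first term that sorts after every such term.
            skip_to = term[:bad] + chr(ord(term[bad]) + 1)
            pos = bisect_left(terms, skip_to, pos + 1)
    return matches, examined
```

calling matching_terms(TERMS) returns the matches together with a count of how many terms were looked at, which can be compared against len(TERMS) to see how much of the dictionary was skipped. a real implementation would work on bytes and compute the next possible accepted term more precisely; this sketch only skips dead prefixes.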

On Thu, Sep 3, 2009 at 5:55 PM, Chris Hostetter wrote:
> : Because some of the queries that I have to convert (without modifying
> : them, unfortunately) have literally half a page of statements
> : expressed like that which, if expanded, would equal a several-page-long
> : lucene query.
> FWIW: the RegexQuery (in contrib) applies the regex input to every term in
> the field (in some cases it can skip ahead if it can find a
> constant prefix) to see if it matches, and then essentially builds a giant
> "OR" query of all the terms that match.
> so if your concern is that you don't want to translate the regex
> expressions yourself, the RegexQuery can handle it for you ... but if
> your concern is that the equivalent query will be really big, well... it's
> going to be really big anyway, but if you convert it you might be able to
> make some optimizations during your conversion process (ie: if you know
> you only need to worry about groupings and alternates, and that you'll
> never have any wildcards, like your one example, then you can build up all
> of the possible clauses without doing a full TermEnum walk) ... even for
> 100s of clauses, that's probably faster if your field has thousands of
> terms.
> -Hoss
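
the conversion suggested above (expanding groupings and alternates into explicit clauses, with no walk over the term dictionary) could look roughly like the sketch below. it is plain Python rather than Lucene code; the function names expand, _alternation and _sequence are hypothetical, and it assumes the pattern contains only literal characters, (...) groups and | alternates. anything else is rejected instead of being guessed at.

```python
from itertools import product


def expand(pattern):
    """Expand literals, (...) groups and | alternates into plain terms."""
    terms, pos = _alternation(pattern, 0)
    if pos != len(pattern):
        raise ValueError("unbalanced ')' at index %d" % pos)
    return terms


def _alternation(p, i):
    # alternation := sequence ('|' sequence)*
    results, i = _sequence(p, i)
    while i < len(p) and p[i] == "|":
        more, i = _sequence(p, i + 1)
        results += more
    return results, i


def _sequence(p, i):
    # sequence := (literal char | '(' alternation ')')*
    parts = [[""]]
    while i < len(p) and p[i] not in "|)":
        if p[i] == "(":
            inner, i = _alternation(p, i + 1)
            if i >= len(p) or p[i] != ")":
                raise ValueError("missing ')'")
            parts.append(inner)
            i += 1
        elif p[i] in "*+?.[]{}\\":
            # Wildcards, classes and repetition are outside this sketch.
            raise ValueError("unsupported regex feature: %r" % p[i])
        else:
            parts.append([p[i]])
            i += 1
    # Cartesian product of the alternatives at each position.
    return ["".join(combo) for combo in product(*parts)], i
```

each string returned by expand would then become one clause of the "OR" query. the number of terms grows multiplicatively with each group, so the result can still be large, but it is built without looking at the index at all.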

Robert Muir
