commons-dev mailing list archives

From James Ring <...@jdns.org>
Subject Re: Anyone interested in regular expressions, again?
Date Mon, 02 Feb 2015 22:20:35 GMT
I spoke to one of the authors of re2j, a Google-internal port of the C++
re2 library. The intention was to open source it but they just haven't got
around to it.

I may try to get Google to put re2j up on GitHub so you can all take a
look. AFAIK it is heavily used within Google, and its API is largely
compatible with java.util.regex. I know from personal experience that one
can often benefit from re2j merely by replacing the java.util.regex imports
with the corresponding re2j imports.
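
A minimal sketch of what such an import swap might look like. The class
and method names are hypothetical, and the re2j package name in the
comments (com.google.re2j) is an assumption; the code as written uses
only java.util.regex so that it runs without any extra dependency.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Assumed re2j equivalents (package name is an assumption, not confirmed here):
// import com.google.re2j.Matcher;
// import com.google.re2j.Pattern;

// Hypothetical example class; only the imports above would change in a swap.
public class ImportSwapSketch {
    // The pattern uses no backreferences or lookaround, so it stays within
    // the feature subset a linear-time engine can support.
    private static final Pattern KEY_VALUE = Pattern.compile("(\\w+)=(\\d+)");

    // Returns "key -> value" for the first key=number pair, or "no match".
    static String describe(String input) {
        Matcher m = KEY_VALUE.matcher(input);
        if (!m.find()) {
            return "no match";
        }
        return m.group(1) + " -> " + m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(describe("timeout=30")); // prints "timeout -> 30"
    }
}
```

The swap only works this cleanly when the patterns avoid features that a
linear-time engine leaves out, such as backreferences.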

Regards,
James
On Feb 1, 2015 11:44 PM, "Thomas Neidhart" <thomas.neidhart@gmail.com>
wrote:

> On 02/02/2015 03:25 AM, sebb wrote:
> > I would not wish to move away from Java RE *unless* the RE syntax was
> > the same *and* the implementation was better performing *and* the
> > existing code suffered from poor performance.
> >
> > It might be OK if the alternate implementation was missing some
> > esoteric features, but I would be very wary of using any features that
> > were not in the Java implementation.
> >
> > The likelihood is that the Java implementation will (eventually)
> > become more performant, at which point it would be useful to be able
> > to revert to the Java version.
> > That requires a high degree of compatibility to reduce the work involved.
> >
> > It might be more useful to produce a tool that detects inefficient RE
> > usage and suggests improvements.
>
> I only know re2 a little, but it is a trade-off:
>
>  * linear-time evaluation vs. some features (e.g. backreferences)
>
> A comparison between different regular expression implementations can be
> found here:
>
> http://en.wikipedia.org/wiki/Comparison_of_regular_expression_engines
>
> I am pretty sure the regexp implementation in Java will not change,
> simply because of backwards compatibility reasons, but such a library
> would be useful as in many cases you do not need these additional
> features but want to ensure that your regular expression will be
> evaluated in linear time.
>
> Thomas
>
> >
> >
> > On 1 February 2015 at 22:35, James Carman <james@carmanconsulting.com> wrote:
> >> To be clear, I am not advocating this approach.  I was merely trying to
> >> illustrate what a nightmare such an endeavor would be. :)
> >>
> >> On Sunday, February 1, 2015, James Carman <james@carmanconsulting.com>
> >> wrote:
> >>
> >>> You would basically have to pick a canonical regex language if you
> >>> want a facade and want to be able to swap the regex library out.
> >>> Most of them are very similar, but they are not the same.
> >>>
> >>> On Sunday, February 1, 2015, Gary Gregory <garydgregory@gmail.com> wrote:
> >>>
> >>>> I think we'll need some clear performance advantages documented, as
> >>>> well as any compatibility issues.
> >>>>
> >>>> This begs for a facade API IMO. I would not want to recode my app
> >>>> just to test one vs. the other; it should be pluggable.
> >>>>
> >>>> Gary
> >>>>
> >>>> On Sat, Jan 31, 2015 at 10:58 AM, Benson Margulies <bimargulies@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> So, once upon a time, there was a regex library here. It was retired,
> >>>>> presumably on the grounds that it was rendered obsolete by the JRE's
> >>>>> native support.
> >>>>>
> >>>>> However, the JRE's regular expressions have a pretty severe problem:
> >>>>> they have unbounded (or at least very, very bad) execution time for
> >>>>> some combinations of data and regex.
> >>>>>
> >>>>> To cope with this, we ported the Henry Spencer regular expression
> >>>>> library (as found in TCL) from C to Java.
> >>>>>
> >>>>> Thus: https://github.com/basis-technology-corp/tcl-regex-java
> >>>>>
> >>>>> Is anyone interested in this? Give or take the possible IP muddle of
> >>>>> the original C code, I could grant it easily.
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> >>>>> For additional commands, e-mail: dev-help@commons.apache.org
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> E-Mail: garydgregory@gmail.com | ggregory@apache.org
> >>>> Java Persistence with Hibernate, Second Edition
> >>>> <http://www.manning.com/bauer3/>
> >>>> JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
> >>>> Spring Batch in Action <http://www.manning.com/templier/>
> >>>> Blog: http://garygregory.wordpress.com
> >>>> Home: http://garygregory.com/
> >>>> Tweet! http://twitter.com/GaryGregory
> >>>>
> >>>
